Xinference on Kubernetes

Helm Support

Xinference can be installed in a Kubernetes cluster via Helm.

Prerequisites

  • You have a fully functional Kubernetes cluster.

  • GPU support is enabled in Kubernetes (refer to here).

  • Helm is correctly installed.
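
The prerequisites above can be checked from a shell with a sketch like the following. It assumes kubectl is the cluster client and that GPUs are exposed through the NVIDIA device plugin under the resource name nvidia.com/gpu; neither is stated in this guide, so adjust it for other setups. The check_tool helper is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
# Report whether a command-line tool is on PATH.
# Prints a status line and returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1"
        return 1
    fi
}

missing=0
for tool in kubectl helm; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    # Confirm that the cluster is reachable and that Helm responds.
    kubectl cluster-info --request-timeout=5s || echo "cluster not reachable"
    helm version --short || echo "helm did not respond"
    # Assumption: NVIDIA device plugin. A GPU column showing <none> means
    # the node advertises no GPUs to the scheduler.
    kubectl get nodes --request-timeout=5s \
        -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
        || echo "could not list nodes"
fi
```

If either tool is reported as missing, the cluster checks are skipped.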

Steps

  1. Add the Xinference Helm repo.

    helm repo add xinference https://xorbitsai.github.io/xinference-helm-charts
    
  2. Update the Xinference Helm repo index and list the available chart versions.

    helm repo update xinference
    helm search repo xinference/xinference --devel --versions
    
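  3. Install Xinference. The xinference namespace must already exist; if it does not, add Helm's --create-namespace flag to the command.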

    helm install xinference xinference/xinference -n xinference --version <helm_charts_version>
    

Customized Installation

The installation above produces a cluster that resembles a single-machine setup: one worker, with all startup parameters at their default values. That is rarely what a real deployment needs.

Below are some common custom installation configurations.

  1. I need to download models from ModelScope.

    helm install xinference xinference/xinference -n xinference --version <helm_charts_version> --set config.model_src="modelscope"
    
  2. I want to use the CPU image of Xinference (or any other version of the Xinference image).

    helm install xinference xinference/xinference -n xinference --version <helm_charts_version> --set config.xinference_image="<xinference_docker_image>"
    
  3. I want to have 4 Xinference workers, with each worker managing 4 GPUs.

    helm install xinference xinference/xinference -n xinference --version <helm_charts_version> --set config.worker_num=4 --set config.gpu_per_worker="4"
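
The overrides above are independent --set flags, so they can be combined in one command. The sketch below assembles such a command and only prints it for review; it does not run Helm. It uses helm upgrade --install, a standard Helm form that installs the release if it is absent and upgrades it otherwise. This guide itself only shows helm install, so treat that substitution as an assumption. The build_install_cmd helper and the variable names are hypothetical.

```shell
#!/bin/sh
# Values to override; the keys are the ones used in the examples above.
MODEL_SRC="modelscope"
WORKER_NUM=4
GPU_PER_WORKER=4
# Placeholder: replace with a version listed by "helm search repo".
CHART_VERSION="<helm_charts_version>"

# Assemble the command as text so it can be reviewed before it is run.
build_install_cmd() {
    echo "helm upgrade --install xinference xinference/xinference -n xinference" \
         "--version $CHART_VERSION" \
         "--set config.model_src=$MODEL_SRC" \
         "--set config.worker_num=$WORKER_NUM" \
         "--set config.gpu_per_worker=$GPU_PER_WORKER"
}

build_install_cmd
```

Because all three installation examples use the same release name, a plain helm install fails once that release exists; helm upgrade is the usual way to change the settings of an existing release.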
    

The installations above rely on Helm's --set option. For more complex customizations, such as multiple workers with shared storage, it is highly recommended to supply your own values.yaml file through Helm's -f option.

The default values.yaml file is located here. Some examples can be found here.
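
As a minimal sketch of the values.yaml approach, the snippet below writes a file that mirrors the --set overrides shown earlier and nothing more. The file name my-values.yaml is arbitrary, and the key layout is inferred from the --set paths (config.model_src and so on), so compare it against the default values.yaml before relying on it. Shared storage settings are omitted because their keys are not given here.

```shell
#!/bin/sh
# Write a custom values file; the file name is an arbitrary choice.
cat > my-values.yaml <<'EOF'
config:
  # Download models from ModelScope.
  model_src: "modelscope"
  # Four workers, each managing four GPUs
  # (value quoted as in the --set example).
  worker_num: 4
  gpu_per_worker: "4"
EOF
```

The file can then be passed to Helm with the -f option:

    helm install xinference xinference/xinference -n xinference --version <helm_charts_version> -f my-values.yaml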

KubeBlocks Support

You can also install Xinference in Kubernetes using the third-party KubeBlocks. This method is not maintained by Xinference, so timely updates and availability are not guaranteed. Please refer to the documentation here.