Model Virtual Environments#

Added in version v1.5.0.

Background#

Some models are no longer maintained after their release, and the versions of the libraries they depend on remain outdated. For example, the GOT-OCR2 model still relies on transformers version 4.37.2. If this library is updated to a newer version, the model can no longer function properly. On the other hand, many newer models require the latest version of transformers. This version mismatch leads to dependency conflicts.

Solution#

To address this issue, we have introduced the Model Virtual Environment feature.

Install the requirements for this feature with one of the following commands:

# all
pip install 'xinference[all]'
# or virtualenv
pip install 'xinference[virtualenv]'

Enable the feature by setting the environment variable XINFERENCE_ENABLE_VIRTUAL_ENV=1.

Example usage:

# For command line
XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local ...

# For Docker
docker run -e XINFERENCE_ENABLE_VIRTUAL_ENV=1 ...

Warning

This feature requires internet access or a self-hosted PyPI mirror.

By default, Xinference inherits the configuration of the current pip installation.

Note

When launching a model with the vLLM or SGLang engine inside a virtual environment, if you encounter a cuDNN error, you can set the following variables (adjust python3.12 to match your Python version):

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.12/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.12/site-packages/nvidia/cusparselt/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.12/site-packages/nvidia/nccl/lib:$LD_LIBRARY_PATH
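
The exports above hard-code python3.12. The following is a minimal sketch, not part of Xinference, that derives the same three directories from the running interpreter. The helper name nvidia_library_paths is hypothetical, and the sketch assumes the NVIDIA wheels are installed in the interpreter's site-packages directory.

```python
import os
import sysconfig

# Sub-packages named in the exports above.
NVIDIA_LIB_DIRS = ("cudnn", "cusparselt", "nccl")


def nvidia_library_paths(site_packages=None):
    """Return the nvidia/<name>/lib directories under site-packages.

    Hypothetical helper: when site_packages is None, the pure-Python
    site-packages directory of the running interpreter is used.
    """
    base = site_packages or sysconfig.get_paths()["purelib"]
    return [os.path.join(base, "nvidia", name, "lib") for name in NVIDIA_LIB_DIRS]


if __name__ == "__main__":
    # Print a single export line that prepends the directories to any
    # existing LD_LIBRARY_PATH value.
    existing = os.environ.get("LD_LIBRARY_PATH", "")
    parts = nvidia_library_paths() + ([existing] if existing else [])
    print("export LD_LIBRARY_PATH=" + ":".join(parts))
```

Run it with the Python interpreter of the environment that hosts Xinference and paste the printed line into your shell.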

Changed in version v2.0.0: Starting from Xinference v2.0, the model virtual environment feature is enabled by default (i.e., XINFERENCE_ENABLE_VIRTUAL_ENV defaults to 1).

To disable it globally, set XINFERENCE_ENABLE_VIRTUAL_ENV=0 when starting Xinference.

When enabled, Xinference will automatically create a dedicated virtual environment for each model when it is loaded, and install its specific dependencies there. This prevents dependency conflicts between models, allowing them to run in isolation without affecting one another.

Using Virtual Environments (v2.0)#

Global toggle#

Virtual environments are enabled by default starting from v2.0. You can still override this globally:

# Enable globally (default)
XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local -H 0.0.0.0 -p 9997

# Disable globally
XINFERENCE_ENABLE_VIRTUAL_ENV=0 xinference-local -H 0.0.0.0 -p 9997

Per-model override at launch time#

You can override the global setting when launching a model:

# Force enable for this model
xinference launch -n qwen2.5-instruct --model-engine transformers --enable-virtual-env

# Force disable for this model
xinference launch -n qwen2.5-instruct --model-engine transformers --disable-virtual-env

Add or override packages at launch time#

Use --virtual-env-package (or -vp) multiple times:

xinference launch -n qwen2.5-instruct --model-engine transformers \
  --virtual-env-package transformers==4.46.3 \
  --virtual-env-package accelerate==0.33.0

If you specify a package that already exists in the model’s default virtualenv package list, your version replaces the default instead of being appended.
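
The replace-instead-of-append rule can be illustrated with a short sketch. This is not Xinference's implementation. The function names are hypothetical, the default package list is made up for the example, and matching by normalized package name is an assumption.

```python
import re

# Leading distribution name of a requirement string such as "transformers==4.46.3".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_key(spec):
    """Return a normalized package name, or the raw spec if no name is found."""
    match = _NAME_RE.match(spec)
    if match is None:
        return spec.strip()
    # Treat "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def merge_packages(defaults, overrides):
    """Replace defaults that share a name with an override; append the rest."""
    merged = list(defaults)
    index = {package_key(spec): i for i, spec in enumerate(merged)}
    for spec in overrides:
        key = package_key(spec)
        if key in index:
            merged[index[key]] = spec  # same package: replace in place
        else:
            index[key] = len(merged)
            merged.append(spec)  # new package: append
    return merged


# Illustrative defaults only; real models define their own lists.
defaults = ["transformers>=4.40", "torch"]
overrides = ["transformers==4.46.3", "accelerate==0.33.0"]
print(merge_packages(defaults, overrides))
```

In this sketch the transformers entry is replaced at its original position, while accelerate is appended because the illustrative default list does not contain it.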

Storage Location#

By default, the model’s virtual environment is stored under the following path, which depends on the Xinference version:

Before v1.6.0: XINFERENCE_HOME / virtualenv / {model_name}
From v1.6.0 to v1.13.0: XINFERENCE_HOME / virtualenv / v2 / {model_name}
Since v1.14.0: XINFERENCE_HOME / virtualenv / v3 / {model_name} / {python_version}
Since v2.0: XINFERENCE_HOME / virtualenv / v4 / {model_name} / {model_engine} / {python_version}
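
The layouts listed above can be expressed as a small lookup. The sketch below is illustrative and not taken from the Xinference code base. The function names are hypothetical, and python_version is treated as an opaque string because the exact directory name format is not specified here.

```python
from pathlib import Path


def parse_version(text):
    """Turn 'v1.14.0' or '2.0' into a tuple of integers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def virtualenv_path(home, xinference_version, model_name, model_engine, python_version):
    """Return the virtual environment directory for a given Xinference version.

    home stands for the XINFERENCE_HOME directory; all arguments are
    supplied by the caller in this sketch.
    """
    version = parse_version(xinference_version)
    root = Path(home) / "virtualenv"
    if version < (1, 6, 0):
        return root / model_name
    if version < (1, 14, 0):
        return root / "v2" / model_name
    if version < (2, 0):
        return root / "v3" / model_name / python_version
    return root / "v4" / model_name / model_engine / python_version


print(virtualenv_path("/data/xinference", "v2.0.0", "qwen2.5-instruct", "transformers", "3.10.18"))
```

Because the v4 layout includes the engine, launching the same model with two different engines produces two separate environments.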

Skip Installed Libraries#

Added in version v1.8.1: This feature requires xoscar >= 0.7.12, which is the minimum Xoscar version required for Xinference v1.8.1.

Xinference uses the uv tool to create virtual environments, with the current Python system site-packages set as the base environment. By default, uv does not check for existing packages in the system environment and reinstalls all dependencies in the virtual environment. This ensures better isolation from system packages but can result in redundant installations, longer setup times, and increased disk usage.

Starting from v1.8.1, an experimental feature is available: by setting the environment variable XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1, uv will skip packages already available in system site-packages.

Changed in version v2.0: This feature is enabled by default in v2.0. To disable it, set XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=0.

Advantages#

Avoid redundant installations of large dependencies (e.g., torch + CUDA).
Speed up virtual environment creation.
Reduce disk usage.

Usage#

# Enable experimental feature

# For command line
XINFERENCE_ENABLE_VIRTUAL_ENV=1 XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1 xinference-local ...
# For Docker
docker run -e XINFERENCE_ENABLE_VIRTUAL_ENV=1 -e XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED=1 ...

Performance Comparison#

Using the CosyVoice 0.5B model as an example:

Without this feature enabled:

Installed 98 packages in 187ms
 + aiohappyeyeballs==2.6.1
 + aiohttp==3.12.13
 ...
 + torch==2.7.1
 ...
 + yarl==1.20.1
 + zipp==3.23.0

With this feature enabled:

Installed 7 packages in 12ms
 + diffusers==0.29.0
 + hf-xet==1.1.5
 + huggingface-hub==0.33.2
 + importlib-metadata==8.7.0
 + pillow==11.3.0
 + typing-extensions==4.14.0
 + urllib3==2.5.0

Model Launching: Toggle Virtual Environments and Customize Dependencies#

Added in version v1.8.1.

Starting from v1.8.1, we support toggling the virtual environment for individual model launching, as well as overriding the model’s default settings with custom package dependencies.

Toggle Virtual Environment#

When loading a model, you can specify whether to enable the model’s virtual environment. If not specified, the setting will follow the environment variable configuration.

For the Web UI, this can be toggled on or off through the optional settings switch.

For command-line loading, use the --enable-virtual-env option to enable the virtual environment, or --disable-virtual-env to disable it.

Example usage:

xinference launch xxx --enable-virtual-env
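
The precedence described in this section (an explicit per-model choice first, then the environment variable) can be sketched as follows. This is an illustration rather than Xinference's code. The function name is hypothetical, and treating only the value 1 as enabled is an assumption, since this page documents only the values 1 and 0.

```python
def virtual_env_enabled(per_model, environ, default):
    """Resolve whether a model should run in a virtual environment.

    per_model: True for --enable-virtual-env, False for
               --disable-virtual-env, None when neither is given.
    environ:   a mapping such as os.environ.
    default:   value used when the variable is unset (enabled by
               default starting from v2.0, per this page).
    """
    if per_model is not None:
        return per_model  # explicit per-model choice wins
    raw = environ.get("XINFERENCE_ENABLE_VIRTUAL_ENV")
    if raw is None:
        return default
    return raw.strip() == "1"  # assumption: only "1" means enabled


# A per-model flag overrides a global setting of 0.
print(virtual_env_enabled(True, {"XINFERENCE_ENABLE_VIRTUAL_ENV": "0"}, default=True))
```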

Set Virtual Environment Package Dependencies#

For supported models, Xinference has already defined the package dependencies and version requirements within the virtual environment. However, if you need to specify particular versions or install additional dependencies, you can manually provide them during model loading.

In the Web UI, you can add custom dependencies by clicking the plus icon in the same location as the virtual environment toggle.

For the command line, use --virtual-env-package or -vp to specify a package and its version. Repeat the option to specify more than one package.

Example usage:

xinference launch xxx --virtual-env-package transformers==4.54.0

In addition to the standard way of specifying package dependencies, such as transformers==xxx, Xinference also supports some extended syntax.

#system_xxx#: Uses the same version as the package in the system site-packages. For example, #system_numpy# ensures that the numpy installed in the virtual environment matches the numpy version in the system site-packages. This helps prevent dependency conflicts.
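
As a rough illustration of the #system_xxx# idea, the sketch below expands such a placeholder into a pinned requirement based on the version found in the current environment. It is not Xinference's implementation. The function name is hypothetical, and expanding to an exact == pin is an assumption drawn from the description above.

```python
import re
from importlib import metadata

# Matches a whole placeholder such as "#system_numpy#".
_SYSTEM_RE = re.compile(r"^#system_([A-Za-z0-9._-]+)#$")


def resolve_system_placeholder(spec, get_version=metadata.version):
    """Expand '#system_<name>#' to '<name>==<installed version>'.

    Other requirement strings are returned unchanged. get_version can be
    replaced to look versions up somewhere else (for example in tests).
    """
    match = _SYSTEM_RE.match(spec.strip())
    if match is None:
        return spec
    name = match.group(1)
    return f"{name}=={get_version(name)}"


# Ordinary requirement strings pass through untouched.
print(resolve_system_placeholder("transformers==4.54.0"))
```

With the default lookup, importlib.metadata.version raises PackageNotFoundError when the named package is not installed, so a caller would need to decide how to handle that case.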

Manage Virtual Environments#

Added in version v1.14.0.

Xinference provides comprehensive virtual environment management for model dependencies, allowing you to create isolated Python environments for each model with specific package requirements.

Key Features#

Multiple Python Version Support: Each model can have virtual environments with different Python versions (e.g., Python 3.10.18, 3.11.5), enabling compatibility with various model requirements.

Isolated Dependencies: Each virtual environment contains its own set of packages, preventing conflicts between different models’ requirements.

Management Operations#

Listing Virtual Environments: View all virtual environments across your cluster, filtered by model name or worker IP address.

Creating Environments: Environments are created automatically when launching models with enable_virtual_env=true. The system detects your current Python version and creates an isolated environment with the required packages.

Removing Environments: Delete specific virtual environments by model name and optionally Python version, or remove all environments for a model.

ModelHub JSON for Xinference Models#

If you plan to add a model to a model hub for Xinference, define a virtualenv block in the model JSON. Starting from v2.0 (v4 flow), engine-aware markers are recommended so one JSON can cover multiple engines.

Important rule: If a new model supports a specific engine, you must include at least one package entry for that engine in virtualenv.packages and attach a marker, for example #engine# == "vllm". Engine availability checks rely on these markers when virtual environments are enabled.

{
  "virtualenv": {
    "packages": [
      "#transformers_dependencies# ; #engine# == \"transformers\"",
      "#vllm_dependencies# ; #engine# == \"vllm\"",
      "#sglang_dependencies# ; #engine# == \"sglang\"",
      "#llama_cpp_dependencies# ; #engine# == \"llama.cpp\"",
      "#mlx_dependencies# ; #engine# == \"mlx\"",
      "#system_numpy# ; #engine# == \"vllm\""
    ]
  }
}

packages (required): list of pip requirement strings or markers.
inherit_pip_config (default true): inherit system pip configuration if present.
index_url / extra_index_url / find_links / trusted_host: pip index and mirror controls.
index_strategy: passed through to the virtualenv installer (used by some engines).
no_build_isolation: pip build isolation switch for tricky builds.

Use wrapped placeholders to inject engine defaults:

#vllm_dependencies#
#sglang_dependencies#
#mlx_dependencies#
#transformers_dependencies#
#llama_cpp_dependencies#
#diffusers_dependencies#
#sentence_transformers_dependencies#

Markers use #engine# or #model_engine# comparisons (case-sensitive). Engine values are passed in lowercase internally, so prefer lowercase values, for example #engine# == "vllm" or #engine# == "transformers".
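
To make the marker behavior concrete, here is a minimal sketch that filters a packages list for one engine. It is not the actual Xinference resolver. The function name is hypothetical, only the == operator is handled, and entries without an engine marker are assumed to apply to every engine.

```python
import re

# Matches markers such as: #engine# == "vllm"  or  #model_engine# == "vllm"
_MARKER_RE = re.compile(r'^#(?:engine|model_engine)#\s*==\s*"([^"]*)"$')


def select_packages(packages, engine):
    """Return the requirement strings that apply to the given engine."""
    selected = []
    for entry in packages:
        spec, sep, marker = entry.partition(";")
        if not sep:
            selected.append(spec.strip())  # no marker: always included
            continue
        match = _MARKER_RE.match(marker.strip())
        if match is None:
            selected.append(entry.strip())  # not an engine marker: keep as-is
        elif match.group(1) == engine:  # case-sensitive comparison
            selected.append(spec.strip())
    return selected


packages = [
    '#transformers_dependencies# ; #engine# == "transformers"',
    '#vllm_dependencies# ; #engine# == "vllm"',
    '#system_numpy# ; #engine# == "vllm"',
]
print(select_packages(packages, "vllm"))
```

Because the comparison is case-sensitive, calling select_packages(packages, "VLLM") returns an empty list in this sketch, which is why lowercase marker values are the safer choice.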