Getting Started
- Installation
- Using Xinference
- Logging in Xinference
- Xinference Docker Image
- Xinference on Kubernetes
- Troubleshooting
- No huggingface repo access
- Incompatibility Between NVIDIA Driver and PyTorch Version
- Xinference service cannot be accessed from external systems through <IP>:9997
- Launching a built-in model takes a long time, and sometimes the model fails to download
- When using the official Docker image, RayWorkerVllm died due to OOM, causing the model to fail to load
- Missing model_engine parameter when launching LLM models
- Resolving MKL Threading Layer Conflicts
- Configuring PyPI Mirrors to Speed Up Package Installation
- Installing Xinference 1.12.0 with uv Fails (As of November 2025)
- vLLM + Torch + Xinference Compatibility Issue (Segmentation Fault)
- Environment Variables
- XINFERENCE_ENDPOINT
- XINFERENCE_MODEL_SRC
- XINFERENCE_HOME
- XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD
- XINFERENCE_HEALTH_CHECK_INTERVAL
- XINFERENCE_HEALTH_CHECK_TIMEOUT
- XINFERENCE_DISABLE_HEALTH_CHECK
- XINFERENCE_DISABLE_METRICS
- XINFERENCE_DOWNLOAD_MAX_ATTEMPTS
- XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE
- XINFERENCE_SSE_PING_ATTEMPTS_SECONDS
- XINFERENCE_MAX_TOKENS
- XINFERENCE_ALLOWED_IPS
- XINFERENCE_BATCH_SIZE
- XINFERENCE_BATCH_INTERVAL
- XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU
- XINFERENCE_LAUNCH_STRATEGY
- XINFERENCE_ENABLE_VIRTUAL_ENV
- XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED
- XINFERENCE_CSG_TOKEN
- XINFERENCE_CSG_ENDPOINT
- XINFERENCE_QWEN3_RERANK_TEMPLATE
- Release Notes