Environments Variables#
XINFERENCE_ENDPOINT#
Endpoint of Xinference, used to connect to Xinference service. Default value is http://127.0.0.1:9997 , you can get it through logs.
XINFERENCE_MODEL_SRC#
Modelhub used for downloading models. Default is “huggingface”, or you can set “modelscope” as downloading source.
XINFERENCE_HOME#
By default, Xinference uses <HOME>/.xinference as home path to store
necessary files such as logs and models, where <HOME> is the home
path of current user. You can change this directory by configuring this environment
variable.
XINFERENCE_HEALTH_CHECK_FAILURE_THRESHOLD#
The maximum number of failed health checks tolerated at Xinference startup. Default value is 5.
XINFERENCE_HEALTH_CHECK_INTERVAL#
Health check interval (seconds) at Xinference startup. Default value is 5.
XINFERENCE_HEALTH_CHECK_TIMEOUT#
Health check timeout (seconds) at Xinference startup. Default value is 10.
XINFERENCE_DISABLE_HEALTH_CHECK#
Xinference will automatically report health check at Xinference startup. Setting this environment to 1 can disable health check.
XINFERENCE_DISABLE_METRICS#
Xinference will by default enable the metrics exporter on the supervisor and worker. Setting this environment to 1 will disable the /metrics endpoint on the supervisor and the HTTP service (only provide the /metrics endpoint) on the worker.
XINFERENCE_DOWNLOAD_MAX_ATTEMPTS#
Maximum download retry attempts for model files. Default value is 3.
XINFERENCE_TEXT_TO_IMAGE_BATCHING_SIZE#
Enable continuous batching for text-to-image models by specifying the target image size
(e.g., 1024*1024). Default is unset.
XINFERENCE_SSE_PING_ATTEMPTS_SECONDS#
Server-Sent Events keepalive ping interval (seconds). Default value is 600.
XINFERENCE_MAX_TOKENS#
Global max tokens limit override for requests. Default is unset.
XINFERENCE_ALLOWED_IPS#
Restrict access to specified IPs or CIDR blocks. Default is unset (no restriction).
XINFERENCE_BATCH_SIZE#
Default batch size used by the server when batching is enabled. Default value is 32.
XINFERENCE_BATCH_INTERVAL#
Default batching interval (seconds). Default value is 0.003.
XINFERENCE_ALLOW_MULTI_REPLICA_PER_GPU#
Whether to allow multiple replicas on a single GPU. Default value is 1 (enabled).
XINFERENCE_LAUNCH_STRATEGY#
GPU allocation strategy for replicas. Default is IDLE_FIRST_LAUNCH_STRATEGY.
XINFERENCE_ENABLE_VIRTUAL_ENV#
Enable model virtual environments globally. Default value is 1 (enabled, starting from v2.0).
XINFERENCE_VIRTUAL_ENV_SKIP_INSTALLED#
Skip packages already present in system site-packages when creating virtual environments. Default value is 1.
XINFERENCE_CSG_TOKEN#
Authentication token for CSGHub model source. Default is unset.
XINFERENCE_CSG_ENDPOINT#
CSGHub endpoint for model source.
Default value is https://hub-stg.opencsg.com/.