User Guide

- Backends
  - llama.cpp
  - transformers
  - vLLM
  - SGLang
  - MLX
- Client API
  - LLM
  - Embedding
  - Image
  - Audio
  - Rerank
- Simple OAuth2 System (experimental)
  - Permissions
  - Startup
  - Usage
  - HTTP Status Code
  - Note
- Model Launching Instructions
  - Replica
  - GPU Allocation Strategy
  - Set Environment Variables
  - Configuring Model Virtual Environment
  - Batching / Continuous Batching
  - Thinking Mode
- Metrics
  - Supervisor Metrics
  - Worker Metrics
- Distributed Inference
  - Supported Engines
  - Usage
- Continuous Batching
  - Usage
  - Abort your request
  - Note
- Xavier: Share KV Cache between vLLM replicas
  - Usage
  - Limitations