User Guide

Backends
- llama.cpp
- transformers
- vLLM
- SGLang
- MLX
Client API
- LLM
- Embedding
- Image
- Audio
- Rerank
Authentication System (database-backed)
OIDC Single Sign-On
Audit Logging and Security
- Audit logging
- Brute-force protection
Model Launching Instructions
Metrics
- Supervisor Metrics
- Worker Metrics
Distributed Inference
- Supported Engines
- Usage
Continuous Batching
Xavier: Share KV Cache between vLLM replicas
- Usage
- Limitations

© Copyright 2026, XINFERENCE HOLDINGS PTE. LTD.
