Xinference

User Guide

  • Backends
    • llama.cpp
    • transformers
    • vLLM
    • SGLang
    • MLX
  • Client API
    • LLM
    • Embedding
    • Image
    • Audio
    • Rerank
  • Simple OAuth2 System (experimental)
    • Permissions
    • Startup
    • Usage
    • HTTP Status Code
    • Note
  • Model Launching Instructions
    • Replica
    • Set Environment Variables
    • Configuring Model Virtual Environment
  • Metrics
    • Supervisor Metrics
    • Worker Metrics
  • Distributed Inference
    • Supported Engines
    • Usage
  • Continuous Batching
    • Usage
    • Abort your request
    • Note
  • Xavier: Share KV Cache between vLLM replicas
    • Usage
    • Limitations

© Copyright 2025, Xorbits Inc.