Welcome to Xinference!

Xorbits Inference (Xinference) is an open-source platform that streamlines the operation and integration of a wide array of AI models. With Xinference, you can run inference on open-source LLMs, embedding models, and multimodal models, either in the cloud or on your own premises, and build robust AI-driven applications.

Developing Real-world AI Applications with Xinference

from xinference.client import Client

client = Client("http://localhost:9997")
model = client.get_model("MODEL_UID")

# Chat to LLM
model.chat(
   prompt="What is the largest animal?",
   system_prompt="You are a helpful assistant",
   generate_config={"max_tokens": 1024}
)

# Chat to VL model
model.chat(
   chat_history=[
      {
         "role": "user",
         "content": [
            {"type": "text", "text": "What’s in this image?"},
            {
               "type": "image_url",
               "image_url": {
                  "url": "http://i.epochtimes.com/assets/uploads/2020/07/shutterstock_675595789-600x400.jpg",
               },
            },
         ],
      }
   ],
   generate_config={"max_tokens": 1024}
)

Getting Started

Install Xinference

Install Xinference on Linux, Windows, and macOS.

Try it out!

Start by running Xinference on a local machine.

Explore models

Explore a wide range of models supported by Xinference.

Register your own model

Register model weights and turn them into an API.

Explore the API

Chat & Generate

Learn how to chat with LLMs in Xinference.
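A multi-turn chat is usually kept as a list of role/content messages, the same layout the VL example at the top of this page uses. The sketch below is a minimal, hypothetical helper pair (`build_messages` and `append_reply` are not part of the Xinference client) and assumes the OpenAI-compatible chat-completion response shape, where the reply text sits at `choices[0]["message"]["content"]`.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def build_messages(user_prompt: str, system_prompt: Optional[str] = None,
                   history: Optional[List[Message]] = None) -> List[Message]:
    """Assemble an OpenAI-style message list (hypothetical helper)."""
    messages: List[Message] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Earlier turns are kept in their original order.
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages


def append_reply(messages: List[Message], response: Dict[str, Any]) -> str:
    """Copy the assistant reply from a chat-completion dict into the history.

    Assumes the OpenAI-compatible shape: response["choices"][0]["message"].
    """
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply


# Usage with a hand-written response standing in for a real server reply.
msgs = build_messages("What is the largest animal?", "You are a helpful assistant")
fake_response = {"choices": [{"message": {"role": "assistant",
                                          "content": "The blue whale."}}]}
print(append_reply(msgs, fake_response))  # → The blue whale.
```

Keeping the history as plain dictionaries means the same list can be sent back on the next turn without any conversion.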

Tools
Learn how to connect LLMs to external tools.
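Tool use generally has two halves: describing a tool to the model, and running the call the model asks for. The sketch below assumes the OpenAI function-calling format (a `tools` list of JSON-schema function descriptions, and tool calls whose `arguments` field is a JSON string); `get_weather`, `TOOLS`, `REGISTRY`, and `run_tool_call` are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    # Hypothetical local tool; a real one would query a weather service.
    return f"It is sunny in {city}."


# Tool description in the OpenAI function-calling format (assumed format).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names announced to the model onto local Python callables.
REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_call(tool_call: Dict[str, Any]) -> str:
    """Execute one tool call; its arguments arrive as a JSON-encoded string."""
    fn = tool_call["function"]
    handler = REGISTRY[fn["name"]]
    kwargs = json.loads(fn["arguments"])
    return handler(**kwargs)


# A tool call shaped like the one a model might return (hand-written here).
call = {"id": "call_0", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
print(run_tool_call(call))  # → It is sunny in Paris.
```

Dispatching through a name-to-function registry keeps the model from invoking anything you have not explicitly exposed.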

Embeddings
Learn how to create text embeddings in Xinference.
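Once texts are embedded, a common next step is comparing them by cosine similarity. The sketch below is plain Python and assumes an OpenAI-compatible embedding response of the form `{"data": [{"embedding": [...]}, ...]}`; `embeddings_from_response` and `cosine_similarity` are hypothetical helpers, and the vectors are toy values rather than real model output.

```python
import math
from typing import Any, Dict, List, Sequence


def embeddings_from_response(response: Dict[str, Any]) -> List[List[float]]:
    # Assumed OpenAI-compatible shape: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in response["data"]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy two-dimensional "embeddings" standing in for a real response.
fake = {"data": [{"embedding": [1.0, 0.0]}, {"embedding": [1.0, 1.0]}]}
vecs = embeddings_from_response(fake)
print(round(cosine_similarity(vecs[0], vecs[1]), 4))  # → 0.7071
```

Scores close to 1 indicate texts the model considers semantically similar; real embeddings have hundreds of dimensions, but the arithmetic is the same.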

Rerank
Learn how to use rerank models in Xinference.
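A rerank model scores each candidate document against a query, and the caller then keeps the best few. The sketch below assumes a response of the form `{"results": [{"index": ..., "relevance_score": ...}, ...]}`, where `index` is the document's position in the request; `top_documents` is a hypothetical helper and the scores are made up.

```python
from typing import Any, Dict, List


def top_documents(documents: List[str], response: Dict[str, Any],
                  k: int = 2) -> List[str]:
    """Return the k original documents with the highest relevance scores.

    Assumes each entry in response["results"] carries the document's
    position in the request ("index") and a "relevance_score".
    """
    ranked = sorted(response["results"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked[:k]]


docs = ["A man is eating food.", "A man is riding a horse.",
        "A woman is playing violin."]
# Hand-written scores standing in for a real rerank response.
fake = {"results": [
    {"index": 0, "relevance_score": 0.12},
    {"index": 1, "relevance_score": 0.91},
    {"index": 2, "relevance_score": 0.03},
]}
print(top_documents(docs, fake, k=2))
# → ['A man is riding a horse.', 'A man is eating food.']
```

Sorting locally by score, rather than trusting the order of `results`, keeps the helper correct whether or not the server returns results pre-sorted.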

Images
Learn how to generate images with Xinference.
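When generated images are returned inline rather than as URLs, they arrive as base64 text that must be decoded before saving. The sketch below assumes the OpenAI-style image response `{"data": [{"b64_json": "..."}, ...]}`; `decode_images` is a hypothetical helper and the payload is fake bytes, not a real PNG.

```python
import base64
from typing import Any, Dict, List


def decode_images(response: Dict[str, Any]) -> List[bytes]:
    """Turn every base64 payload in an image response into raw bytes.

    Assumes the OpenAI-style shape {"data": [{"b64_json": "..."}, ...]}.
    """
    return [base64.b64decode(item["b64_json"]) for item in response["data"]]


# Fake response whose "image" is just a short byte string.
fake_response = {"data": [
    {"b64_json": base64.b64encode(b"not really a png").decode("ascii")},
]}
images = decode_images(fake_response)
print(images)  # → [b'not really a png']

# With a real response, each entry could then be written to disk, e.g.:
# with open("image_0.png", "wb") as f:
#     f.write(images[0])
```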

Vision
Learn how to process images with LLMs.
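The VL example at the top of this page points the model at a web URL. For a local file, a common alternative is embedding the image as a base64 data URL in the same `image_url` content part; whether a given model and server accept data URLs is an assumption to verify. `image_message` is a hypothetical helper, and the bytes below are placeholders rather than a real image.

```python
import base64
from typing import Any, Dict


def image_message(question: str, image_bytes: bytes,
                  mime: str = "image/png") -> Dict[str, Any]:
    """Build a user message pairing a question with an inline base64 image.

    Uses the same text + image_url content-part layout as the VL example,
    but encodes the image as a data URL instead of linking to the web.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes; in practice, read them from a file opened in "rb" mode.
msg = image_message("What’s in this image?", b"\x89PNG placeholder")
print(msg["content"][1]["image_url"]["url"][:22])  # → data:image/png;base64,
```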

Audio
Learn how to turn audio into text or text into audio with Xinference.

Getting Involved

Get Latest News

Follow us on Twitter

Read our blogs

Get Support

Find the community on WeChat

Find the community on Slack

Open an issue

Contribute to Xinference

Create a pull request