Models#

List Models#

You can list all models of a certain type that are available to launch in Xinference:

xinference registrations --model-type <MODEL_TYPE> \
                         [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
The following values of MODEL_TYPE are supported by Xinference:

LLM

Text generation models or large language models

embedding

Text embedding models

image

Image generation or manipulation models

audio

Audio models

rerank

Rerank models

video

Video models

You can see all the built-in models supported by Xinference here. If the model you need is not available, Xinference also allows you to register your own custom models.
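If you prefer to query registrations programmatically, the CLI talks to the server's REST API. The sketch below builds the request URL with the standard library and fetches the list from a running server; the `/v1/model_registrations/<model_type>` path and the port 9997 are assumptions based on a typical Xinference deployment, so verify them against your own server.

```python
import json
import urllib.request

def registrations_url(endpoint: str, model_type: str) -> str:
    # Assumed REST path behind `xinference registrations` -- confirm on your server.
    return f"{endpoint.rstrip('/')}/v1/model_registrations/{model_type}"

def list_registrations(endpoint: str, model_type: str):
    # Only call this against a running Xinference server.
    with urllib.request.urlopen(registrations_url(endpoint, model_type)) as resp:
        return json.loads(resp.read().decode())

# URL construction works without a server:
print(registrations_url("http://127.0.0.1:9997", "LLM"))
```

Swapping `"LLM"` for `"embedding"`, `"image"`, and the other types above queries each registry in turn.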

Launch and Terminate Model#

Each running model instance will be assigned a unique model uid. By default, the model uid is equal to the model name. This unique ID can be used as a handle for further usage. You can assign it manually by passing the --model-uid option to the launch command.

You can launch a model in Xinference either via command line or Xinference’s Python client:

xinference launch --model-name <MODEL_NAME> \
                  [--model-engine <MODEL_ENGINE>] \
                  [--model-type <MODEL_TYPE>] \
                  [--model-uid <MODEL_UID>] \
                  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]

For model type LLM, launching the model requires specifying not only the model name, but also the parameter size, the model format, and the model engine. Please refer to the list of LLM model families.
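As a sketch of how those extra LLM fields fit together, the helper below assembles the launch parameters into one payload, mirroring the CLI flags. The field names follow the command-line options above; the example model name, engine, and format at the bottom are hypothetical placeholders, not values verified against the built-in model list.

```python
from typing import Optional

def llm_launch_payload(model_name: str,
                       model_engine: str,
                       model_size_in_billions: int,
                       model_format: str,
                       model_uid: Optional[str] = None) -> dict:
    # Collect the fields an LLM launch needs; names mirror the CLI flags.
    payload = {
        "model_name": model_name,
        "model_type": "LLM",
        "model_engine": model_engine,
        "model_size_in_billions": model_size_in_billions,
        "model_format": model_format,
    }
    if model_uid is not None:  # optional: defaults to the model name server-side
        payload["model_uid"] = model_uid
    return payload

# Hypothetical example values -- pick real ones from the LLM model families list.
print(llm_launch_payload("qwen2.5-instruct", "transformers", 7, "pytorch"))
```

Leaving `model_uid` unset matches the default behavior described above, where the uid falls back to the model name.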

The following command lists the models currently running in Xinference:

xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]

When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies:

xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
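The list and terminate commands above can likewise be driven from Python through the REST API. This is a minimal sketch, assuming `xinference list` maps to a GET on `/v1/models` and `xinference terminate` to a DELETE on `/v1/models/<model_uid>`; confirm the paths against your server before relying on them.

```python
import urllib.request

def list_models_url(endpoint: str) -> str:
    # Assumed REST path behind `xinference list`.
    return f"{endpoint.rstrip('/')}/v1/models"

def terminate_model(endpoint: str, model_uid: str) -> None:
    # Assumed: terminating a model is an HTTP DELETE on its uid.
    url = f"{endpoint.rstrip('/')}/v1/models/{model_uid}"
    req = urllib.request.Request(url, method="DELETE")
    urllib.request.urlopen(req)  # requires a running Xinference server

# URL construction works without a server:
print(list_models_url("http://127.0.0.1:9997"))
```

Terminating frees the resources the model occupies, just as with the CLI command.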

Model Usage#

Chat & Generate

Learn how to chat with LLMs in Xinference.

Tools

Learn how to connect LLMs with external tools.

Embeddings

Learn how to create text embeddings in Xinference.

Rerank

Learn how to use rerank models in Xinference.

Images

Learn how to generate images with Xinference.

Vision

Learn how to process images with LLMs.

Audio (Experimental)

Learn how to turn audio into text or text into audio with Xinference.

Video (Experimental)

Learn how to generate video with Xinference.