Models
List Models
You can list all models of a certain type that are available to launch in Xinference:
xinference registrations --model-type <MODEL_TYPE> \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/model_registrations/<MODEL_TYPE>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_model_registrations(model_type='<MODEL_TYPE>'))
The following MODEL_TYPE values are supported by Xinference: LLM, embedding, image, audio, rerank, and video.
You can see all the built-in models supported by Xinference here. If the model you need is not available, Xinference also allows you to register your own custom models.
Launch and Terminate Model
Each running model instance is assigned a unique model uid. By default, the model uid is equal to the model name. This unique id serves as a handle for further usage. You can assign it manually by passing the --model-uid option in the launch command.
You can launch a model in Xinference either via command line or Xinference’s Python client:
xinference launch --model-name <MODEL_NAME> \
  [--model-type <MODEL_TYPE>] \
  [--model-uid <MODEL_UID>] \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model_uid = client.launch_model(
    model_name="<MODEL_NAME>",
    model_type="<MODEL_TYPE>",
    model_uid="<MODEL_UID>"
)
print(model_uid)
For model type LLM, launching the model requires specifying not only the model name, but also the size of the parameters and the model format. Please refer to the list of LLM model families.
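As a concrete sketch of such a launch, the command below passes the parameter size and format via the --size-in-billions and --model-format flags. The model name llama-2-chat, the 7B size, the pytorch format, and the local endpoint are illustrative assumptions; substitute the values appropriate for your deployment.

```shell
# Launch a 7B LLM in PyTorch format (illustrative values; a running
# Xinference server at the given endpoint is assumed).
xinference launch --model-name llama-2-chat \
  --model-type LLM \
  --size-in-billions 7 \
  --model-format pytorch \
  --endpoint "http://127.0.0.1:9997"
```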
The following command gives you the currently running models in Xinference:
xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_models())
When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies:
xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl -X DELETE http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models/<MODEL_UID>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
client.terminate_model(model_uid="<MODEL_UID>")
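Putting the steps above together, a minimal lifecycle sketch with the Python client follows: launch a model, verify it is running, then terminate it to free its resources. The endpoint and model name are placeholders for your own deployment; a running Xinference server is assumed.

```python
from xinference.client import Client

# Assumed local endpoint; replace with your deployment's host and port.
client = Client("http://127.0.0.1:9997")

# Launch a model and keep the returned uid as a handle for later calls.
uid = client.launch_model(model_name="<MODEL_NAME>", model_type="<MODEL_TYPE>")

# Confirm the model is among the running instances.
print(client.list_models())

# Terminate the model once it is no longer needed.
client.terminate_model(model_uid=uid)
```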