Models
List Models
You can list all models of a certain type that are available to launch in Xinference:
xinference registrations --model-type <MODEL_TYPE> \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/model_registrations/<MODEL_TYPE>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_model_registrations(model_type='<MODEL_TYPE>'))
The following MODEL_TYPE values are supported by Xinference:
Text generation models or large language models
Text embeddings models
Image generation or manipulation models
Audio models
Rerank models
Video models
You can see all the built-in models supported by Xinference here. If the model you need is not available, Xinference also allows you to register your own custom models.
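For example, a minimal sketch that lists LLM registrations with the Python client. The localhost:9997 endpoint and the model_name/is_builtin response fields are assumptions; verify them against your Xinference version:

from xinference.client import Client

# Assumed endpoint for a locally running Xinference server.
client = Client("http://localhost:9997")

# List every LLM registered with the server, built-in or custom.
for registration in client.list_model_registrations(model_type="LLM"):
    name = registration["model_name"]
    builtin = registration.get("is_builtin", False)
    print(name, "(built-in)" if builtin else "(custom)")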
Launch and Terminate Model
Each running model instance is assigned a unique model uid. By default, the model uid is equal to the model name. This unique id can be used as a handle for further usage. You can assign it manually by passing the --model-uid option in the launch command.
You can launch a model in Xinference either via command line or Xinference’s Python client:
xinference launch --model-name <MODEL_NAME> \
  [--model-engine <MODEL_ENGINE>] \
  [--model-type <MODEL_TYPE>] \
  [--model-uid <MODEL_UID>] \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model_uid = client.launch_model(
    model_name="<MODEL_NAME>",
    model_engine="<MODEL_ENGINE>",
    model_type="<MODEL_TYPE>",
    model_uid="<MODEL_UID>"
)
print(model_uid)
For model type LLM, launching a model requires specifying not only the model name, but also the size of the parameters (in billions), the model format, and the model engine. Please refer to the list of LLM model families.
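As a hedged illustration, launching an LLM with the Python client might look like the sketch below. The model name is only an example, and the model_format and model_size_in_billions keyword names are assumptions based on the client's documented launch_model signature; consult the LLM model families list for valid combinations:

from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint

# For LLMs the name alone is not enough: parameter size, weight
# format, and inference engine must be specified as well.
model_uid = client.launch_model(
    model_name="qwen2.5-instruct",   # example LLM family
    model_engine="transformers",     # inference engine
    model_format="pytorch",          # weight format (assumed keyword)
    model_size_in_billions=7,        # parameter count (assumed keyword)
)
print(model_uid)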
The following command gives you the currently running models in Xinference:
xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_models())
When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies:
xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl -X DELETE http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models/<MODEL_UID>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
client.terminate_model(model_uid="<MODEL_UID>")
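Putting the lifecycle together, here is a minimal end-to-end sketch. The endpoint and model name are assumptions; the try/finally ensures the model is terminated and its resources are freed even if an intermediate step fails:

from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint

# Launch with an explicit UID instead of the default (the model name).
model_uid = client.launch_model(
    model_name="bge-large-en-v1.5",  # example embedding model
    model_type="embedding",
    model_uid="my-embedding-model",
)
try:
    # The new UID should show up among the running models.
    print(client.list_models())
finally:
    # Free the resources the model occupies once we are done.
    client.terminate_model(model_uid=model_uid)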
Model Usage
Learn how to chat with LLMs in Xinference (see the sketch after this list).
Learn how to connect LLM with external tools.
Learn how to create text embeddings in Xinference.
Learn how to use rerank models in Xinference.
Learn how to generate images with Xinference.
Learn how to process images with LLMs.
Learn how to turn audio into text or text into audio with Xinference.
Learn how to generate video with Xinference.
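To give a flavor of the chat workflow linked above, a brief hedged sketch. The model UID is hypothetical, and the OpenAI-style messages signature assumes a recent Xinference release; older releases used a prompt/chat_history signature instead:

from xinference.client import Client

client = Client("http://localhost:9997")  # assumed endpoint

# Get a handle to a running LLM by its model UID (hypothetical here).
model = client.get_model("my-llm")

# Recent releases accept OpenAI-style chat messages and return an
# OpenAI-compatible completion dict.
response = model.chat(
    messages=[{"role": "user", "content": "What is Xinference?"}]
)
print(response["choices"][0]["message"]["content"])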