Models
List Models
You can list all models of a certain type that are available to launch in Xinference:
xinference registrations --model-type <MODEL_TYPE> \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/model_registrations/<MODEL_TYPE>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_model_registrations(model_type='<MODEL_TYPE>'))
The following MODEL_TYPE values are supported by Xinference: LLM, embedding, image, audio, rerank, and video.
You can see all the built-in models supported by Xinference here. If the model you need is not available, Xinference also allows you to register your own custom models.
Launch and Terminate Model
Each running model instance is assigned a unique model uid. By default, the model uid is equal to the model name. This unique id serves as a handle for further usage. You can assign it manually by passing the --model-uid option in the launch command.
You can launch a model in Xinference either via command line or Xinference’s Python client:
xinference launch --model-name <MODEL_NAME> \
  [--model-type <MODEL_TYPE>] \
  [--model-uid <MODEL_UID>] \
  [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model_uid = client.launch_model(
    model_name="<MODEL_NAME>",
    model_type="<MODEL_TYPE>",
    model_uid="<MODEL_UID>"
)
print(model_uid)
For model type LLM, launching the model requires specifying not only the model name, but also the size of the parameters and the model format. Please refer to the list of LLM model families.
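As a concrete sketch of such a launch, the command below passes the parameter size and format via the --size-in-billions and --model-format flags. The model name llama-2-chat, the 7B size, the pytorch format, and the local endpoint are illustrative assumptions; substitute the values appropriate for your deployment.

```shell
# Launch a 7B LLM in PyTorch format (illustrative values; a running
# Xinference server at the given endpoint is assumed).
xinference launch --model-name llama-2-chat \
  --model-type LLM \
  --size-in-billions 7 \
  --model-format pytorch \
  --endpoint "http://127.0.0.1:9997"
```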
The following command gives you the currently running models in Xinference:
xinference list [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
print(client.list_models())
When you no longer need a model that is currently running, you can remove it in the following way to free up the resources it occupies:
xinference terminate --model-uid "<MODEL_UID>" [--endpoint "http://<XINFERENCE_HOST>:<XINFERENCE_PORT>"]
curl -X DELETE http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/models/<MODEL_UID>
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
client.terminate_model(model_uid="<MODEL_UID>")
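Putting the steps above together, a minimal lifecycle sketch with the Python client follows: launch a model, verify it is running, then terminate it to free its resources. The endpoint and model name are placeholders for your own deployment; a running Xinference server is assumed.

```python
from xinference.client import Client

# Assumed local endpoint; replace with your deployment's host and port.
client = Client("http://127.0.0.1:9997")

# Launch a model and keep the returned uid as a handle for later calls.
uid = client.launch_model(model_name="<MODEL_NAME>", model_type="<MODEL_TYPE>")

# Confirm the model is among the running instances.
print(client.list_models())

# Terminate the model once it is no longer needed.
client.terminate_model(model_uid=uid)
```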