Client API#
Full API guide: API Guide
To use the Client API, first start the Xinference service with the following command:
>>> xinference
2023-10-17 16:32:21,700 xinference 24584 INFO Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-10-17 16:32:21,700 xinference.core.supervisor 24584 INFO Worker 127.0.0.1:62590 has been added successfully
2023-10-17 16:32:21,701 xinference.deploy.worker 24584 INFO Xinference worker successfully started.
The service endpoint is printed in the startup log; in the log above it is http://127.0.0.1:9997. You can connect to the Xinference service through the Client.
All models fall into types such as LLM, embedding, and rerank. More model types may be supported in the future.
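As a quick connectivity check, the sketch below connects to the endpoint above and lists the models currently running on the service; Client and list_models are part of the Python client package.

from xinference.client import Client

# Connect to the endpoint printed in the startup log.
client = Client("http://127.0.0.1:9997")

# List the models currently running on this Xinference service.
print(client.list_models())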
LLM#
List all built-in LLMs:
>>> xinference registrations -t LLM
Type Name Language Ability Is-built-in
------ ----------------------- ------------ ----------------------------- -------------
LLM baichuan ['en', 'zh'] ['embed', 'generate'] True
LLM baichuan-2 ['en', 'zh'] ['embed', 'generate'] True
LLM baichuan-2-chat ['en', 'zh'] ['embed', 'generate', 'chat'] True
...
Launch an LLM and chat with it:
Xinference Client#
from xinference.client import Client

client = Client("http://localhost:9997")

# The glm4-chat model has the "chat" capability.
model_uid = client.launch_model(
    model_name="glm4-chat",
    model_engine="llama.cpp",
    model_format="ggufv2",
    model_size_in_billions=9,
    quantization="Q4_K",
)
model = client.get_model(model_uid)

messages = [{"role": "user", "content": "What is the largest animal?"}]
# Since the model has the "chat" capability, call the model.chat API.
# (Models with the "generate" capability expose model.generate instead.)
model.chat(
    messages,
    generate_config={"max_tokens": 1024}
)
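Streaming is supported as well. The following is a minimal sketch of the same call, assuming the streamed chunks follow the OpenAI chat-completion-chunk layout: pass stream: True in generate_config and iterate over the result.

# Streaming sketch; reuses `model` and `messages` from above.
for chunk in model.chat(
    messages,
    generate_config={"max_tokens": 1024, "stream": True},
):
    delta = chunk["choices"][0]["delta"]
    # Intermediate chunks carry the incremental reply text in "content".
    if delta.get("content"):
        print(delta["content"], end="", flush=True)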
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/chat?lang=python
import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model=model_uid,
    messages=[
        {
            "content": "What is the largest animal?",
            "role": "user",
        }
    ],
    max_tokens=1024,
)
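The OpenAI client can also stream the reply. A minimal sketch, reusing client and model_uid from above:

# Streaming variant of the request above.
stream = client.chat.completions.create(
    model=model_uid,
    messages=[{"role": "user", "content": "What is the largest animal?"}],
    max_tokens=1024,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)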
OpenAI Tool Calls#
import openai

tools = [
    {
        "type": "function",
        "function": {
            "name": "uber_ride",
            "description": "Find suitable ride for customers given the location, "
            "type of ride, and the amount of time the customer is "
            "willing to wait as parameters",
            "parameters": {
                "type": "object",
                "properties": {
                    "loc": {
                        "type": "integer",
                        "description": "Location of the starting place of the Uber ride",
                    },
                    "type": {
                        "type": "string",
                        "enum": ["plus", "comfort", "black"],
                        "description": "Types of Uber ride user is ordering",
                    },
                    "time": {
                        "type": "integer",
                        "description": "The amount of time in minutes the customer is willing to wait",
                    },
                },
            },
        },
    }
]

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model="chatglm3",
    messages=[{"role": "user", "content": "Call me an Uber ride type 'Plus' in Berkeley at zipcode 94704 in 10 minutes"}],
    tools=tools,
)
Output:
ChatCompletion(id='chatcmpl-ad2f383f-31c7-47d9-87b7-3abe928e629c', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content="```python\ntool_call(loc=94704, type='plus', time=10)\n```", role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_ad2f383f-31c7-47d9-87b7-3abe928e629c', function=Function(arguments='{"loc": 94704, "type": "plus", "time": 10}', name='uber_ride'), type='function')]))], created=1704687803, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
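A typical next step is to execute the requested tool locally. The sketch below parses the first tool call from the ChatCompletion above, assuming the return value of the create() call was assigned to a variable named response; the uber_ride Python function here is a hypothetical stand-in for a real implementation.

import json

# Hypothetical local implementation of the "uber_ride" tool.
def uber_ride(loc: int, type: str, time: int) -> str:
    return f"Booked an Uber {type} at {loc}, arriving in {time} minutes."

tool_call = response.choices[0].message.tool_calls[0]
# Arguments arrive as a JSON string and must be parsed before use.
arguments = json.loads(tool_call.function.arguments)
if tool_call.function.name == "uber_ride":
    print(uber_ride(**arguments))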
Embedding#
List all built-in embedding models:
>>> xinference registrations -t embedding
Type Name Language Dimensions Is-built-in
--------- ----------------------- ---------- ------------ -------------
embedding bge-base-en ['en'] 768 True
embedding bge-base-en-v1.5 ['en'] 768 True
embedding bge-base-zh ['zh'] 768 True
...
Launch an embedding model and use it to embed text:
Xinference Client#
from xinference.client import Client
client = Client("http://localhost:9997")
# The bge-small-en-v1.5 is an embedding model, so the `model_type` needs to be specified.
model_uid = client.launch_model(model_name="bge-small-en-v1.5", model_type="embedding")
model = client.get_model(model_uid)
input_text = "What is the capital of China?"
model.create_embedding(input_text)
Output:
{'object': 'list',
'model': 'da2a511c-6ccc-11ee-ad07-22c9969c1611-1-0',
'data': [{'index': 0,
'object': 'embedding',
'embedding': [-0.014207549393177032,
-0.01832585781812668,
0.010556723922491074,
...
-0.021243810653686523,
-0.03009396605193615,
0.05420297756791115]}],
'usage': {'prompt_tokens': 37, 'total_tokens': 37}}
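The returned vectors can be compared directly; a common use is cosine similarity between two embeddings. A minimal sketch with numpy, reusing model from above:

import numpy as np

# Embed two sentences and compare them.
a = model.create_embedding("What is the capital of China?")["data"][0]["embedding"]
b = model.create_embedding("Beijing is the capital of China.")["data"][0]["embedding"]

a, b = np.array(a), np.array(b)
# Cosine similarity: close to 1.0 means semantically similar.
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))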
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/embeddings?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.embeddings.create(model=model_uid, input=["What is the capital of China?"])
Output:
CreateEmbeddingResponse(data=[Embedding(embedding=[-0.014207549393177032, -0.01832585781812668, 0.010556723922491074, ..., -0.021243810653686523, -0.03009396605193615, 0.05420297756791115], index=0, object='embedding')], model='bge-small-en-v1.5-1-0', object='list', usage=Usage(prompt_tokens=37, total_tokens=37))
Images#
List all built-in text-to-image models:
>>> xinference registrations -t image
Type Name Family Is-built-in
------ ---------------------------- ---------------- -------------
image sd-turbo stable_diffusion True
image sdxl-turbo stable_diffusion True
image stable-diffusion-v1.5 stable_diffusion True
image stable-diffusion-xl-base-1.0 stable_diffusion True
Launch a text-to-image model and generate an image from a prompt:
Xinference Client#
from xinference.client import Client
client = Client("http://localhost:9997")
# The stable-diffusion-v1.5 is an image model, so the `model_type` needs to be specified.
# Additional kwargs can be passed to AutoPipelineForText2Image.from_pretrained here.
model_uid = client.launch_model(model_name="stable-diffusion-v1.5", model_type="image")
model = client.get_model(model_uid)
input_text = "an apple"
model.text_to_image(input_text)
Output:
{'created': 1697536913,
'data': [{'url': '/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg',
'b64_json': None}]}
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/images/create?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.images.generate(model=model_uid, prompt="an apple")
Output:
ImagesResponse(created=1704445354, data=[Image(b64_json=None, revised_prompt=None, url='/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg')])
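When the service runs on a remote host, the returned file path is not directly usable from the client machine. Requesting base64-encoded output and decoding it locally avoids that; the sketch below assumes the server honors response_format the way the OpenAI images API does.

import base64

# Ask for base64-encoded image data instead of a server-side file path.
response = client.images.generate(
    model=model_uid,
    prompt="an apple",
    response_format="b64_json",
)
# Decode and save the first image locally.
with open("apple.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))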
Audio#
List all built-in audio models:
>>> xinference registrations -t audio
Type Name Family Multilingual Is-built-in
------ ----------------- -------- -------------- -------------
audio whisper-base whisper True True
audio whisper-base.en whisper False True
audio whisper-large-v3 whisper True True
audio whisper-medium whisper True True
audio whisper-medium.en whisper False True
audio whisper-tiny whisper True True
audio whisper-tiny.en whisper False True
Launch an audio model and transcribe speech to text:
Xinference Client#
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="whisper-large-v3", model_type="audio")
model = client.get_model(model_uid)

with open("audio.mp3", "rb") as audio_file:
    model.transcriptions(audio_file.read())
Output:
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
with open("audio.mp3", "rb") as audio_file:
    completion = client.audio.transcriptions.create(model=model_uid, file=audio_file)
Output:
Transcription(text=' This list lists the airlines in Hong Kong.')
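Multilingual whisper models can also translate audio into English. A minimal sketch, reusing client and model_uid from above and assuming the server exposes the OpenAI-compatible /v1/audio/translations endpoint for this model:

# Translate non-English speech into English text.
with open("audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(model=model_uid, file=audio_file)
print(translation.text)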
Rerank#
Launch a rerank model and compute text similarity:
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-reranker-base", model_type="rerank")
model = client.get_model(model_uid)

query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
]
print(model.rerank(corpus, query))
Output:
{'id': '480dca92-8910-11ee-b76a-c2c8e4cad3f5',
 'results': [{'index': 0, 'relevance_score': 0.9999247789382935, 'document': 'A man is eating food.'},
  {'index': 1, 'relevance_score': 0.2564932405948639, 'document': 'A man is eating a piece of bread.'},
  {'index': 3, 'relevance_score': 3.955026841140352e-05, 'document': 'A man is riding a horse.'},
  {'index': 2, 'relevance_score': 3.742107219295576e-05, 'document': 'The girl is carrying a baby.'},
  {'index': 4, 'relevance_score': 3.739788007806055e-05, 'document': 'A woman is playing violin.'}]}
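The results come back sorted by relevance_score, so picking the best matches is a matter of slicing. A short sketch, reusing model, corpus, and query from above:

# Rerank and keep only the two most relevant documents.
response = model.rerank(corpus, query)
for hit in response["results"][:2]:
    print(f"{hit['relevance_score']:.4f}  {hit['document']}")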