Client API#
Full API guide: API Guide
To use the Client API, first start the Xinference service with the following command:
>>> xinference
2023-10-17 16:32:21,700 xinference 24584 INFO Xinference successfully started. Endpoint: http://127.0.0.1:9997
2023-10-17 16:32:21,700 xinference.core.supervisor 24584 INFO Worker 127.0.0.1:62590 has been added successfully
2023-10-17 16:32:21,701 xinference.deploy.worker 24584 INFO Xinference worker successfully started.
The service endpoint is printed in the startup log; in the log above it is http://127.0.0.1:9997. You can connect to the Xinference service through the Client.
All models fall into types such as LLM, embedding, and rerank. More model types may be supported in the future.
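As a quick connectivity check, the sketch below connects to the endpoint above and lists the models currently running on the service; Client and list_models are part of the Python client package.

from xinference.client import Client

# Connect to the endpoint printed in the startup log.
client = Client("http://127.0.0.1:9997")

# List the models currently running on this Xinference service.
print(client.list_models())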
LLM#
List all built-in LLMs:
>>> xinference registrations -t LLM
Type Name Language Ability Is-built-in
------ ----------------------- ------------ ----------------------------- -------------
LLM baichuan ['en', 'zh'] ['embed', 'generate'] True
LLM baichuan-2 ['en', 'zh'] ['embed', 'generate'] True
LLM baichuan-2-chat ['en', 'zh'] ['embed', 'generate', 'chat'] True
...
Launch an LLM and chat with it:
Xinference Client#
from xinference.client import Client

client = Client("http://localhost:9997")

# The glm4-chat model has the "chat" capability.
model_uid = client.launch_model(
    model_name="glm4-chat",
    model_engine="llama.cpp",
    model_format="ggufv2",
    model_size_in_billions=9,
    quantization="Q4_K",
)
model = client.get_model(model_uid)

messages = [{"role": "user", "content": "What is the largest animal?"}]
# Since the model has the "chat" capability, call the model.chat API.
# (Models with the "generate" capability expose model.generate instead.)
model.chat(
    messages,
    generate_config={"max_tokens": 1024}
)
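Streaming is supported as well. The following is a minimal sketch of the same call, assuming the streamed chunks follow the OpenAI chat-completion-chunk layout: pass stream: True in generate_config and iterate over the result.

# Streaming sketch; reuses `model` and `messages` from above.
for chunk in model.chat(
    messages,
    generate_config={"max_tokens": 1024, "stream": True},
):
    delta = chunk["choices"][0]["delta"]
    # Intermediate chunks carry the incremental reply text in "content".
    if delta.get("content"):
        print(delta["content"], end="", flush=True)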
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/chat?lang=python
import openai

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model=model_uid,
    messages=[
        {
            "content": "What is the largest animal?",
            "role": "user",
        }
    ],
    max_tokens=1024,
)
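The OpenAI client can also stream the reply. A minimal sketch, reusing client and model_uid from above:

# Streaming variant of the request above.
stream = client.chat.completions.create(
    model=model_uid,
    messages=[{"role": "user", "content": "What is the largest animal?"}],
    max_tokens=1024,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the assistant reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)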
OpenAI Tool Calls#
import openai

tools = [
    {
        "type": "function",
        "function": {
            "name": "uber_ride",
            "description": "Find suitable ride for customers given the location, "
            "type of ride, and the amount of time the customer is "
            "willing to wait as parameters",
            "parameters": {
                "type": "object",
                "properties": {
                    "loc": {
                        "type": "integer",
                        "description": "Location of the starting place of the Uber ride",
                    },
                    "type": {
                        "type": "string",
                        "enum": ["plus", "comfort", "black"],
                        "description": "Types of Uber ride user is ordering",
                    },
                    "time": {
                        "type": "integer",
                        "description": "The amount of time in minutes the customer is willing to wait",
                    },
                },
            },
        },
    }
]

# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.chat.completions.create(
    model="chatglm3",
    messages=[{"role": "user", "content": "Call me an Uber ride type 'Plus' in Berkeley at zipcode 94704 in 10 minutes"}],
    tools=tools,
)
Output:
ChatCompletion(id='chatcmpl-ad2f383f-31c7-47d9-87b7-3abe928e629c', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content="```python\ntool_call(loc=94704, type='plus', time=10)\n```", role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_ad2f383f-31c7-47d9-87b7-3abe928e629c', function=Function(arguments='{"loc": 94704, "type": "plus", "time": 10}', name='uber_ride'), type='function')]))], created=1704687803, model='chatglm3', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=-1, prompt_tokens=-1, total_tokens=-1))
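A typical next step is to execute the requested tool locally. The sketch below parses the first tool call from the ChatCompletion above, assuming the return value of the create() call was assigned to a variable named response; the uber_ride Python function here is a hypothetical stand-in for a real implementation.

import json

# Hypothetical local implementation of the "uber_ride" tool.
def uber_ride(loc: int, type: str, time: int) -> str:
    return f"Booked an Uber {type} at {loc}, arriving in {time} minutes."

tool_call = response.choices[0].message.tool_calls[0]
# Arguments arrive as a JSON string and must be parsed before use.
arguments = json.loads(tool_call.function.arguments)
if tool_call.function.name == "uber_ride":
    print(uber_ride(**arguments))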
Embedding#
List all built-in embedding models:
>>> xinference registrations -t embedding
Type Name Language Dimensions Is-built-in
--------- ----------------------- ---------- ------------ -------------
embedding bge-base-en ['en'] 768 True
embedding bge-base-en-v1.5 ['en'] 768 True
embedding bge-base-zh ['zh'] 768 True
...
Launch an embedding model and use it to embed text:
Xinference Client#
from xinference.client import Client
client = Client("http://localhost:9997")
# The bge-small-en-v1.5 is an embedding model, so the `model_type` needs to be specified.
model_uid = client.launch_model(model_name="bge-small-en-v1.5", model_type="embedding")
model = client.get_model(model_uid)
input_text = "What is the capital of China?"
model.create_embedding(input_text)
Output:
{'object': 'list',
'model': 'da2a511c-6ccc-11ee-ad07-22c9969c1611-1-0',
'data': [{'index': 0,
'object': 'embedding',
'embedding': [-0.014207549393177032,
-0.01832585781812668,
0.010556723922491074,
...
-0.021243810653686523,
-0.03009396605193615,
0.05420297756791115]}],
'usage': {'prompt_tokens': 37, 'total_tokens': 37}}
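The returned vectors can be compared directly; a common use is cosine similarity between two embeddings. A minimal sketch with numpy, reusing model from above:

import numpy as np

# Embed two sentences and compare them.
a = model.create_embedding("What is the capital of China?")["data"][0]["embedding"]
b = model.create_embedding("Beijing is the capital of China.")["data"][0]["embedding"]

a, b = np.array(a), np.array(b)
# Cosine similarity: close to 1.0 means semantically similar.
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))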
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/embeddings?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.embeddings.create(model=model_uid, input=["What is the capital of China?"])
Output:
CreateEmbeddingResponse(data=[Embedding(embedding=[-0.014207549393177032, -0.01832585781812668, 0.010556723922491074, ..., -0.021243810653686523, -0.03009396605193615, 0.05420297756791115], index=0, object='embedding')], model='bge-small-en-v1.5-1-0', object='list', usage=Usage(prompt_tokens=37, total_tokens=37))
Images#
List all built-in text-to-image models:
>>> xinference registrations -t image
Type Name Family Is-built-in
------ ---------------------------- ---------------- -------------
image sd-turbo stable_diffusion True
image sdxl-turbo stable_diffusion True
image stable-diffusion-v1.5 stable_diffusion True
image stable-diffusion-xl-base-1.0 stable_diffusion True
Launch a text-to-image model and generate an image from a prompt:
Xinference Client#
from xinference.client import Client
client = Client("http://localhost:9997")
# The stable-diffusion-v1.5 is an image model, so the `model_type` needs to be specified.
# Additional kwargs can be passed to AutoPipelineForText2Image.from_pretrained here.
model_uid = client.launch_model(model_name="stable-diffusion-v1.5", model_type="image")
model = client.get_model(model_uid)
input_text = "an apple"
model.text_to_image(input_text)
Output:
{'created': 1697536913,
'data': [{'url': '/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg',
'b64_json': None}]}
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/images/create?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
client.images.generate(model=model_uid, prompt="an apple")
Output:
ImagesResponse(created=1704445354, data=[Image(b64_json=None, revised_prompt=None, url='/home/admin/.xinference/image/605d2f545ac74142b8031455af31ee33.jpg')])
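When the service runs on a remote host, the returned file path is not directly usable from the client machine. Requesting base64-encoded output and decoding it locally avoids that; the sketch below assumes the server honors response_format the way the OpenAI images API does.

import base64

# Ask for base64-encoded image data instead of a server-side file path.
response = client.images.generate(
    model=model_uid,
    prompt="an apple",
    response_format="b64_json",
)
# Decode and save the first image locally.
with open("apple.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))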
Audio#
List all built-in audio models:
>>> xinference registrations -t audio
Type Name Family Multilingual Is-built-in
------ ----------------- -------- -------------- -------------
audio whisper-base whisper True True
audio whisper-base.en whisper False True
audio whisper-large-v3 whisper True True
audio whisper-medium whisper True True
audio whisper-medium.en whisper False True
audio whisper-tiny whisper True True
audio whisper-tiny.en whisper False True
Launch an audio model and transcribe speech to text:
Xinference Client#
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="whisper-large-v3", model_type="audio")
model = client.get_model(model_uid)

with open("audio.mp3", "rb") as audio_file:
    model.transcriptions(audio_file.read())
Output:
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
OpenAI Client#
When sending requests with the OpenAI client, everything except launching the model remains compatible with the OpenAI API. For OpenAI usage, see https://platform.openai.com/docs/api-reference/audio/createTranscription?lang=python
import openai
# Assume that the model is already launched.
# The api_key can't be empty, any string is OK.
client = openai.Client(api_key="not empty", base_url="http://localhost:9997/v1")
with open("audio.mp3", "rb") as audio_file:
    completion = client.audio.transcriptions.create(model=model_uid, file=audio_file)
Output:
Transcription(text=' This list lists the airlines in Hong Kong.')
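Multilingual whisper models can also translate audio into English. A minimal sketch, reusing client and model_uid from above and assuming the server exposes the OpenAI-compatible /v1/audio/translations endpoint for this model:

# Translate non-English speech into English text.
with open("audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(model=model_uid, file=audio_file)
print(translation.text)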
Rerank#
Launch a rerank model and compute text similarity:
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(model_name="bge-reranker-base", model_type="rerank")
model = client.get_model(model_uid)

query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
]
print(model.rerank(corpus, query))
Output:
{'id': '480dca92-8910-11ee-b76a-c2c8e4cad3f5',
 'results': [{'index': 0, 'relevance_score': 0.9999247789382935, 'document': 'A man is eating food.'},
  {'index': 1, 'relevance_score': 0.2564932405948639, 'document': 'A man is eating a piece of bread.'},
  {'index': 3, 'relevance_score': 3.955026841140352e-05, 'document': 'A man is riding a horse.'},
  {'index': 2, 'relevance_score': 3.742107219295576e-05, 'document': 'The girl is carrying a baby.'},
  {'index': 4, 'relevance_score': 3.739788007806055e-05, 'document': 'A woman is playing violin.'}]}
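The results come back sorted by relevance_score, so picking the best matches is a matter of slicing. A short sketch, reusing model, corpus, and query from above:

# Rerank and keep only the two most relevant documents.
response = model.rerank(corpus, query)
for hit in response["results"][:2]:
    print(f"{hit['relevance_score']:.4f}  {hit['document']}")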