Chat & Generate#

Learn how to chat with LLMs in Xinference.

Introduction#

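Models with the chat or generate ability are usually referred to as large language models (LLMs) or text generation models. These models respond with text output to the input they receive, which is commonly called a "prompt". In general, you can steer these models toward a task by giving specific instructions or by providing concrete examples.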

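Models with the generate ability are typically pre-trained large language models. Models with the chat ability, on the other hand, are LLMs that have been fine-tuned and aligned, and are optimized for conversational scenarios. In most cases, models whose names end in "chat" (for example llama-2-chat and qwen-chat) have the chat ability.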

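The Chat API and the Generate API offer two different ways to interact with LLMs:

The Chat API (similar to OpenAI's Chat Completion API) supports multi-turn conversations.
The Generate API (similar to OpenAI's Completions API) lets you generate text from a text prompt.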

Model ability	API endpoint	OpenAI-compatible endpoint
chat	Chat API	/v1/chat/completions
generate	Generate API	/v1/completions
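
As a rough illustration of the table above, the sketch below builds the request URL and JSON body for each ability. The helper names (`build_chat_request`, `build_generate_request`), the host, the port, and the model UIDs are hypothetical and only serve the example; the endpoint paths and body fields are the ones shown on this page.

```python
import json

# Endpoint paths taken from the table above, keyed by model ability.
ENDPOINTS = {
    "chat": "/v1/chat/completions",
    "generate": "/v1/completions",
}


def build_chat_request(base_url, model_uid, messages, **options):
    """Return (url, body) for a Chat API call; `messages` is a list of role/content dicts."""
    body = {"model": model_uid, "messages": messages, **options}
    return base_url.rstrip("/") + ENDPOINTS["chat"], body


def build_generate_request(base_url, model_uid, prompt, **options):
    """Return (url, body) for a Generate API call; `prompt` is a plain string."""
    body = {"model": model_uid, "prompt": prompt, **options}
    return base_url.rstrip("/") + ENDPOINTS["generate"], body


# Hypothetical host, port, and model UIDs, used only for illustration.
base_url = "http://127.0.0.1:9997"
chat_url, chat_body = build_chat_request(
    base_url,
    "my-chat-model",
    [{"role": "user", "content": "What is the largest animal?"}],
    max_tokens=512,
)
gen_url, gen_body = build_generate_request(
    base_url, "my-base-model", "What is the largest animal?", max_tokens=512
)
print(chat_url)
print(json.dumps(gen_body, indent=2))
```

The two bodies differ only in the input field: `messages` for chat and `prompt` for generate. Sampling options such as `max_tokens` are passed the same way in both.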

Supported models#

You can check the abilities of all the built-in LLMs in Xinference.

Quickstart#

Chat API#

Try the Chat API with cURL, the OpenAI client, or Xinference's Python client:

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<MODEL_UID>",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is the largest animal?"
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7
  }'

import openai

client = openai.Client(
    api_key="cannot be empty",
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
client.chat.completions.create(
    model="<MODEL_UID>",
    messages=[
        {
            "content": "What is the largest animal?",
            "role": "user",
        }
    ],
    max_tokens=512,
    temperature=0.7
)

from xinference.client import RESTfulClient

client = RESTfulClient("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
print(model.chat(
    prompt="What is the largest animal?",
    system_prompt="You are a helpful assistant.",
    chat_history=[],
    generate_config={
      "max_tokens": 512,
      "temperature": 0.7
    }
))

{
  "id": "chatcmpl-8d76b65a-bad0-42ef-912d-4a0533d90d61",
  "model": "<MODEL_UID>",
  "object": "chat.completion",
  "created": 1688919187,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The largest animal that has been scientifically measured is the blue whale, which has a maximum length of around 23 meters (75 feet) for adult animals and can weigh up to 150,000 pounds (68,000 kg). However, it is important to note that this is just an estimate and that the largest animal known to science may be larger still. Some scientists believe that the largest animals may not have a clear \"size\" in the same way that humans do, as their size can vary depending on the environment and the stage of their life."
      },
      "finish_reason": "None"
    }
  ],
  "usage": {
    "prompt_tokens": -1,
    "completion_tokens": -1,
    "total_tokens": -1
  }
}
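
Because the Chat API is meant for multi-turn conversations, a follow-up request normally carries the earlier turns along with the new question. The sketch below shows one way to do that with a response shaped like the example above. The helper name `append_reply` is hypothetical, and the response is a shortened stand-in rather than real model output.

```python
def append_reply(messages, response):
    """Return a new message list extended with the assistant reply from `response`."""
    reply = response["choices"][0]["message"]
    return messages + [{"role": reply["role"], "content": reply["content"]}]


# A trimmed-down stand-in with the same shape as the example response above.
response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The blue whale."},
            "finish_reason": "None",
        }
    ],
}

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the largest animal?"},
]
messages = append_reply(messages, response)
# The follow-up question is sent together with the whole history.
messages.append({"role": "user", "content": "How much does it weigh?"})

for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```

The resulting `messages` list can be used as the `messages` field of the next `/v1/chat/completions` request.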

You can find more Chat API examples in the tutorial notebook.

Gradio Chat

Learn from an example of using Xinference's Chat API and Python client.

Generate API#

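The Generate API mirrors OpenAI's Completions API.

The main difference between the Generate API and the Chat API is the form of the input: the Chat API takes a list of messages, while the Generate API takes a free-form text string named prompt.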

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<MODEL_UID>",
    "prompt": "What is the largest animal?",
    "max_tokens": 512,
    "temperature": 0.7
  }'

import openai

client = openai.Client(api_key="cannot be empty", base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1")
# The Generate API maps to the Completions endpoint, not Chat Completions.
client.completions.create(
    model="<MODEL_UID>",
    prompt="What is the largest animal?",
    max_tokens=512,
    temperature=0.7
)

from xinference.client import RESTfulClient

client = RESTfulClient("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
print(model.generate(
    prompt="What is the largest animal?",
    generate_config={
      "max_tokens": 512,
      "temperature": 0.7
    }
))

{
  "id": "cmpl-8d76b65a-bad0-42ef-912d-4a0533d90d61",
  "model": "<MODEL_UID>",
  "object": "text_completion",
  "created": 1688919187,
  "choices": [
    {
      "index": 0,
      "text": "The largest animal that has been scientifically measured is the blue whale, which has a maximum length of around 23 meters (75 feet) for adult animals and can weigh up to 150,000 pounds (68,000 kg). However, it is important to note that this is just an estimate and that the largest animal known to science may be larger still. Some scientists believe that the largest animals may not have a clear \"size\" in the same way that humans do, as their size can vary depending on the environment and the stage of their life.",
      "finish_reason": "None"
    }
  ],
  "usage": {
    "prompt_tokens": -1,
    "completion_tokens": -1,
    "total_tokens": -1
  }
}
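
A Generate API response keeps the generated text in `choices[0]["text"]` rather than in a `message` object. The sketch below reads that field and tidies the usage counts. It assumes that a negative count, as in the example above, means the backend did not report that number; this page does not state so explicitly. The helper name `summarize_completion` is hypothetical, and the response is a shortened stand-in.

```python
def summarize_completion(response):
    """Return the generated text and the usage counts, with None for unreported counts."""
    text = response["choices"][0]["text"]
    # Assumption: a negative count means "not reported" rather than a real value.
    usage = {
        key: (value if value >= 0 else None)
        for key, value in response.get("usage", {}).items()
    }
    return text, usage


# A trimmed-down stand-in with the same shape as the example response above.
response = {
    "object": "text_completion",
    "choices": [{"index": 0, "text": "The blue whale.", "finish_reason": "None"}],
    "usage": {"prompt_tokens": -1, "completion_tokens": -1, "total_tokens": -1},
}

text, usage = summarize_completion(response)
print(text)
print(usage)
```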

FAQ#

Do Xinference's LLMs provide integrations with LangChain or LlamaIndex?#

Yes, you can refer to the Xinference sections of their respective official documentation. Here are the links: