多模态#
学习如何使用 LLM 处理图像和音频。
视觉#
通过 vision
能力,您可以让模型接收图像并回答有关它们的问题。在 Xinference 中,这表示某些模型在通过 Chat API 进行对话时能够处理图像输入。
支持的模型列表#
在 Xinference 中支持 vision
功能的模型如下:
快速入门#
模型可以通过两种主要方式获取图像:通过传递图像的链接或直接在请求中传递 base64 编码的图像。
使用 OpenAI 客户端的示例#
import openai
client = openai.Client(
api_key="cannot be empty",
base_url=f"http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.chat.completions.create(
model="<MODEL_UID>",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What’s in this image?"},
{
"type": "image_url",
"image_url": {
"url": "http://i.epochtimes.com/assets/uploads/2020/07/shutterstock_675595789-600x400.jpg",
},
},
],
}
],
)
print(response.choices[0])
上传 Base64 编码的图片#
import openai
import base64
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# Path to your image
image_path = "path_to_your_image.jpg"
# Getting the base64 string
b64_img = encode_image(image_path)
client = openai.Client(
api_key="cannot be empty",
base_url=f"http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.chat.completions.create(
model="<MODEL_UID>",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What’s in this image?"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{b64_img}",
},
},
],
}
],
)
print(response.choices[0])
你可以在教程笔记本中找到更多关于 vision
能力的示例。
Qwen VL Chat
通过使用 qwen-vl-chat 的示例来学习使用 LLM 的视觉能力
音频#
通过“音频”功能,您的模型可以接收音频并执行音频分析或根据语音指令直接生成文本响应。在 Xinference 中,这表示某些模型在通过 Chat API 进行对话时能够处理音频输入。
支持的模型列表#
“音频”功能在 Xinference 中支持以下模型:
快速入门#
音频可以通过两种主要方式提供给模型:通过传递图像链接或在请求中直接传递音频 URL。
带有音频的聊天#
from xinference.client import Client
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model(<MODEL_UID>)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{
"role": "user",
"content": [
{
"type": "audio",
"audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/glass-breaking-151256.mp3",
},
{"type": "text", "text": "What's that sound?"},
],
},
{"role": "assistant", "content": "It is the sound of glass shattering."},
{
"role": "user",
"content": [
{"type": "text", "text": "What can you do when you hear that?"},
],
},
{
"role": "assistant",
"content": "Stay alert and cautious, and check if anyone is hurt or if there is any damage to property.",
},
{
"role": "user",
"content": [
{
"type": "audio",
"audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac",
},
{"type": "text", "text": "What does the person say?"},
],
},
]
print(model.chat(messages))