Audio (Experimental)
Learn how to use Xinference to convert audio to text or text to audio.
Introduction
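The Audio API provides three methods for interacting with audio:
The transcriptions endpoint transcribes audio into text in the language spoken in the audio.
The translations endpoint translates audio into English text.
The speech endpoint generates audio from input text.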
| API endpoint | OpenAI-compatible endpoint |
|---|---|
| Transcription API | /v1/audio/transcriptions |
| Translation API | /v1/audio/translations |
| Speech API | /v1/audio/speech |
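All three endpoints share the same base URL and differ only in the path shown in the table. The sketch below builds the full URL for each API; the helper name `endpoint_url` is hypothetical, and the example assumes a server on `127.0.0.1` listening on port 9997 (Xinference's default port).

```python
# Map each Audio API to its OpenAI-compatible path, as listed in the table.
AUDIO_ENDPOINTS = {
    "transcription": "/v1/audio/transcriptions",
    "translation": "/v1/audio/translations",
    "speech": "/v1/audio/speech",
}


def endpoint_url(host: str, port: int, api: str) -> str:
    """Return the full URL of an Audio API endpoint on a Xinference server (hypothetical helper)."""
    try:
        path = AUDIO_ENDPOINTS[api]
    except KeyError:
        raise ValueError(f"unknown audio API: {api!r}") from None
    return f"http://{host}:{port}{path}"


print(endpoint_url("127.0.0.1", 9997, "speech"))
# prints http://127.0.0.1:9997/v1/audio/speech
```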
Supported models
In Xinference, the following models support the Audio API:
whisper-tiny
whisper-tiny.en
whisper-base
whisper-base.en
whisper-medium
whisper-medium.en
whisper-large-v3
Belle-distilwhisper-large-v2-zh
Belle-whisper-large-v2-zh
Belle-whisper-large-v3-zh
ChatTTS
CosyVoice
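The list mixes two kinds of models: the `whisper-*` and `Belle-*` models are speech-to-text models (served by the transcriptions and translations endpoints), while ChatTTS and CosyVoice are text-to-speech models (served by the speech endpoint). The sketch below encodes that split; the helper names are hypothetical, and the `.en` rule relies on the Whisper naming convention, where `.en` checkpoints are English-only.

```python
from typing import List

# Text-to-speech models from the list; all others listed are speech-to-text.
TTS_MODELS = {"ChatTTS", "CosyVoice"}


def endpoints_for(model_name: str) -> List[str]:
    """Return the Audio API paths a listed model is expected to serve (hypothetical helper)."""
    if model_name in TTS_MODELS:
        return ["/v1/audio/speech"]
    return ["/v1/audio/transcriptions", "/v1/audio/translations"]


def is_english_only(model_name: str) -> bool:
    """Whisper checkpoints ending in ".en" only handle English speech."""
    return model_name.endswith(".en")


for name in ["whisper-tiny.en", "Belle-whisper-large-v3-zh", "ChatTTS"]:
    print(name, endpoints_for(name), is_english_only(name))
```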
Quickstart
Transcription
The Transcription API mimics OpenAI's create transcriptions API. You can try the Transcription API with cURL, the OpenAI client, or Xinference's Python client:
curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/transcriptions' \
  -H 'accept: application/json' \
  -F 'model=<MODEL_UID>' \
  -F 'file=@speech.mp3'
import openai

client = openai.Client(
    api_key="cannot be empty",  # the OpenAI client rejects an empty key
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
with open("speech.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="<MODEL_UID>",
        file=audio_file,
    )
print(transcription.text)
from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
with open("speech.mp3", "rb") as audio_file:
    result = model.transcriptions(audio=audio_file.read())
print(result["text"])
Example output:
{
"text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
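Under the hood, the transcriptions endpoint receives a `multipart/form-data` request with a `model` field and a `file` part, which is what `curl -F` and the OpenAI client send. The standard-library sketch below builds such a request by hand; it assumes the Xinference endpoint accepts the same form fields as OpenAI's API, and `build_multipart` and `transcribe` are hypothetical helpers, not part of either client.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    # The file part carries a filename and the raw audio bytes.
    disposition = (
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
    )
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(disposition.encode())
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url, model_uid, path):
    """POST an audio file to /v1/audio/transcriptions and return the text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"model": model_uid}, "file", os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Against a running server this would be called as `transcribe("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>", "<MODEL_UID>", "speech.mp3")`. In practice the OpenAI or Xinference client is simpler; the sketch only shows the wire format.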
Translation
The Translation API mimics OpenAI's create translations API. You can try the Translation API with cURL, the OpenAI client, or Xinference's Python client:
curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/translations' \
  -H 'accept: application/json' \
  -F 'model=<MODEL_UID>' \
  -F 'file=@speech.mp3'
import openai

client = openai.Client(
    api_key="cannot be empty",  # the OpenAI client rejects an empty key
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
with open("speech.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="<MODEL_UID>",
        file=audio_file,
    )
print(translation.text)
from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
with open("speech.mp3", "rb") as audio_file:
    result = model.translations(audio=audio_file.read())
print(result["text"])
Example output:
{
"text": "Hello, my name is Wolfgang and I come from Germany. Where are you heading today?"
}
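Transcription keeps the language spoken in the audio, while translation always produces English. The sketch below turns that rule into a choice of endpoint; `pick_audio_endpoint` is a hypothetical helper, and it assumes languages are given as ISO 639-1 codes such as `"en"` or `"de"`.

```python
TRANSCRIPTIONS = "/v1/audio/transcriptions"
TRANSLATIONS = "/v1/audio/translations"


def pick_audio_endpoint(spoken: str, wanted: str) -> str:
    """Pick the endpoint that turns speech in `spoken` into text in `wanted`."""
    spoken, wanted = spoken.lower(), wanted.lower()
    if spoken == wanted:
        # Transcription keeps the language of the audio.
        return TRANSCRIPTIONS
    if wanted == "en":
        # Translation always produces English text.
        return TRANSLATIONS
    raise ValueError(f"no audio endpoint turns {spoken!r} speech into {wanted!r} text")


print(pick_audio_endpoint("de", "en"))  # German speech to English text
```

For the German example output shown above, this picks `/v1/audio/translations`; English-to-German, by contrast, is not covered by either endpoint.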
Speech
The Speech API mimics OpenAI's create speech API. You can try the Speech API with cURL, the OpenAI client, or Xinference's Python client.
By default the Speech API does not stream, because:
the streaming output of ChatTTS is not as good as the non-streaming output (see 2noise/ChatTTS#564);
streaming requires ffmpeg<7: https://pytorch.org/audio/stable/installation.html#optional-dependencies
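Because streaming depends on the ffmpeg version, it can help to check that version before passing `stream=True`. The sketch below parses the first line of `ffmpeg -version`; it assumes the check runs on the machine hosting the Xinference server (where the audio is encoded), and `ffmpeg_major_version` and `can_stream` are hypothetical helpers. Development builds whose version string starts with `N-` are reported as unknown.

```python
import re
import shutil
import subprocess
from typing import Optional


def ffmpeg_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `ffmpeg -version` output, e.g. "ffmpeg version 6.1.1 ..."."""
    match = re.match(r"ffmpeg version n?(\d+)", version_output)
    return int(match.group(1)) if match else None


def can_stream() -> bool:
    """Return True if an ffmpeg older than 7 is on PATH (per the requirement above)."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return False
    out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
    major = ffmpeg_major_version(out)
    return major is not None and major < 7
```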
curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/speech' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<MODEL_UID>",
    "input": "<The text to generate audio for>",
    "voice": "echo",
    "stream": true
  }' \
  --output speech.mp3
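import openai

client = openai.Client(
    api_key="cannot be empty",  # the OpenAI client rejects an empty key
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.audio.speech.create(
    model="<MODEL_UID>",
    input="<The text to generate audio for>",
    voice="echo",
)
# The response body is the raw audio; save it to a file
with open("speech.mp3", "wb") as f:
    f.write(response.content)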
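from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
chunks = model.speech(
    input="<The text to generate audio for>",
    voice="echo",
    stream=True,
)
# With stream=True the audio arrives in chunks; write them out in order
with open("speech.mp3", "wb") as f:
    for chunk in chunks:
        f.write(chunk)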
The output is the generated audio as binary data (MP3 by default).