Audio (Experimental)#

Learn how to turn audio into text or text into audio with Xinference.

Introduction#

The Audio API provides two methods for interacting with audio:

  • The transcriptions endpoint transcribes audio into the input language.

  • The translations endpoint translates audio into English.

API ENDPOINT         OpenAI-compatible ENDPOINT
Transcription API    /v1/audio/transcriptions
Translation API      /v1/audio/translations

Supported models#

The Audio API is supported by the following models in Xinference:

  • whisper-tiny

  • whisper-tiny.en

  • whisper-base

  • whisper-base.en

  • whisper-medium

  • whisper-medium.en

  • whisper-large-v3

Quickstart#
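Before calling either endpoint, launch one of the supported audio models so you have a model UID to pass as <MODEL_UID>. A minimal sketch with Xinference's Python client (assuming a server is already running at <XINFERENCE_HOST>:<XINFERENCE_PORT>; whisper-tiny stands in for any supported model):

from xinference.client import Client

# Connect to the running Xinference server.
client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

# Launch an audio model; the returned UID identifies it in /v1/audio/* requests.
model_uid = client.launch_model(model_name="whisper-tiny", model_type="audio")
print(model_uid)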

Transcription#

The Transcription API mimics OpenAI’s create transcription API. We can try the Transcription API out via cURL, the OpenAI Client, or Xinference’s Python client:

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/transcriptions' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model=<MODEL_UID>' \
  -F 'file=@<audio file>'
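The same request with the OpenAI Python client, as a sketch that assumes the audio to transcribe is in a local file named audio.mp3 (the API key is a placeholder; any non-empty string typically works when authentication is not enabled):

import openai

client = openai.Client(
    api_key="not empty",  # placeholder; not validated by default
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1",
)
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="<MODEL_UID>",
        file=audio_file,
    )
print(transcription.text)

Or with Xinference’s Python client, under the same assumptions:

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")
with open("audio.mp3", "rb") as audio_file:
    # The response mirrors OpenAI's format, e.g. {"text": "..."}.
    result = model.transcriptions(audio_file.read())
print(result)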

Translation#

The Translation API mimics OpenAI’s create translation API. We can try the Translation API out via cURL, the OpenAI Client, or Xinference’s Python client:

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/translations' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'model=<MODEL_UID>' \
  -F 'file=@<audio file>'
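The OpenAI Python client works the same way for translation; a sketch under the same assumptions as above:

import openai

client = openai.Client(
    api_key="not empty",  # placeholder; not validated by default
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1",
)
with open("audio.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="<MODEL_UID>",
        file=audio_file,
    )
print(translation.text)

With Xinference’s Python client, the analogous call is model.translations(audio_file.read()) on the model handle returned by client.get_model("<MODEL_UID>").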