.. _audio:

=====================
Audio (Experimental)
=====================

Learn how to turn audio into text or text into audio with Xinference.


Introduction
============

The Audio API provides three methods for interacting with audio:

* The transcriptions endpoint transcribes audio into the input language.
* The translations endpoint translates audio into English.
* The speech endpoint generates audio from the input text.

.. list-table::
   :widths: 25 50
   :header-rows: 1

   * - API ENDPOINT
     - OpenAI-compatible ENDPOINT

   * - Transcription API
     - /v1/audio/transcriptions

   * - Translation API
     - /v1/audio/translations

   * - Speech API
     - /v1/audio/speech

Supported models
----------------

The audio API is supported by the following models in Xinference:

Audio to text
~~~~~~~~~~~~~

* whisper-tiny
* whisper-tiny.en
* whisper-base
* whisper-base.en
* whisper-medium
* whisper-medium.en
* whisper-large-v3
* whisper-large-v3-turbo
* Belle-distilwhisper-large-v2-zh
* Belle-whisper-large-v2-zh
* Belle-whisper-large-v3-zh
* SenseVoiceSmall

Text to audio
~~~~~~~~~~~~~

* ChatTTS
* CosyVoice

Quickstart
==========

Transcription
-------------

The Transcription API mimics OpenAI's `create transcriptions API <https://platform.openai.com/docs/api-reference/audio/createTranscription>`_.
We can try the Transcription API out via cURL, the OpenAI Python client, or Xinference's Python client:

.. tabs::

  .. code-tab:: bash cURL

    curl -X 'POST' \
      'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/audio/transcriptions' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "<MODEL_UID>",
        "file": "<audio bytes>"
      }'
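
  .. code-tab:: python OpenAI Python Client

    # A minimal sketch using the OpenAI Python SDK (>= 1.0), which the
    # surrounding text says can target this endpoint. It assumes an
    # Xinference server at <XINFERENCE_HOST>:<XINFERENCE_PORT>, an
    # audio-to-text model launched under the UID <MODEL_UID>, and a local
    # audio file named speech.mp3.
    import openai

    client = openai.Client(
        api_key="cannot be empty",  # any non-empty string; Xinference does not validate it
        base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1",
    )
    with open("speech.mp3", "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="<MODEL_UID>",
            file=audio_file,
        )
    print(transcription.text)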
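
  .. code-tab:: python Xinference Python Client

    # A minimal sketch using Xinference's own Python client, under the same
    # assumptions as the cURL example (running server, launched <MODEL_UID>,
    # local speech.mp3). The transcriptions() call takes the raw audio bytes.
    from xinference.client import Client

    client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

    model = client.get_model("<MODEL_UID>")
    with open("speech.mp3", "rb") as audio_file:
        print(model.transcriptions(audio_file.read()))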
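
The examples above assume an audio-to-text model is already running. If not,
one of the supported models listed earlier can be launched first; a minimal
sketch, assuming the standard ``xinference launch`` command-line flags and
``whisper-large-v3`` as the model of choice:

.. code-block:: bash

  # Launch an audio-to-text model; the UID printed on success is the
  # <MODEL_UID> referenced in the examples above.
  xinference launch --model-name whisper-large-v3 --model-type audio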