xinference.client.handlers.AudioModelHandle.speech

AudioModelHandle.speech(input: str, voice: str = '', response_format: str = 'mp3', speed: float = 1.0, stream: bool = False, prompt_speech: bytes | None = None, **kwargs)

Generates audio from the input text.

Parameters:
  • input (str) – The text to generate audio for. The maximum length is 4096 characters.

  • voice (str) – The voice to use when generating the audio.

  • response_format (str) – The audio format to return. Defaults to 'mp3'.

  • speed (float) – The speed of the generated audio. Defaults to 1.0.

  • stream (bool) – Whether to stream the generated audio. Defaults to False.

  • prompt_speech (bytes | None) – Optional audio, as raw bytes, to provide to the model as a speech prompt. Defaults to None.
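Because input is limited to 4096 characters, longer text has to be split before calling speech. The sketch below is illustrative only: split_text, speak_long_text, and FakeAudioHandle are hypothetical names, not part of Xinference. FakeAudioHandle stands in for a real AudioModelHandle so the example runs without a server. The splitter breaks on whitespace (collapsing runs of spaces and newlines) and hard-splits any single word longer than the limit.

```python
from typing import List, Protocol

MAX_INPUT_CHARS = 4096  # limit documented for the `input` parameter


class SpeechHandle(Protocol):
    """Anything exposing a `speech` method shaped like AudioModelHandle.speech."""

    def speech(self, input: str, voice: str = "", response_format: str = "mp3",
               speed: float = 1.0, **kwargs) -> bytes: ...


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split a single word that is longer than the limit on its own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def speak_long_text(handle: SpeechHandle, text: str, **speech_kwargs) -> List[bytes]:
    """Hypothetical helper: call handle.speech once per chunk, keeping segment order."""
    return [handle.speech(input=chunk, **speech_kwargs) for chunk in split_text(text)]


class FakeAudioHandle:
    """Offline stand-in for AudioModelHandle; records inputs and returns dummy bytes."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def speech(self, input: str, voice: str = "", response_format: str = "mp3",
               speed: float = 1.0, **kwargs) -> bytes:
        self.calls.append(input)
        return f"<{response_format}:{len(input)}>".encode()


if __name__ == "__main__":
    handle = FakeAudioHandle()
    segments = speak_long_text(handle, "hello world " * 1000, response_format="wav")
    print(len(segments), [len(c) for c in handle.calls])
```

The segments are returned as a list rather than joined, because whether encoded audio can be safely concatenated depends on the format (a WAV header, for example, describes a single stream).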

Returns:

The generated audio as binary data, encoded in the requested response_format.

Return type:

bytes
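Since the non-streaming call returns bytes, a common next step is to write them to a file whose extension matches response_format. The following is a minimal sketch under that assumption. save_speech and FakeAudioHandle are hypothetical names used for illustration, and the commented "real usage" lines assume a running Xinference server with an audio model already launched.

```python
from pathlib import Path


def save_speech(handle, text: str, out_dir: str, stem: str = "speech",
                response_format: str = "mp3", **speech_kwargs) -> Path:
    """Hypothetical helper: call handle.speech and write the bytes to <out_dir>/<stem>.<format>."""
    audio = handle.speech(input=text, response_format=response_format, **speech_kwargs)
    if not isinstance(audio, (bytes, bytearray)):
        # This sketch only covers the documented non-streaming return type (bytes).
        raise TypeError(f"expected bytes, got {type(audio).__name__}")
    path = Path(out_dir) / f"{stem}.{response_format}"
    path.write_bytes(audio)
    return path


class FakeAudioHandle:
    """Offline stand-in for AudioModelHandle; returns placeholder bytes, not real audio."""

    def speech(self, input: str, voice: str = "", response_format: str = "mp3",
               speed: float = 1.0, **kwargs) -> bytes:
        return b"fake-audio:" + response_format.encode()


# Real usage (assumes a running Xinference server and an already-launched audio model):
#   from xinference.client import Client
#   handle = Client("http://localhost:9997").get_model("<model_uid>")
#   save_speech(handle, "Hello!", ".", response_format="wav")
```

Deriving the file extension from response_format keeps the file name consistent with the encoding the model was asked to produce.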