xinference.client.handlers.ChatModelHandle.generate
- ChatModelHandle.generate(prompt: str, generate_config: LlamaCppGenerateConfig | PytorchGenerateConfig | None = None) → Completion | Iterator[CompletionChunk]
Creates a completion for the provided prompt and parameters via RESTful APIs.
- Parameters:
prompt (str) – The user’s input prompt.
generate_config (Optional[Union["LlamaCppGenerateConfig", "PytorchGenerateConfig"]]) – Additional configuration for the generation. "LlamaCppGenerateConfig" applies to llama-cpp-python models; "PytorchGenerateConfig" applies to PyTorch models.
- Returns:
The return type depends on the stream field of generate_config: when stream is True, the function returns Iterator["CompletionChunk"]; when stream is False, it returns "Completion".
- Return type:
Union[“Completion”, Iterator[“CompletionChunk”]]
- Raises:
RuntimeError – Raised when the server fails to generate the completion; detailed information is provided in the error message.
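A minimal usage sketch of both return modes, assuming an Xinference server running at http://localhost:9997 and an already-launched model; the endpoint, the model UID "my-model-uid", and the generate_config values are illustrative placeholders:

```python
from xinference.client import Client

# Connect to a running Xinference server (address is an assumption).
client = Client("http://localhost:9997")

# "my-model-uid" is a placeholder for the UID of a launched chat model.
model = client.get_model("my-model-uid")

# Non-streaming: stream=False, so generate() returns a single Completion.
completion = model.generate(
    "What is the largest animal?",
    generate_config={"max_tokens": 64, "stream": False},
)
print(completion["choices"][0]["text"])

# Streaming: stream=True, so generate() returns an Iterator[CompletionChunk],
# yielding partial text as the server produces it.
for chunk in model.generate(
    "What is the largest animal?",
    generate_config={"max_tokens": 64, "stream": True},
):
    print(chunk["choices"][0]["text"], end="", flush=True)
```

Both Completion and CompletionChunk follow the OpenAI-style completion schema, so the generated text is read from choices[0]["text"] in either case.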