xinference.client.handlers.ChatModelHandle.chat#

Given a list of messages comprising a conversation, the model will return a response via RESTful APIs.

参数:

prompt (str) – The user’s input.
system_prompt (Optional[str]) – The system context provide to Model prior to any chats.
chat_history (Optional[List["ChatCompletionMessage"]]) – A list of messages comprising the conversation so far.
tools (Optional[List[Dict]]) – A tool list.
generate_config (Optional[Union["LlamaCppGenerateConfig", "PytorchGenerateConfig"]]) – Additional configuration for the chat generation. “LlamaCppGenerateConfig” -> configuration for llama-cpp-python model “PytorchGenerateConfig” -> configuration for pytorch model

返回:

Stream is a parameter in generate_config. When stream is set to True, the function will return Iterator[“ChatCompletionChunk”]. When stream is set to False, the function will return “ChatCompletion”.

返回类型:

Union[“ChatCompletion”, Iterator[“ChatCompletionChunk”]]

抛出:

RuntimeError – Report the failure to generate the chat from the server. Detailed information provided in error message.