xinference.client.Client.launch_model

Launch a model on the server with the given parameters via the RESTful API.

Parameters:

model_name (str) -- The name of the model.
model_type (str) -- The type of the model.
model_engine (Optional[str]) -- The inference engine to use when launching an LLM.
model_uid (str) -- UID of the model; a UUID is generated automatically if None.
model_size_in_billions (Optional[Union[int, str, float]]) -- The size (in billions) of the model.
model_format (Optional[str]) -- The format of the model.
quantization (Optional[str]) -- The quantization of the model.
replica (Optional[int]) -- The number of replicas of the model; default is 1.
n_worker (int) -- Number of workers to run.
n_gpu (Optional[Union[int, str]]) -- The number of GPUs used by the model; default is "auto". If n_worker > 1, this is the number of GPUs per worker. n_gpu=None means CPU only; n_gpu="auto" lets the system determine the best number of GPUs to use.
peft_model_config (Optional[Dict]) -- PEFT configuration, with the following keys:
- "lora_list": A list of PEFT (Parameter-Efficient Fine-Tuning) models and their paths.
- "image_lora_load_kwargs": A dict of LoRA load parameters for image models.
- "image_lora_fuse_kwargs": A dict of LoRA fuse parameters for image models.
request_limits (Optional[int]) -- The request limit for this model; default is None, which means no limit.
worker_ip (Optional[str]) -- The IP of the worker on which the model is located in a distributed scenario.
gpu_idx (Optional[Union[int, List[int]]]) -- Specify the GPU index where the model is located.
model_path (Optional[str]) -- Model path. For the GGUF format this should be the file path; otherwise it should be the model directory.
enable_thinking (Optional[bool]) -- Enable or disable thinking mode for hybrid reasoning LLMs (e.g., Qwen3). None uses the model default.
enable_virtual_env (Optional[bool]) -- Whether to enable the virtual environment.
virtual_env_packages (Optional[List[str]]) -- Packages to specify in the virtual environment; can be used to override its builtin packages.
envs (Optional[Dict[str, str]]) -- Environment variables to pass when launching model.
**kwargs -- Any other parameters to pass through, e.g. multimodal_projector for multimodal inference with the llama.cpp backend.
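
The parameters above can be collected into a plain dict before the call. The following is a minimal sketch, not part of the library: build_launch_kwargs is a hypothetical helper, the model name, adapter name, and path are placeholders, and the shape of each "lora_list" entry (a dict with "lora_name" and "local_path") is an assumption that this page does not state.

```python
from typing import Any, Dict, Optional, Union


def build_launch_kwargs(
    model_name: str,
    n_gpu: Optional[Union[int, str]] = "auto",
    lora_paths: Optional[Dict[str, str]] = None,
    envs: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Client.launch_model (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "model_name": model_name,
        "model_type": "LLM",
        # None means CPU only; "auto" lets the server pick the GPU count.
        "n_gpu": n_gpu,
    }
    if lora_paths:
        # Assumed entry shape: one dict per adapter with a name and a local path.
        kwargs["peft_model_config"] = {
            "lora_list": [
                {"lora_name": name, "local_path": path}
                for name, path in lora_paths.items()
            ]
        }
    if envs:
        # Copy so later changes to the caller's dict do not leak into the request.
        kwargs["envs"] = dict(envs)
    return kwargs


# Placeholder values for illustration only.
cpu_only = build_launch_kwargs("my-llm", n_gpu=None)
with_lora = build_launch_kwargs(
    "my-llm", n_gpu=1, lora_paths={"my-adapter": "/path/to/adapter"}
)
```

The resulting dict can be unpacked into the call as client.launch_model(**kwargs), which keeps the optional parameters out of the request unless they are actually set.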

Returns:

The unique model_uid for the launched model.

Return type:

str
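
Since the method returns the model_uid as a str, a caller will typically keep it for later requests. The sketch below wraps the call in a small hypothetical helper, launch, that checks the return value. The endpoint URL, model name, and engine in main are assumptions for illustration; main needs the xinference package and a reachable server, so it is defined but not invoked here.

```python
from typing import Any


def launch(client: Any, **launch_kwargs: Any) -> str:
    """Call launch_model and check that a usable UID came back (hypothetical helper)."""
    model_uid = client.launch_model(**launch_kwargs)
    if not isinstance(model_uid, str) or not model_uid:
        raise RuntimeError("launch_model did not return a model UID")
    return model_uid


def main() -> None:
    # Imported lazily so the helper above can be used without the package.
    from xinference.client import Client

    # Assumed endpoint, model name, and engine; adjust to your deployment.
    client = Client("http://localhost:9997")
    uid = launch(
        client,
        model_name="qwen3",
        model_type="LLM",
        model_engine="transformers",
    )
    print("launched:", uid)
```

Passing model_uid explicitly gives the model a stable identifier chosen by the caller; leaving it out lets the server generate one, as described for the model_uid parameter.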