.. _lora:

================
LoRA Integration
================

Currently, Xinference supports launching ``LLM`` and ``image`` models with an attached LoRA fine-tuned model.

Usage
^^^^^

Unlike built-in models, Xinference does not currently manage LoRA models. Users must first download the LoRA model themselves and then pass the storage path of the model files to Xinference.

.. tabs::

  .. code-tab:: bash shell

    # Replace every <...> placeholder with your own value.
    # Repeat --lora-modules once per LoRA model, and each kwargs flag once per key/value pair.
    xinference launch <options> \
      --lora-modules <lora_name1> <lora_model_path1> \
      --lora-modules <lora_name2> <lora_model_path2> \
      --image-lora-load-kwargs <load_param1> <load_value1> \
      --image-lora-load-kwargs <load_param2> <load_value2> \
      --image-lora-fuse-kwargs <fuse_param1> <fuse_value1> \
      --image-lora-fuse-kwargs <fuse_param2> <fuse_value2>

  .. code-tab:: python

    from xinference.client import Client

    # Replace every <...> placeholder with your own value.
    client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

    # One entry per LoRA model: a name plus the local path of its files.
    lora_model1 = {'lora_name': <lora_name1>, 'local_path': <lora_model_path1>}
    lora_model2 = {'lora_name': <lora_name2>, 'local_path': <lora_model_path2>}
    lora_models = [lora_model1, lora_model2]

    # Only used for image models; see the notes below.
    image_lora_load_kwargs = {'<load_param1>': <load_value1>, '<load_param2>': <load_value2>}
    image_lora_fuse_kwargs = {'<fuse_param1>': <fuse_value1>, '<fuse_param2>': <fuse_value2>}

    peft_model_config = {
        "image_lora_load_kwargs": image_lora_load_kwargs,
        "image_lora_fuse_kwargs": image_lora_fuse_kwargs,
        "lora_list": lora_models
    }

    client.launch_model(
        <other_options>,
        peft_model_config=peft_model_config
    )

Note
^^^^

* The options ``image_lora_load_kwargs`` and ``image_lora_fuse_kwargs`` apply only to models whose model_type is ``image``. They correspond to the parameters of the ``load_lora_weights`` and ``fuse_lora`` interfaces in the ``diffusers`` library. They are not required when launching an LLM.

* For LLM chat models, only LoRA models that do not change the prompt style are currently supported.

* When using GPUs, a LoRA model and its base model occupy the same devices.
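
As a minimal sketch of how the ``peft_model_config`` dictionary described above could be assembled, the helper below builds the ``lora_list`` entries from a name-to-path mapping and adds the image-only options only when they are supplied. The function ``build_peft_model_config`` is hypothetical and not part of Xinference, and the LoRA names, paths and the ``lora_scale`` value are illustrative assumptions. Only the dictionary keys come from the usage section.

```python
from typing import Any, Dict, Optional


def build_peft_model_config(
    lora_paths: Dict[str, str],
    image_lora_load_kwargs: Optional[Dict[str, Any]] = None,
    image_lora_fuse_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a peft_model_config dict (hypothetical helper, not an Xinference API)."""
    if not lora_paths:
        raise ValueError("at least one LoRA model is required")

    # One {'lora_name', 'local_path'} entry per LoRA model, in insertion order.
    config: Dict[str, Any] = {
        "lora_list": [
            {"lora_name": name, "local_path": path}
            for name, path in lora_paths.items()
        ]
    }

    # The image-only options are included only when given, so a config
    # built for an LLM simply leaves them out.
    if image_lora_load_kwargs:
        config["image_lora_load_kwargs"] = dict(image_lora_load_kwargs)
    if image_lora_fuse_kwargs:
        config["image_lora_fuse_kwargs"] = dict(image_lora_fuse_kwargs)
    return config


# Illustrative values only: names, paths and the fuse option are assumptions.
llm_config = build_peft_model_config({"my_lora": "/path/to/my_lora"})
image_config = build_peft_model_config(
    {"my_style": "/path/to/my_style"},
    image_lora_fuse_kwargs={"lora_scale": 0.8},
)
```

Either dictionary could then be passed as ``peft_model_config`` to ``client.launch_model`` together with the usual launch options. Building the config in one place keeps the LLM and image cases from diverging and avoids mismatched variable names between the kwargs and the final dictionary.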