llama-3.2-vision-instruct#

Context Length: 131072
Model Name: llama-3.2-vision-instruct
Languages: en, de, fr, it, pt, hi, es, th
Abilities: chat, vision
Description: The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks…

Specifications#

Model Spec 1 (pytorch, 11 Billion)#

Model Format: pytorch
Model Size (in billions): 11
Quantizations: none
Engines: vLLM, Transformers
Model ID: meta-llama/Meta-Llama-3.2-11B-Vision-Instruct
Model Hubs: Hugging Face, ModelScope

Execute the following command to launch the model, remember to replace ${quantization} with your chosen quantization method from the options listed above:

xinference launch --model-engine transformers --model-name llama-3.2-vision-instruct --size-in-billions 11 --model-format pytorch --quantization ${quantization}
xinference launch --model-engine vllm --enforce_eager --max_num_seqs 16 --model-name llama-3.2-vision-instruct --size-in-billions 11 --model-format pytorch

Model Spec 2 (pytorch, 90 Billion)#

Model Format: pytorch
Model Size (in billions): 90
Quantizations: none
Engines: vLLM, Transformers
Model ID: meta-llama/Meta-Llama-3.2-90B-Vision-Instruct
Model Hubs: Hugging Face, ModelScope

Execute the following command to launch the model, remember to replace ${quantization} with your chosen quantization method from the options listed above:

xinference launch --model-engine transformers --model-name llama-3.2-vision-instruct --size-in-billions 90 --model-format pytorch --quantization ${quantization}
xinference launch --model-engine vllm --enforce_eager --max_num_seqs 16 --model-name llama-3.2-vision-instruct --size-in-billions 90 --model-format pytorch