.. _models_llm_gpt-oss:

========================================
gpt-oss
========================================

- **Context Length:** 131072
- **Model Name:** gpt-oss
- **Languages:** en
- **Abilities:** chat, reasoning
- **Description:** gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

Specifications
^^^^^^^^^^^^^^

Model Spec 1 (pytorch, 20 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 20
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** openai/gpt-oss-20b
- **Model Hubs**: Hugging Face, ModelScope

Execute the following command to launch the model. Replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 20 --model-format pytorch --quantization ${quantization}

Model Spec 2 (bnb, 20 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** bnb
- **Model Size (in billions):** 20
- **Quantizations:** 4-bit
- **Engines**: vLLM, Transformers
- **Model ID:** unsloth/gpt-oss-20b-bnb-4bit
- **Model Hubs**: Hugging Face, ModelScope

Execute the following command to launch the model. Replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 20 --model-format bnb --quantization ${quantization}

Model Spec 3 (pytorch, 120 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 120
- **Quantizations:** none
- **Engines**: vLLM, Transformers
- **Model ID:** openai/gpt-oss-120b
- **Model Hubs**: Hugging Face, ModelScope

Execute the following command to launch the model. Replace ``${engine}`` and ``${quantization}`` with an engine and a quantization method from the options listed above::

   xinference launch --model-engine ${engine} --model-name gpt-oss --size-in-billions 120 --model-format pytorch --quantization ${quantization}
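
The placeholder substitution in the launch commands above can be sketched in Python. This is a minimal illustration, not part of Xinference: ``SPECS`` and ``build_launch_command`` are hypothetical names, the option values are copied from the three specs on this page, and the engine names are assumed to be passed exactly as they are spelled in the lists. The helper only assembles and validates the command line; it does not run it.

```python
import shlex

# Options copied from the three model specs on this page.
# Keys are (model format, size in billions).
SPECS = {
    ("pytorch", 20): {"quantizations": ["none"], "engines": ["vLLM", "Transformers"]},
    ("bnb", 20): {"quantizations": ["4-bit"], "engines": ["vLLM", "Transformers"]},
    ("pytorch", 120): {"quantizations": ["none"], "engines": ["vLLM", "Transformers"]},
}


def build_launch_command(model_format, size_in_billions, engine, quantization):
    """Hypothetical helper: return the launch command as an argv list.

    Raises ValueError if the combination is not listed on this page.
    """
    spec = SPECS.get((model_format, size_in_billions))
    if spec is None:
        raise ValueError(f"no spec for {model_format}, {size_in_billions}B")
    if engine not in spec["engines"]:
        raise ValueError(f"engine {engine!r} is not listed for this spec")
    if quantization not in spec["quantizations"]:
        raise ValueError(f"quantization {quantization!r} is not listed for this spec")
    return [
        "xinference", "launch",
        "--model-engine", engine,
        "--model-name", "gpt-oss",
        "--size-in-billions", str(size_in_billions),
        "--model-format", model_format,
        "--quantization", quantization,
    ]


if __name__ == "__main__":
    # Print the command for Model Spec 2 with the vLLM engine.
    print(shlex.join(build_launch_command("bnb", 20, "vLLM", "4-bit")))
```

The returned list can be passed to ``subprocess.run`` on a machine where the ``xinference`` command is installed. Rejecting unlisted combinations before launching keeps a mistyped engine or quantization from reaching the server.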