Large Language Models
The following is a list of the built-in LLMs in Xinference (a usage sketch follows the table):
| MODEL NAME | ABILITIES | CONTEXT_LENGTH | DESCRIPTION |
|---|---|---|---|
| aquila2 | generate | 2048 | Aquila2 series models are the base language models |
| aquila2-chat | chat | 2048 | Aquila2-chat series models are the chat models |
| aquila2-chat-16k | chat | 16384 | AquilaChat2-16k series models are the long-text chat models |
| baichuan-2 | generate | 4096 | Baichuan2 is an open-source Transformer-based LLM that is trained on both Chinese and English data. |
| baichuan-2-chat | chat | 4096 | Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. |
| c4ai-command-r-v01 | chat | 131072 | C4AI Command-R(+) is a research release of highly performant generative models with 35 and 104 billion parameters. |
| code-llama | generate | 100000 | Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code. |
| code-llama-instruct | chat | 100000 | Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM. |
| code-llama-python | generate | 100000 | Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python. |
| codegeex4 | chat | 131072 | The open-source version of the latest CodeGeeX4 model series. |
| codeqwen1.5 | generate | 65536 | CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data. |
| codeqwen1.5-chat | chat | 65536 | CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data. |
| codeshell | generate | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
| codeshell-chat | chat | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
| codestral-v0.1 | generate | 32768 | Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones such as Python, Java, C, C++, JavaScript, and Bash. |
| cogvlm2 | chat, vision | 8192 | CogVLM2 achieves strong results on many benchmarks compared to the previous generation of open-source CogVLM models; its performance can compete with some closed-source models. |
| cogvlm2-video-llama3-chat | chat, vision | 8192 | CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. |
| csg-wukong-chat-v0.1 | chat | 32768 | csg-wukong-1B is a 1 billion-parameter small language model (SLM) pretrained on 1T tokens. |
| deepseek | generate | 4096 | DeepSeek LLM, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
| deepseek-chat | chat | 4096 | DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
| deepseek-coder | generate | 16384 | DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. |
| deepseek-coder-instruct | chat | 16384 | deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. |
| deepseek-v2 | generate | 128000 | DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. |
| deepseek-v2-chat | chat | 128000 | DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. |
| deepseek-v2-chat-0628 | chat | 128000 | DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. |
| deepseek-v2.5 | chat | 128000 | DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. |
| deepseek-vl-chat | chat, vision | 4096 | DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. |
| gemma-2-it | chat | 8192 | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
| gemma-it | chat | 8192 | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
| glm-4v | chat, vision | 8192 | GLM4 is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| glm-edge-chat | chat | 8192 | The GLM-Edge series is our attempt to address real-world on-device scenarios. It consists of two sizes of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are mainly for platforms such as mobile phones and cars, and the 4B / 5B models are mainly for platforms such as PCs. |
| glm-edge-v | chat, vision | 8192 | The GLM-Edge series is our attempt to address real-world on-device scenarios. It consists of two sizes of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are mainly for platforms such as mobile phones and cars, and the 4B / 5B models are mainly for platforms such as PCs. |
| glm4-chat | chat, tools | 131072 | GLM4 is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| glm4-chat-1m | chat, tools | 1048576 | GLM4 is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| gorilla-openfunctions-v2 | chat | 4096 | OpenFunctions is designed to extend the Large Language Model (LLM) chat completion feature to formulate executable API calls given natural language instructions and API context. |
| gpt-2 | generate | 1024 | GPT-2 is a Transformer-based LLM trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with 3+ upvotes. |
| internlm2-chat | chat | 32768 | The second generation of the InternLM model, InternLM2. |
| internlm2.5-chat | chat | 32768 | InternLM2.5 series of the InternLM model. |
| internlm2.5-chat-1m | chat | 262144 | InternLM2.5 series of the InternLM model with 1M long-context support. |
| internvl-chat | chat, vision | 32768 | InternVL 1.5 is an open-source multimodal large language model (MLLM) that bridges the capability gap between open-source and proprietary commercial models in multimodal understanding. |
| internvl2 | chat, vision | 32768 | InternVL 2 is an open-source multimodal large language model (MLLM) that bridges the capability gap between open-source and proprietary commercial models in multimodal understanding. |
| llama-2 | generate | 4096 | Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data. |
| llama-2-chat | chat | 4096 | Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting. |
| llama-3 | generate | 8192 | Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. |
| llama-3-instruct | chat | 8192 | The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. |
| llama-3.1 | generate | 131072 | Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. |
| llama-3.1-instruct | chat, tools | 131072 | The Llama 3.1 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. |
| llama-3.2-vision | generate, vision | 131072 | The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. |
| llama-3.2-vision-instruct | chat, vision | 131072 | The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. |
| minicpm-2b-dpo-bf16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| minicpm-2b-dpo-fp16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| minicpm-2b-dpo-fp32 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| minicpm-2b-sft-bf16 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| minicpm-2b-sft-fp32 | chat | 4096 | MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| MiniCPM-Llama3-V-2_5 | chat, vision | 8192 | MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. |
| MiniCPM-V-2.6 | chat, vision | 32768 | MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. |
| minicpm3-4b | chat | 32768 | MiniCPM3-4B is the third generation of the MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models. |
| mistral-instruct-v0.1 | chat | 8192 | Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting. |
| mistral-instruct-v0.2 | chat | 8192 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. |
| mistral-instruct-v0.3 | chat | 32768 | The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-7B-v0.3. |
| mistral-large-instruct | chat | 131072 | Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. |
| mistral-nemo-instruct | chat | 1024000 | The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-Nemo-Base-2407. |
| mistral-v0.1 | generate | 8192 | Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks. |
| mixtral-8x22B-instruct-v0.1 | chat | 65536 | The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of Mixtral-8x22B-v0.1, specializing in chatting. |
| mixtral-instruct-v0.1 | chat | 32768 | Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting. |
| mixtral-v0.1 | generate | 32768 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
| OmniLMM | chat, vision | 2048 | OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. |
| openhermes-2.5 | chat | 8192 | OpenHermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data. |
| opt | generate | 2048 | OPT is an open-source, decoder-only, Transformer-based LLM that was designed to replicate GPT-3. |
| orion-chat | chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
| orion-chat-rag | chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
| phi-2 | generate | 2048 | Phi-2 is a 2.7B Transformer-based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites. |
| phi-3-mini-128k-instruct | chat | 128000 | The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
| phi-3-mini-4k-instruct | chat | 4096 | The Phi-3-Mini-4K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
| platypus2-70b-instruct | generate | 4096 | Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2. |
| qwen-chat | chat | 32768 | Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. |
| qwen-vl-chat | chat, vision | 4096 | Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities. |
| qwen1.5-chat | chat, tools | 32768 | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
| qwen1.5-moe-chat | chat, tools | 32768 | Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. |
| qwen2-audio | chat, audio | 32768 | Qwen2-Audio is a large-scale audio-language model capable of accepting various audio signal inputs and performing audio analysis or giving direct textual responses to speech instructions. |
| qwen2-audio-instruct | chat, audio | 32768 | Qwen2-Audio is a large-scale audio-language model capable of accepting various audio signal inputs and performing audio analysis or giving direct textual responses to speech instructions. |
| qwen2-instruct | chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
| qwen2-moe-instruct | chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
| qwen2-vl-instruct | chat, vision | 32768 | Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision-language models in the Qwen model families. |
| qwen2.5 | generate | 32768 | Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. |
| qwen2.5-coder | generate | 32768 | Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). |
| qwen2.5-coder-instruct | chat, tools | 32768 | Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). |
| qwen2.5-instruct | chat, tools | 32768 | Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. |
| QwQ-32B-Preview | chat | 32768 | QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. |
| seallm_v2 | generate | 8192 | We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
| seallm_v2.5 | generate | 8192 | We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
| Skywork | generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
| Skywork-Math | generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
| Starling-LM | chat | 4096 | We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset. |
| telechat | chat | 8192 | TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B model base is trained on 1.5 trillion and 3 trillion tokens of a high-quality Chinese corpus. |
| tiny-llama | generate | 2048 | The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. |
| wizardcoder-python-v1.0 | chat | 100000 | |
| wizardmath-v1.0 | chat | 2048 | WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. |
| xverse | generate | 2048 | XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. |
| xverse-chat | chat | 2048 | XVERSE-Chat is the aligned version of the XVERSE model. |
| Yi | generate | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| Yi-1.5 | generate | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| Yi-1.5-chat | chat | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| Yi-1.5-chat-16k | chat | 16384 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| Yi-200k | generate | 262144 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| Yi-chat | chat | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| yi-coder | generate | 131072 | Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters, excelling in long-context understanding with a maximum context length of 128K tokens and supporting 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++. |
| yi-coder-chat | chat | 131072 | Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters, excelling in long-context understanding with a maximum context length of 128K tokens and supporting 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++. |
| yi-vl-chat | chat, vision | 4096 | Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. |
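Any model in this table can be launched by its MODEL NAME and then queried through the Xinference client. The following is a minimal sketch, assuming a local Xinference server at `http://localhost:9997` and using `qwen2.5-instruct` from the table as an example; the engine, format, and size values are illustrative and must match one of the specs available for the model, and the exact `chat` call signature can differ between Xinference versions.

```python
# Minimal sketch of launching and querying one of the built-in LLMs listed above.
# Assumes an Xinference server is already running locally (for example via
# `xinference-local`) at http://localhost:9997; adjust endpoint and parameters
# to your own setup.
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch a built-in chat model by its MODEL NAME from the table. The engine,
# size, and format below are assumptions for illustration; pick values that
# match a spec you actually have available.
model_uid = client.launch_model(
    model_name="qwen2.5-instruct",
    model_engine="transformers",
    model_size_in_billions=7,
    model_format="pytorch",
)

model = client.get_model(model_uid)

# Models with the "chat" ability accept chat-style messages; "generate" models
# expose a completion-style interface instead. Response format follows the
# OpenAI-compatible schema.
response = model.chat(
    messages=[{"role": "user", "content": "Summarize Xinference in one sentence."}],
    generate_config={"max_tokens": 128},
)
print(response["choices"][0]["message"]["content"])
```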