Large language Models#

The following is a list of built-in LLM in Xinference:

MODEL NAME	ABILITIES	COTNEXT_LENGTH	DESCRIPTION
aquila2	generate	2048	Aquila2 series models are the base language models
aquila2-chat	chat	2048	Aquila2-chat series models are the chat models
aquila2-chat-16k	chat	16384	AquilaChat2-16k series models are the long-text chat models
baichuan	generate	4096	Baichuan is an open-source Transformer based LLM that is trained on both Chinese and English data.
baichuan-2	generate	4096	Baichuan2 is an open-source Transformer based LLM that is trained on both Chinese and English data.
baichuan-2-chat	chat	4096	Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.
baichuan-chat	chat	4096	Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.
c4ai-command-r-v01	generate	131072	C4AI Command-R is a research release of a 35 billion parameter highly performant generative model.
c4ai-command-r-v01-4bit	generate	131072	This model is 4bit quantized version of C4AI Command-R using bitsandbytes.
chatglm	chat	2048	ChatGLM is an open-source General Language Model (GLM) based LLM trained on both Chinese and English data.
chatglm2	chat	8192	ChatGLM2 is the second generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm2-32k	chat	32768	ChatGLM2-32k is a special version of ChatGLM2, with a context window of 32k tokens instead of 8k.
chatglm3	chat, tools	8192	ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm3-128k	chat	131072	ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
chatglm3-32k	chat	32768	ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.
code-llama	generate	100000	Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code.
code-llama-instruct	chat	100000	Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.
code-llama-python	generate	100000	Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python.
codeqwen1.5-chat	chat	65536	CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of data of codes.
codeshell	generate	8194	CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
codeshell-chat	chat	8194	CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.
deepseek-chat	chat	4096	DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.
deepseek-coder-instruct	chat	4096	deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data.
deepseek-vl-chat	chat, vision	4096	DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.
falcon	generate	2048	Falcon is an open-source Transformer based LLM trained on the RefinedWeb dataset.
falcon-instruct	chat	2048	Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing in chatting.
gemma-it	chat	8192	Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
glaive-coder	chat	16384	A code model trained on a dataset of ~140k programming related problems and solutions generated from Glaive’s synthetic data generation platform.
gorilla-openfunctions-v1	chat	4096	OpenFunctions is designed to extend Large Language Model (LLM) Chat Completion feature to formulate executable APIs call given natural language instructions and API context.
gorilla-openfunctions-v2	chat	4096	OpenFunctions is designed to extend Large Language Model (LLM) Chat Completion feature to formulate executable APIs call given natural language instructions and API context.
gpt-2	generate	1024	GPT-2 is a Transformer-based LLM that is trained on WebTest, a 40 GB dataset of Reddit posts with 3+ upvotes.
internlm-20b	generate	16384	Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data.
internlm-7b	generate	8192	InternLM is a Transformer-based LLM that is trained on both Chinese and English data, focusing on practical scenarios.
internlm-chat-20b	chat	16384	Pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data. The Chat version has undergone SFT and RLHF training.
internlm-chat-7b	chat	4096	Internlm-chat is a fine-tuned version of the Internlm LLM, specializing in chatting.
internlm2-chat	chat	204800	The second generation of the InternLM model, InternLM2.
llama-2	generate	4096	Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data.
llama-2-chat	chat	4096	Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting.
llama-3	generate	8192	Llama 3 is an auto-regressive language model that uses an optimized transformer architecture
llama-3-instruct	chat	8192	The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks..
minicpm-2b-dpo-bf16	chat	4096	MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp16	chat	4096	MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-dpo-fp32	chat	4096	MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-bf16	chat	4096	MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
minicpm-2b-sft-fp32	chat	4096	MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.
mistral-instruct-v0.1	chat	8192	Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting.
mistral-instruct-v0.2	chat	8192	The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.
mistral-v0.1	generate	8192	Mistral-7B is a unmoderated Transformer based LLM claiming to outperform Llama2 on all benchmarks.
mixtral-8x22b-instruct-v0.1	chat	65536	The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting.
mixtral-instruct-v0.1	chat	32768	Mistral-8x7B-Instruct is a fine-tuned version of the Mistral-8x7B LLM, specializing in chatting.
mixtral-v0.1	generate	32768	The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.
omnilmm	chat, vision	2048	OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling.
openbuddy	chat	2048	OpenBuddy is a powerful open multilingual chatbot model aimed at global users.
openhermes-2.5	chat	8192	Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1 on primarily GPT-4 generated data.
opt	generate	2048	Opt is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3.
orca	chat	2048	Orca is an LLM trained by fine-tuning LLaMA on explanation traces obtained from GPT-4.
orion-chat	chat	4096	Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
orion-chat-rag	chat	4096	Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.
phi-2	generate	2048	Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.
phi-3-mini-128k-instruct	chat	128000	The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
phi-3-mini-4k-instruct	chat	4096	The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.
platypus2-70b-instruct	generate	4096	Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2.
qwen-chat	chat, tools	32768	Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting.
qwen-vl-chat	chat, vision	4096	Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.
qwen1.5-chat	chat, tools	32768	Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.
qwen1.5-moe-chat	chat	32768	Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.
seallm_v2	generate	8192	We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
seallm_v2.5	generate	8192	We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages
skywork	generate	4096	Skywork is a series of large models developed by the Kunlun Group · Skywork team.
skywork-math	generate	4096	Skywork is a series of large models developed by the Kunlun Group · Skywork team.
starchat-beta	chat	8192	Starchat-beta is a fine-tuned version of the Starcoderplus LLM, specializing in coding assistance.
starcoder	generate	8192	Starcoder is an open-source Transformer based LLM that is trained on permissively licensed data from GitHub.
starcoderplus	generate	8192	Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on RedefinedWeb and StarCoderData datasets.
starling-lm	chat	4096	We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset
tiny-llama	generate	2048	The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.
vicuna-v1.3	chat	2048	Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.
vicuna-v1.5	chat	4096	Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.
vicuna-v1.5-16k	chat	16384	Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context window of 16k tokens instead of 4k.
wizardcoder-python-v1.0	chat	100000
wizardlm-v1.0	chat	2048	WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-Instruct.
wizardmath-v1.0	chat	2048	WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math.
xverse	generate	2048	XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology.
xverse-chat	chat	2048	XVERSEB-Chat is the aligned version of model XVERSE.
yi	generate	4096	The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-1.5	generate	4096	Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-1.5-chat	chat	4096	Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.
yi-200k	generate	262144	The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-chat	chat	4096	The Yi series models are large language models trained from scratch by developers at 01.AI.
yi-vl-chat	chat, vision	4096	Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.
zephyr-7b-alpha	chat	8192	Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1.
zephyr-7b-beta	chat	8192	Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1