.. _models_llm_index:

=====================
Large Language Models
=====================

The following is a list of built-in LLMs in Xinference:


.. list-table::
   :widths: 25 25 25 50
   :header-rows: 1

   * - MODEL NAME
     - ABILITIES
     - CONTEXT_LENGTH
     - DESCRIPTION

   * - :ref:`aquila2 `
     - generate
     - 2048
     - Aquila2 series models are the base language models.

   * - :ref:`aquila2-chat `
     - chat
     - 2048
     - Aquila2-chat series models are the chat models.

   * - :ref:`aquila2-chat-16k `
     - chat
     - 16384
     - AquilaChat2-16k series models are the long-text chat models.

   * - :ref:`baichuan `
     - generate
     - 4096
     - Baichuan is an open-source Transformer-based LLM that is trained on both Chinese and English data.

   * - :ref:`baichuan-2 `
     - generate
     - 4096
     - Baichuan2 is an open-source Transformer-based LLM that is trained on both Chinese and English data.

   * - :ref:`baichuan-2-chat `
     - chat
     - 4096
     - Baichuan2-chat is a fine-tuned version of the Baichuan2 LLM, specializing in chatting.

   * - :ref:`baichuan-chat `
     - chat
     - 4096
     - Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

   * - :ref:`c4ai-command-r-v01 `
     - generate
     - 131072
     - C4AI Command-R is a research release of a 35-billion-parameter, highly performant generative model.

   * - :ref:`c4ai-command-r-v01-4bit `
     - generate
     - 131072
     - This model is a 4-bit quantized version of C4AI Command-R, quantized using bitsandbytes.

   * - :ref:`chatglm `
     - chat
     - 2048
     - ChatGLM is an open-source LLM based on the General Language Model (GLM), trained on both Chinese and English data.

   * - :ref:`chatglm2 `
     - chat
     - 8192
     - ChatGLM2 is the second generation of ChatGLM, still open-source and trained on Chinese and English data.

   * - :ref:`chatglm2-32k `
     - chat
     - 32768
     - ChatGLM2-32k is a special version of ChatGLM2, with a context window of 32k tokens instead of 8k.

   * - :ref:`chatglm3 `
     - chat, tools
     - 8192
     - ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

   * - :ref:`chatglm3-128k `
     - chat
     - 131072
     - ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

   * - :ref:`chatglm3-32k `
     - chat
     - 32768
     - ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

   * - :ref:`code-llama `
     - generate
     - 100000
     - Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code.

   * - :ref:`code-llama-instruct `
     - chat
     - 100000
     - Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.

   * - :ref:`code-llama-python `
     - generate
     - 100000
     - Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python.

   * - :ref:`codeqwen1.5-chat `
     - chat
     - 32768
     - CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data.

   * - :ref:`codeshell `
     - generate
     - 8194
     - CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.

   * - :ref:`codeshell-chat `
     - chat
     - 8194
     - CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.

   * - :ref:`deepseek-chat `
     - chat
     - 4096
     - DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.

   * - :ref:`deepseek-coder-instruct `
     - chat
     - 4096
     - deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data.

   * - :ref:`deepseek-vl-chat `
     - chat, vision
     - 4096
     - DeepSeek-VL has general multimodal understanding capabilities and can handle logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

   * - :ref:`falcon `
     - generate
     - 2048
     - Falcon is an open-source Transformer-based LLM trained on the RefinedWeb dataset.

   * - :ref:`falcon-instruct `
     - chat
     - 2048
     - Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing in chatting.

   * - :ref:`gemma-it `
     - chat
     - 8192
     - Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

   * - :ref:`glaive-coder `
     - chat
     - 16384
     - A code model trained on a dataset of ~140k programming-related problems and solutions generated from Glaive’s synthetic data generation platform.

   * - :ref:`gorilla-openfunctions-v1 `
     - chat
     - 4096
     - OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls from natural language instructions and API context.

   * - :ref:`gorilla-openfunctions-v2 `
     - chat
     - 4096
     - OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls from natural language instructions and API context.

   * - :ref:`gpt-2 `
     - generate
     - 1024
     - GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with at least 3 karma.

   * - :ref:`internlm-20b `
     - generate
     - 16384
     - InternLM-20B is pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data.

   * - :ref:`internlm-7b `
     - generate
     - 8192
     - InternLM is a Transformer-based LLM that is trained on both Chinese and English data, focusing on practical scenarios.

   * - :ref:`internlm-chat-20b `
     - chat
     - 16384
     - InternLM-Chat-20B is pre-trained on over 2.3T tokens of high-quality English, Chinese, and code data, and has undergone SFT and RLHF training.

   * - :ref:`internlm-chat-7b `
     - chat
     - 4096
     - InternLM-Chat is a fine-tuned version of the InternLM LLM, specializing in chatting.

   * - :ref:`internlm2-chat `
     - chat
     - 204800
     - InternLM2 is the second generation of the InternLM model.

   * - :ref:`llama-2 `
     - generate
     - 4096
     - Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data.

   * - :ref:`llama-2-chat `
     - chat
     - 4096
     - Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting.

   * - :ref:`llama-3 `
     - generate
     - 8192
     - Llama 3 is an auto-regressive language model that uses an optimized transformer architecture.

   * - :ref:`llama-3-instruct `
     - chat
     - 8192
     - The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.

   * - :ref:`minicpm-2b-dpo-bf16 `
     - chat
     - 4096
     - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

   * - :ref:`minicpm-2b-dpo-fp16 `
     - chat
     - 4096
     - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

   * - :ref:`minicpm-2b-dpo-fp32 `
     - chat
     - 4096
     - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

   * - :ref:`minicpm-2b-sft-bf16 `
     - chat
     - 4096
     - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

   * - :ref:`minicpm-2b-sft-fp32 `
     - chat
     - 4096
     - MiniCPM is an end-side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

   * - :ref:`mistral-instruct-v0.1 `
     - chat
     - 8192
     - Mistral-7B-Instruct is a version of the Mistral-7B LLM fine-tuned on public datasets, specializing in chatting.

   * - :ref:`mistral-instruct-v0.2 `
     - chat
     - 8192
     - The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.

   * - :ref:`mistral-v0.1 `
     - generate
     - 8192
     - Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks.

   * - :ref:`mixtral-8x22b-instruct-v0.1 `
     - chat
     - 65536
     - The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting.

   * - :ref:`mixtral-instruct-v0.1 `
     - chat
     - 32768
     - Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting.

   * - :ref:`mixtral-v0.1 `
     - generate
     - 32768
     - The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

   * - :ref:`omnilmm `
     - chat, vision
     - 2048
     - OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling.

   * - :ref:`openbuddy `
     - chat
     - 2048
     - OpenBuddy is a powerful open multilingual chatbot model aimed at global users.

   * - :ref:`openhermes-2.5 `
     - chat
     - 8192
     - OpenHermes 2.5 is a version of Mistral-7B-v0.1 fine-tuned primarily on GPT-4-generated data.

   * - :ref:`opt `
     - generate
     - 2048
     - OPT is an open-source, decoder-only, Transformer-based LLM that was designed to replicate GPT-3.

   * - :ref:`orca `
     - chat
     - 2048
     - Orca is an LLM trained by fine-tuning LLaMA on explanation traces obtained from GPT-4.

   * - :ref:`orion-chat `
     - chat
     - 4096
     - Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.

   * - :ref:`orion-chat-rag `
     - chat
     - 4096
     - Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.

   * - :ref:`phi-2 `
     - generate
     - 2048
     - Phi-2 is a 2.7B Transformer-based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.

   * - :ref:`phi-3-mini-128k-instruct `
     - chat
     - 128000
     - The Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.

   * - :ref:`phi-3-mini-4k-instruct `
     - chat
     - 4096
     - The Phi-3-Mini-4K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.

   * - :ref:`platypus2-70b-instruct `
     - generate
     - 4096
     - Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2.

   * - :ref:`qwen-chat `
     - chat, tools
     - 32768
     - Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting.

   * - :ref:`qwen-vl-chat `
     - chat, vision
     - 4096
     - Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.

   * - :ref:`qwen1.5-chat `
     - chat, tools
     - 32768
     - Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.

   * - :ref:`qwen1.5-moe-chat `
     - chat
     - 32768
     - Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.

   * - :ref:`seallm_v2 `
     - generate
     - 8192
     - SeaLLM-7B-v2 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages.

   * - :ref:`seallm_v2.5 `
     - generate
     - 8192
     - SeaLLM-7B-v2.5 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages.

   * - :ref:`skywork `
     - generate
     - 4096
     - Skywork is a series of large models developed by the Kunlun Group · Skywork team.

   * - :ref:`skywork-math `
     - generate
     - 4096
     - Skywork is a series of large models developed by the Kunlun Group · Skywork team.

   * - :ref:`starchat-beta `
     - chat
     - 8192
     - StarChat-beta is a fine-tuned version of the StarCoderPlus LLM, specializing in coding assistance.

   * - :ref:`starcoder `
     - generate
     - 8192
     - StarCoder is an open-source Transformer-based LLM that is trained on permissively licensed data from GitHub.

   * - :ref:`starcoderplus `
     - generate
     - 8192
     - StarCoderPlus is an open-source LLM trained by fine-tuning StarCoder on the RefinedWeb and StarCoderData datasets.

   * - :ref:`tiny-llama `
     - generate
     - 2048
     - The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.

   * - :ref:`vicuna-v1.3 `
     - chat
     - 2048
     - Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.

   * - :ref:`vicuna-v1.5 `
     - chat
     - 4096
     - Vicuna is an open-source LLM trained by fine-tuning Llama 2 on data collected from ShareGPT.

   * - :ref:`vicuna-v1.5-16k `
     - chat
     - 16384
     - Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context window of 16k tokens instead of 4k.

   * - :ref:`wizardcoder-python-v1.0 `
     - chat
     - 100000
     -

   * - :ref:`wizardlm-v1.0 `
     - chat
     - 2048
     - WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-Instruct.

   * - :ref:`wizardmath-v1.0 `
     - chat
     - 2048
     - WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math.

   * - :ref:`xverse `
     - generate
     - 2048
     - XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology.

   * - :ref:`xverse-chat `
     - chat
     - 2048
     - XVERSE-Chat is the aligned version of the XVERSE model.

   * - :ref:`yi `
     - generate
     - 4096
     - The Yi series models are large language models trained from scratch by developers at 01.AI.

   * - :ref:`yi-200k `
     - generate
     - 204800
     - The Yi series models are large language models trained from scratch by developers at 01.AI.

   * - :ref:`yi-chat `
     - chat
     - 204800
     - The Yi series models are large language models trained from scratch by developers at 01.AI.

   * - :ref:`yi-vl-chat `
     - chat, vision
     - 204800
     - Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.

   * - :ref:`zephyr-7b-alpha `
     - chat
     - 8192
     - Zephyr-7B-α is the first model in the series and is a fine-tuned version of mistralai/Mistral-7B-v0.1.

   * - :ref:`zephyr-7b-beta `
     - chat
     - 8192
     - Zephyr-7B-β is the second model in the series and is a fine-tuned version of mistralai/Mistral-7B-v0.1.


.. toctree::
   :maxdepth: 3

   aquila2
   aquila2-chat
   aquila2-chat-16k
   baichuan
   baichuan-2
   baichuan-2-chat
   baichuan-chat
   c4ai-command-r-v01
   c4ai-command-r-v01-4bit
   chatglm
   chatglm2
   chatglm2-32k
   chatglm3
   chatglm3-128k
   chatglm3-32k
   code-llama
   code-llama-instruct
   code-llama-python
   codeqwen1.5-chat
   codeshell
   codeshell-chat
   deepseek-chat
   deepseek-coder-instruct
   deepseek-vl-chat
   falcon
   falcon-instruct
   gemma-it
   glaive-coder
   gorilla-openfunctions-v1
   gorilla-openfunctions-v2
   gpt-2
   internlm-20b
   internlm-7b
   internlm-chat-20b
   internlm-chat-7b
   internlm2-chat
   llama-2
   llama-2-chat
   llama-3
   llama-3-instruct
   minicpm-2b-dpo-bf16
   minicpm-2b-dpo-fp16
   minicpm-2b-dpo-fp32
   minicpm-2b-sft-bf16
   minicpm-2b-sft-fp32
   mistral-instruct-v0.1
   mistral-instruct-v0.2
   mistral-v0.1
   mixtral-8x22b-instruct-v0.1
   mixtral-instruct-v0.1
   mixtral-v0.1
   omnilmm
   openbuddy
   openhermes-2.5
   opt
   orca
   orion-chat
   orion-chat-rag
   phi-2
   phi-3-mini-128k-instruct
   phi-3-mini-4k-instruct
   platypus2-70b-instruct
   qwen-chat
   qwen-vl-chat
   qwen1.5-chat
   qwen1.5-moe-chat
   seallm_v2
   seallm_v2.5
   skywork
   skywork-math
   starchat-beta
   starcoder
   starcoderplus
   tiny-llama
   vicuna-v1.3
   vicuna-v1.5
   vicuna-v1.5-16k
   wizardcoder-python-v1.0
   wizardlm-v1.0
   wizardmath-v1.0
   xverse
   xverse-chat
   yi
   yi-200k
   yi-chat
   yi-vl-chat
   zephyr-7b-alpha
   zephyr-7b-beta