Large Language Models

The following is a list of the built-in LLMs in Xinference:

MODEL NAME

ABILITIES

CONTEXT_LENGTH

DESCRIPTION

aquila2

generate

2048

Aquila2 series models are the base language models

aquila2-chat

chat

2048

Aquila2-chat series models are the chat models

aquila2-chat-16k

chat

16384

AquilaChat2-16k series models are the long-text chat models

baichuan

generate

4096

Baichuan is an open-source Transformer-based LLM that is trained on both Chinese and English data.

baichuan-2

generate

4096

Baichuan2 is an open-source Transformer-based LLM that is trained on both Chinese and English data.

baichuan-2-chat

chat

4096

Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

baichuan-chat

chat

4096

Baichuan-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting.

c4ai-command-r-v01

generate

131072

C4AI Command-R is a research release of a 35 billion parameter highly performant generative model.

c4ai-command-r-v01-4bit

generate

131072

This model is a 4-bit quantized version of C4AI Command-R using bitsandbytes.

chatglm

chat

2048

ChatGLM is an open-source General Language Model (GLM) based LLM trained on both Chinese and English data.

chatglm2

chat

8192

ChatGLM2 is the second generation of ChatGLM, still open-source and trained on Chinese and English data.

chatglm2-32k

chat

32768

ChatGLM2-32k is a special version of ChatGLM2, with a context window of 32k tokens instead of 8k.

chatglm3

chat, tools

8192

ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

chatglm3-128k

chat

131072

ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

chatglm3-32k

chat

32768

ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.

code-llama

generate

100000

Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code.

code-llama-instruct

chat

100000

Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM.

code-llama-python

generate

100000

Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python.

codeqwen1.5-chat

chat

65536

CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data.

codeshell

generate

8194

CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.

codeshell-chat

chat

8194

CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University.

deepseek-chat

chat

4096

DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.

deepseek-coder-instruct

chat

4096

deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data.

deepseek-vl-chat

chat, vision

4096

DeepSeek-VL has general multimodal understanding capabilities and can process logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

falcon

generate

2048

Falcon is an open-source Transformer-based LLM trained on the RefinedWeb dataset.

falcon-instruct

chat

2048

Falcon-instruct is a fine-tuned version of the Falcon LLM, specializing in chatting.

gemma-it

chat

8192

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

glaive-coder

chat

16384

A code model trained on a dataset of ~140k programming related problems and solutions generated from Glaive’s synthetic data generation platform.

gorilla-openfunctions-v1

chat

4096

OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.

gorilla-openfunctions-v2

chat

4096

OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context.

gpt-2

generate

1024

GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of web pages linked from Reddit posts with 3+ upvotes.

internlm-20b

generate

16384

Pre-trained on over 2.3T tokens containing high-quality English, Chinese, and code data.

internlm-7b

generate

8192

InternLM is a Transformer-based LLM that is trained on both Chinese and English data, focusing on practical scenarios.

internlm-chat-20b

chat

16384

Pre-trained on over 2.3T tokens containing high-quality English, Chinese, and code data. The Chat version has undergone SFT and RLHF training.

internlm-chat-7b

chat

4096

InternLM-chat is a fine-tuned version of the InternLM LLM, specializing in chatting.

internlm2-chat

chat

204800

The second generation of the InternLM model, InternLM2.

llama-2

generate

4096

Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data.

llama-2-chat

chat

4096

Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting.

llama-3

generate

8192

Llama 3 is an auto-regressive language model that uses an optimized transformer architecture

llama-3-instruct

chat

8192

The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks.

minicpm-2b-dpo-bf16

chat

4096

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

minicpm-2b-dpo-fp16

chat

4096

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

minicpm-2b-dpo-fp32

chat

4096

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

minicpm-2b-sft-bf16

chat

4096

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

minicpm-2b-sft-fp32

chat

4096

MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings.

mistral-instruct-v0.1

chat

8192

Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting.

mistral-instruct-v0.2

chat

8192

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1.

mistral-v0.1

generate

8192

Mistral-7B is an unmoderated Transformer-based LLM claiming to outperform Llama2 on all benchmarks.

mixtral-8x22b-instruct-v0.1

chat

65536

The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mixtral-8x22B-v0.1, specializing in chatting.

mixtral-instruct-v0.1

chat

32768

Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting.

mixtral-v0.1

generate

32768

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts.

omnilmm

chat, vision

2048

OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling.

openbuddy

chat

2048

OpenBuddy is a powerful open multilingual chatbot model aimed at global users.

openhermes-2.5

chat

8192

OpenHermes 2.5 is a fine-tuned version of Mistral-7B-v0.1, trained primarily on GPT-4 generated data.

opt

generate

2048

OPT is an open-source, decoder-only, Transformer-based LLM that was designed to replicate GPT-3.

orca

chat

2048

Orca is an LLM trained by fine-tuning LLaMA on explanation traces obtained from GPT-4.

orion-chat

chat

4096

Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.

orion-chat-rag

chat

4096

Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI.

phi-2

generate

2048

Phi-2 is a 2.7B-parameter Transformer-based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.

phi-3-mini-128k-instruct

chat

128000

The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.

phi-3-mini-4k-instruct

chat

4096

The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.

platypus2-70b-instruct

generate

4096

Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2.

qwen-chat

chat, tools

32768

Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting.

qwen-vl-chat

chat, vision

4096

Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities.

qwen1.5-chat

chat, tools

32768

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data.

qwen1.5-moe-chat

chat

32768

Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data.

seallm_v2

generate

8192

SeaLLM-7B-v2 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages.

seallm_v2.5

generate

8192

SeaLLM-7B-v2.5 is a state-of-the-art multilingual LLM for Southeast Asian (SEA) languages.

skywork

generate

4096

Skywork is a series of large models developed by the Kunlun Group · Skywork team.

skywork-math

generate

4096

Skywork is a series of large models developed by the Kunlun Group · Skywork team.

starchat-beta

chat

8192

Starchat-beta is a fine-tuned version of the Starcoderplus LLM, specializing in coding assistance.

starcoder

generate

8192

Starcoder is an open-source Transformer-based LLM that is trained on permissively licensed data from GitHub.

starcoderplus

generate

8192

Starcoderplus is an open-source LLM trained by fine-tuning Starcoder on the RefinedWeb and StarCoderData datasets.

starling-lm

chat

4096

We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset

tiny-llama

generate

2048

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.

vicuna-v1.3

chat

2048

Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.

vicuna-v1.5

chat

4096

Vicuna is an open-source LLM trained by fine-tuning LLaMA on data collected from ShareGPT.

vicuna-v1.5-16k

chat

16384

Vicuna-v1.5-16k is a special version of Vicuna-v1.5, with a context window of 16k tokens instead of 4k.

wizardcoder-python-v1.0

chat

100000

wizardlm-v1.0

chat

2048

WizardLM is an open-source LLM trained by fine-tuning LLaMA with Evol-Instruct.

wizardmath-v1.0

chat

2048

WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math.

xverse

generate

2048

XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology.

xverse-chat

chat

2048

XVERSE-Chat is the aligned version of the XVERSE model.

yi

generate

4096

The Yi series models are large language models trained from scratch by developers at 01.AI.

yi-1.5

generate

4096

Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.

yi-1.5-chat

chat

4096

Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples.

yi-200k

generate

262144

The Yi series models are large language models trained from scratch by developers at 01.AI.

yi-chat

chat

4096

The Yi series models are large language models trained from scratch by developers at 01.AI.

yi-vl-chat

chat, vision

4096

Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.

zephyr-7b-alpha

chat

8192

Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1.

zephyr-7b-beta

chat

8192

Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1.
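
When choosing a model from the list above, the ABILITIES and CONTEXT_LENGTH columns are usually the first things to filter on. The following is a minimal sketch of that selection in plain Python. It does not call Xinference. The `ModelEntry` class, the `MODELS` list, and the `find_models` helper are hypothetical names introduced only for illustration, and `MODELS` holds a small hand-copied subset of the rows above, not the full list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of the table: model name, abilities, and context length in tokens."""
    name: str
    abilities: frozenset
    context_length: int


# Hand-copied subset of the table above (illustrative only, not the full list).
MODELS = [
    ModelEntry("chatglm3", frozenset({"chat", "tools"}), 8192),
    ModelEntry("chatglm3-32k", frozenset({"chat"}), 32768),
    ModelEntry("llama-3-instruct", frozenset({"chat"}), 8192),
    ModelEntry("qwen-chat", frozenset({"chat", "tools"}), 32768),
    ModelEntry("qwen-vl-chat", frozenset({"chat", "vision"}), 4096),
    ModelEntry("yi-200k", frozenset({"generate"}), 262144),
]


def find_models(required_abilities, min_context_length=0):
    """Return names of models that have every required ability
    and a context length of at least min_context_length tokens."""
    required = frozenset(required_abilities)
    return [
        m.name
        for m in MODELS
        if required <= m.abilities and m.context_length >= min_context_length
    ]


if __name__ == "__main__":
    # Chat models that also support tool calling.
    print(find_models({"chat", "tools"}))
    # Chat models with a context window of at least 32k tokens.
    print(find_models({"chat"}, min_context_length=32768))
```

With this subset, the first call selects chatglm3 and qwen-chat, and the second selects chatglm3-32k and qwen-chat. A subset check on the abilities is used so that a model listed as "chat, tools" also matches a plain "chat" request.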