# Supported Models

Text Generation Inference enables serving optimized models. The following list shows which models (VLMs & LLMs) are supported.

- [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
- [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
- [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
- [Llama](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
- [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Granite](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224)
- [Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
- [Cohere](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
- [Dbrx](https://huggingface.co/databricks/dbrx-instruct)
- [Mamba](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
- [Mistral](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
- [Mixtral](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
- [Gpt Bigcode](https://huggingface.co/bigcode/gpt_bigcode-santacoder)
- [Phi](https://huggingface.co/microsoft/phi-1_5)
- [PhiMoe](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)
- [Baichuan](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
- [Falcon](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [StarCoder 2](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1)
- [Qwen 2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
- [Opt](https://huggingface.co/facebook/opt-6.7b)
- [T5](https://huggingface.co/google/flan-t5-xxl)
- [Galactica](https://huggingface.co/facebook/galactica-120b)
- [SantaCoder](https://huggingface.co/bigcode/santacoder)
- [Bloom](https://huggingface.co/bigscience/bloom-560m)
- [Mpt](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [Gpt2](https://huggingface.co/openai-community/gpt2)
- [Gpt Neox](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Gptj](https://huggingface.co/EleutherAI/gpt-j-6b)
- [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)
- [Mllama](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) (Multimodal)
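
Any model in the list above can be served straight from the Hugging Face Hub by passing its ID to the launcher. A minimal sketch (the model ID is just one example taken from the list; substitute the model you want to serve):

```bash
text-generation-launcher --model-id mistralai/Mistral-Nemo-Instruct-2407
```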

If the above list lacks the model you would like to serve, you can, depending on the model's pipeline type, still try to initialize and serve it to see how well it performs, but performance isn't guaranteed for non-optimized models:

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# for causal LMs/text-generation models
AutoModelForCausalLM.from_pretrained("<model>", device_map="auto")
# or, for text-to-text generation models
AutoModelForSeq2SeqLM.from_pretrained("<model>", device_map="auto")
```

If you wish to serve a supported model that already exists in a local folder, just point the launcher to that folder.

```bash
text-generation-launcher --model-id <PATH-TO-LOCAL-BLOOM>
```
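
Once the server is running, you can send generation requests to it over HTTP. A minimal sketch, assuming the launcher's default port of 3000 (adjust the address if you passed a different `--port`):

```bash
curl 127.0.0.1:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```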