.. _supported_models:

Supported Models
================

vLLM supports a variety of generative Transformer models in `HuggingFace Transformers <https://huggingface.co/models>`_.
The following is the list of model architectures that are currently supported by vLLM.
Alongside each architecture, we include some popular models that use it.

.. list-table::
  :widths: 25 25 50 5
  :header-rows: 1

  * - Architecture
    - Models
    - Example HuggingFace Models
    - :ref:`LoRA <lora>`
  * - :code:`AquilaForCausalLM`
    - Aquila
    - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
    - ✅︎
  * - :code:`BaiChuanForCausalLM`
    - Baichuan
    - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
    - ✅︎
  * - :code:`ChatGLMModel`
    - ChatGLM
    - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
    - ✅︎
  * - :code:`DbrxForCausalLM`
    - DBRX
    - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
    -
  * - :code:`DeciLMForCausalLM`
    - DeciLM
    - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
    -
  * - :code:`BloomForCausalLM`
    - BLOOM, BLOOMZ, BLOOMChat
    - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
    -
  * - :code:`FalconForCausalLM`
    - Falcon
    - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
    -
  * - :code:`GemmaForCausalLM`
    - Gemma
    - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
    - ✅︎
  * - :code:`GPT2LMHeadModel`
    - GPT-2
    - :code:`gpt2`, :code:`gpt2-xl`, etc.
    -
  * - :code:`GPTBigCodeForCausalLM`
    - StarCoder, SantaCoder, WizardCoder
    - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
    -
  * - :code:`GPTJForCausalLM`
    - GPT-J
    - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
    -
  * - :code:`GPTNeoXForCausalLM`
    - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
    - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
    -
  * - :code:`InternLMForCausalLM`
    - InternLM
    - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
    - ✅︎
  * - :code:`InternLM2ForCausalLM`
    - InternLM2
    - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
    -
  * - :code:`JAISLMHeadModel`
    - Jais
    - :code:`core42/jais-13b`, :code:`core42/jais-13b-chat`, :code:`core42/jais-30b-v3`, :code:`core42/jais-30b-chat-v3`, etc.
    -
  * - :code:`LlamaForCausalLM`
    - LLaMA, LLaMA-2, Vicuna, Alpaca, Yi
    - :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.
    - ✅︎
  * - :code:`MistralForCausalLM`
    - Mistral, Mistral-Instruct
    - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
    - ✅︎
  * - :code:`MixtralForCausalLM`
    - Mixtral-8x7B, Mixtral-8x7B-Instruct
    - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, etc.
    - ✅︎
  * - :code:`MPTForCausalLM`
    - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
    - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
    -
  * - :code:`OLMoForCausalLM`
    - OLMo
    - :code:`allenai/OLMo-1B`, :code:`allenai/OLMo-7B`, etc.
    -
  * - :code:`OPTForCausalLM`
    - OPT, OPT-IML
    - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
    -
  * - :code:`OrionForCausalLM`
    - Orion
    - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
    -
  * - :code:`PhiForCausalLM`
    - Phi
    - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
    -
  * - :code:`QWenLMHeadModel`
    - Qwen
    - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
    -
  * - :code:`Qwen2ForCausalLM`
    - Qwen2
    - :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
    - ✅︎
  * - :code:`StableLmForCausalLM`
    - StableLM
    - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
    -

If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/vllm-project/vllm/issues>`_ project.
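
A quick way to see which architecture a checkpoint uses is the :code:`architectures` field of its HuggingFace :code:`config.json`. The sketch below is a hypothetical helper, not part of vLLM's API: :code:`check_model_dir` and :code:`SUPPORTED_ARCHITECTURES` are illustrative names, and the mapping is a hand-copied snapshot of the table above (architecture to LoRA support), so it may lag behind the vLLM version you have installed.

.. code-block:: python

    import json
    from pathlib import Path

    # Hypothetical snapshot of the table above: architecture -> LoRA support.
    # It is NOT read from vLLM and may be out of date for your installed version.
    SUPPORTED_ARCHITECTURES = {
        "AquilaForCausalLM": True,
        "BaiChuanForCausalLM": True,
        "ChatGLMModel": True,
        "DbrxForCausalLM": False,
        "DeciLMForCausalLM": False,
        "BloomForCausalLM": False,
        "FalconForCausalLM": False,
        "GemmaForCausalLM": True,
        "GPT2LMHeadModel": False,
        "GPTBigCodeForCausalLM": False,
        "GPTJForCausalLM": False,
        "GPTNeoXForCausalLM": False,
        "InternLMForCausalLM": True,
        "InternLM2ForCausalLM": False,
        "JAISLMHeadModel": False,
        "LlamaForCausalLM": True,
        "MistralForCausalLM": True,
        "MixtralForCausalLM": True,
        "MPTForCausalLM": False,
        "OLMoForCausalLM": False,
        "OPTForCausalLM": False,
        "OrionForCausalLM": False,
        "PhiForCausalLM": False,
        "QWenLMHeadModel": False,
        "Qwen2ForCausalLM": True,
        "StableLmForCausalLM": False,
    }


    def check_model_dir(model_dir):
        """Return (architecture, lora_supported) for the first architecture in
        model_dir/config.json that appears in the snapshot, or None if none do."""
        config = json.loads((Path(model_dir) / "config.json").read_text())
        for arch in config.get("architectures", []):
            if arch in SUPPORTED_ARCHITECTURES:
                return arch, SUPPORTED_ARCHITECTURES[arch]
        return None

For example, a directory whose :code:`config.json` contains :code:`{"architectures": ["LlamaForCausalLM"]}` yields :code:`("LlamaForCausalLM", True)`. A :code:`None` result only means the architecture is not in this snapshot; running the model with vLLM remains the definitive check.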

.. note::
    Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
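
If you build engine arguments programmatically, the ROCm limit above can be encoded as a small guard. This is a hypothetical helper, not part of vLLM: :code:`effective_max_model_len` and its constants are illustrative, it assumes the architecture names :code:`MistralForCausalLM` and :code:`MixtralForCausalLM` from the table, and it simply caps the requested context length at 4096 when running on ROCm.

.. code-block:: python

    # Hypothetical guard for the ROCm context-length limit described in the note.
    ROCM_LIMITED_ARCHITECTURES = {"MistralForCausalLM", "MixtralForCausalLM"}
    ROCM_MAX_CONTEXT = 4096


    def effective_max_model_len(architecture, requested, on_rocm):
        """Cap the requested context length for Mistral/Mixtral on ROCm;
        return it unchanged for other architectures or non-ROCm builds."""
        if on_rocm and architecture in ROCM_LIMITED_ARCHITECTURES:
            return min(requested, ROCM_MAX_CONTEXT)
        return requested

For example, :code:`effective_max_model_len("MixtralForCausalLM", 32768, on_rocm=True)` returns 4096. The result could then be passed as the :code:`max_model_len` argument when constructing :code:`LLM`.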

.. tip::
    The easiest way to check whether your model is supported is to run the program below:

    .. code-block:: python

        from vllm import LLM

        # Replace ... with the HuggingFace name or local path of your model.
        llm = LLM(model=...)
        outputs = llm.generate("Hello, my name is")
        # generate() returns one RequestOutput per prompt; print the generated text.
        print(outputs[0].outputs[0].text)

    If vLLM successfully generates text, your model is supported.

.. tip::
    To use models from `ModelScope <https://www.modelscope.cn>`_ instead of HuggingFace Hub, set the following environment variable:

    .. code-block:: shell

        $ export VLLM_USE_MODELSCOPE=True

    Then pass :code:`trust_remote_code=True` when creating the :code:`LLM`:

    .. code-block:: python

        from vllm import LLM

        # Replace the ... placeholders with your model's name or path and, optionally, a revision.
        llm = LLM(model=..., revision=..., trust_remote_code=True)
        outputs = llm.generate("Hello, my name is")
        print(outputs[0].outputs[0].text)