supported_models.rst 1.4 KB
Newer Older
Woosuk Kwon's avatar
Woosuk Kwon committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
.. _supported_models:

Supported Models
================

CacheFlow supports a variety of generative Transformer models in `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
The following is the list of model architectures that are currently supported by CacheFlow.
Alongside each architecture, we include some popular models that use it.

.. list-table::
  :widths: 25 75
  :header-rows: 1

  * - Architecture
    - Models
  * - :code:`GPT2LMHeadModel`
    - GPT-2
  * - :code:`GPTNeoXForCausalLM`
    - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
  * - :code:`LlamaForCausalLM`
    - LLaMA, Vicuna, Alpaca, Koala
  * - :code:`OPTForCausalLM`
    - OPT, OPT-IML

If your model uses one of the above model architectures, you can seamlessly run your model with CacheFlow.
Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for instructions on how to implement support for your model.
Alternatively, you can raise an issue on our `GitHub <https://github.com/WoosukKwon/cacheflow/issues>`_ project.

.. tip::
    The easiest way to check if your model is supported is to run the program below:

    .. code-block:: python

        from cacheflow import LLM

        llm = LLM(model=...)  # Name or path of your model
        output = llm.generate("Hello, my name is")
        print(output)

    If CacheFlow successfully generates text, it indicates that your model is supported.