Mixtral 8x7B support (#2011)

Co-authored-by: Pierre Stock <p@mistral.ai> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Mixtral 8x7B support (#2011)
Co-authored-by: Pierre Stock <p@mistral.ai> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
b5f882cc · Pierre Stock · GitHub · 2e8fc0d4 · b5f882cc · b5f882cc
Unverified Commit b5f882cc authored Dec 11, 2023 by Pierre Stock Committed by GitHub Dec 11, 2023
4 changed files
--- a/README.md
+++ b/README.md
@@ -60,6 +60,7 @@ vLLM seamlessly supports many Hugging Face models, including the following archi
 - InternLM (`internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.)
 - LLaMA & LLaMA-2 (`meta-llama/Llama-2-70b-hf`, `lmsys/vicuna-13b-v1.3`, `young-geng/koala`, `openlm-research/open_llama_13b`, etc.)
 - Mistral (`mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.)
+- Mixtral (`mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, etc.)
 - MPT (`mosaicml/mpt-7b`, `mosaicml/mpt-30b`, etc.)
 - OPT (`facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.)
 - Phi-1.5 (`microsoft/phi-1_5`, etc.)

--- a/vllm/model_executor/model_loader.py
+++ b/vllm/model_executor/model_loader.py
@@ -33,6 +33,7 @@ _MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaForCausalLM,
    "LLaMAForCausalLM": LlamaForCausalLM,  # For decapoda-research/llama-*
    "MistralForCausalLM": MistralForCausalLM,
+    "MixtralForCausalLM": MixtralForCausalLM,
    # transformers's mpt class has lower case
    "MptForCausalLM": MPTForCausalLM,
    "MPTForCausalLM": MPTForCausalLM,

--- a/vllm/model_executor/models/__init__.py
+++ b/vllm/model_executor/models/__init__.py
@@ -10,6 +10,7 @@ from vllm.model_executor.models.gpt_neox import GPTNeoXForCausalLM
 from vllm.model_executor.models.internlm import InternLMForCausalLM
 from vllm.model_executor.models.llama import LlamaForCausalLM
 from vllm.model_executor.models.mistral import MistralForCausalLM
+from vllm.model_executor.models.mixtral import MixtralForCausalLM
 from vllm.model_executor.models.mpt import MPTForCausalLM
 from vllm.model_executor.models.opt import OPTForCausalLM
 from vllm.model_executor.models.phi_1_5 import PhiForCausalLM
@@ -35,5 +36,6 @@ __all__ = [
    "PhiForCausalLM",
    "QWenLMHeadModel",
    "MistralForCausalLM",
+    "MixtralForCausalLM",
    "YiForCausalLM",
 ]
--- a/vllm/model_executor/models/mixtral.py
+++ b/vllm/model_executor/models/mixtral.py