[Model] support bitsandbytes quantization with minicpm model (#10842)

Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>

Commit d746268e, authored Dec 02, 2024 by zixuanzhang226, committed by GitHub Dec 03, 2024

Showing with 10 additions and 0 deletions

vllm/model_executor/models/minicpm.py +10 -0

--- a/vllm/model_executor/models/minicpm.py
+++ b/vllm/model_executor/models/minicpm.py
@@ -534,6 +534,16 @@ class MiniCPMForCausalLM(nn.Module, SupportsLoRA, SupportsPP):
    }
    embedding_padding_modules = ["lm_head"]

+    # BitsAndBytes specific attributes
+    bitsandbytes_stacked_params_mapping = {
+        # shard_name: (weight_name, index)
+        "q_proj": ("qkv_proj", 0),
+        "k_proj": ("qkv_proj", 1),
+        "v_proj": ("qkv_proj", 2),
+        "gate_proj": ("gate_up_proj", 0),
+        "up_proj": ("gate_up_proj", 1),
+    }
+
    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        super().__init__()
        config = vllm_config.model_config.hf_config
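
The mapping added in this hunk relates each per-projection checkpoint name (q_proj, k_proj, v_proj, gate_proj, up_proj) to the fused parameter it belongs to (qkv_proj or gate_up_proj) and to its shard index inside that fused parameter. The sketch below illustrates how such a table could be consulted when renaming a checkpoint tensor. It is not vLLM's loader code: the helper resolve_stacked_name and the example tensor names are hypothetical, and only the dictionary contents come from the diff.

```python
from typing import Dict, Optional, Tuple

# Copied from the diff: shard_name -> (fused weight_name, shard index).
BNB_STACKED_PARAMS_MAPPING: Dict[str, Tuple[str, int]] = {
    "q_proj": ("qkv_proj", 0),
    "k_proj": ("qkv_proj", 1),
    "v_proj": ("qkv_proj", 2),
    "gate_proj": ("gate_up_proj", 0),
    "up_proj": ("gate_up_proj", 1),
}


def resolve_stacked_name(
    checkpoint_name: str,
    mapping: Dict[str, Tuple[str, int]] = BNB_STACKED_PARAMS_MAPPING,
) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: rename a per-shard tensor to its fused name.

    Returns (fused_name, shard_index), or (checkpoint_name, None) when the
    tensor is not part of any stacked parameter.
    """
    parts = checkpoint_name.split(".")
    for shard_name, (weight_name, index) in mapping.items():
        # Compare whole dotted components so that "up_proj" is never
        # matched inside an already fused "gate_up_proj" name.
        if shard_name in parts:
            fused = [weight_name if p == shard_name else p for p in parts]
            return ".".join(fused), index
    return checkpoint_name, None


if __name__ == "__main__":
    # Example tensor names are assumptions, chosen only for illustration.
    for name in (
        "model.layers.0.self_attn.k_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
        "model.layers.0.mlp.down_proj.weight",
    ):
        print(name, "->", resolve_stacked_name(name))
```

Matching on whole dotted components rather than substrings is a design choice of this sketch; it keeps a name that already contains gate_up_proj from being rewritten a second time.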