[Feature][Quantization] auto_round support for mixed bits quantization (#23812)

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Commit 87778d5f (unverified), authored by Heng Guo on Oct 21, 2025; committed by GitHub on Oct 20, 2025.

Changed file (6 additions, 0 deletions):

vllm/model_executor/layers/quantization/auto_round.py (+6, -0)

--- a/vllm/model_executor/layers/quantization/auto_round.py
+++ b/vllm/model_executor/layers/quantization/auto_round.py
@@ -436,6 +436,12 @@ class AutoRoundConfig(QuantizationConfig):
            return None

    def get_quant_method(self, layer: torch.nn.Module, prefix: str):
+        if prefix and self.extra_config:
+            for layer_name in self.extra_config:
+                if (
+                    layer_name == prefix or layer_name == f"model.{prefix}"
+                ) and self.extra_config[layer_name].get("bits", 16) >= 16:
+                    return UnquantizedLinearMethod()
        if (
            current_platform.is_cpu()
            or current_platform.is_xpu()
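The new check in `get_quant_method` runs before any backend selection: if the layer's `prefix` names an entry in `extra_config`, either verbatim or with a leading `model.`, and that entry requests 16 or more bits (a missing `bits` key also counts as 16), the layer is served by `UnquantizedLinearMethod` instead of a quantized kernel. The standalone sketch below reproduces only that matching rule so it can be exercised without vLLM. It assumes `extra_config` is a mapping from layer names to per-layer dicts with an optional `"bits"` key, as the diff's `.get("bits", 16)` implies; the function name `should_skip_quantization` and the layer names in `EXAMPLE_EXTRA_CONFIG` are hypothetical and not part of vLLM's API.

```python
from typing import Any, Mapping, Optional


def should_skip_quantization(
    prefix: str, extra_config: Optional[Mapping[str, Mapping[str, Any]]]
) -> bool:
    """Hypothetical helper mirroring the check added to get_quant_method.

    Returns True when the layer named by ``prefix`` has a per-layer entry in
    ``extra_config`` keyed either by the bare prefix or by "model." + prefix,
    and that entry asks for 16 or more bits. A missing "bits" key is treated
    as 16, exactly like ``.get("bits", 16)`` in the diff.
    """
    # An empty prefix or an absent/empty extra_config disables the override.
    if not prefix or not extra_config:
        return False
    for layer_name, layer_cfg in extra_config.items():
        # Checkpoint keys may carry a "model." prefix that vLLM's layer prefix lacks.
        if layer_name in (prefix, f"model.{prefix}") and layer_cfg.get("bits", 16) >= 16:
            return True
    return False


# Hypothetical per-layer overrides in a mixed-bits checkpoint.
EXAMPLE_EXTRA_CONFIG = {
    "model.layers.0.mlp.down_proj": {"bits": 16},  # kept in full precision
    "layers.1.self_attn.q_proj": {"bits": 8},      # still quantized
    "lm_head": {},                                 # no "bits" key -> treated as 16
}
```

With this configuration, `should_skip_quantization("layers.0.mlp.down_proj", EXAMPLE_EXTRA_CONFIG)` and `should_skip_quantization("lm_head", EXAMPLE_EXTRA_CONFIG)` return `True`, while `"layers.1.self_attn.q_proj"` (8 bits) and any layer without an entry return `False`, so those layers fall through to the existing platform-specific quantized path.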