[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031)

Signed-off-by: mgoin <michael@neuralmagic.com>

Commit 21802c4b, authored Apr 07, 2025 by Michael Goin; committed by GitHub Apr 07, 2025

Showing with 3 additions and 1 deletion

vllm/model_executor/layers/quantization/fp8.py +3 -1

--- a/vllm/model_executor/layers/quantization/fp8.py
+++ b/vllm/model_executor/layers/quantization/fp8.py
@@ -116,7 +116,9 @@ class Fp8Config(QuantizationConfig):
        from vllm.attention.layer import Attention  # Avoid circular import

        if isinstance(layer, LinearBase):
-            if is_layer_skipped(prefix, self.ignored_layers):
+            if is_layer_skipped(prefix=prefix,
+                                ignored_layers=self.ignored_layers,
+                                fused_mapping=self.packed_modules_mapping):
                return UnquantizedLinearMethod()
            return Fp8LinearMethod(self)
        elif isinstance(layer, FusedMoE):
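
To illustrate why passing the fused mapping matters, here is a minimal sketch of how such a check might behave. It is not vLLM's actual `is_layer_skipped`; the function `sketch_is_layer_skipped`, the layer names, and the mapping contents are hypothetical and chosen only for illustration. The assumption is that a checkpoint lists unfused shard names (for example `q_proj`, `k_proj`, `v_proj`) as ignored, while the runtime layer carries a fused name (for example `qkv_proj`), so a plain membership test on the fused prefix would miss it.

```python
from typing import Mapping, Optional, Sequence


def sketch_is_layer_skipped(
    prefix: str,
    ignored_layers: Sequence[str],
    fused_mapping: Optional[Mapping[str, Sequence[str]]] = None,
) -> bool:
    """Hypothetical stand-in for a skip check that understands fused layers."""
    fused_mapping = fused_mapping or {}
    # Split "a.b.qkv_proj" into parent "a.b" and final component "qkv_proj".
    parent, _, proj_name = prefix.rpartition(".")

    # Not a fused module: fall back to a plain membership test.
    if proj_name not in fused_mapping:
        return prefix in ignored_layers

    # Fused module: expand it into the prefixes of its unfused shards.
    shard_prefixes = [
        f"{parent}.{shard}" if parent else shard
        for shard in fused_mapping[proj_name]
    ]
    skipped = [p in ignored_layers for p in shard_prefixes]

    # All shards of one fused layer must agree, since they share one kernel.
    if any(skipped) and not all(skipped):
        raise ValueError(
            f"Shards of {prefix} disagree on whether to skip quantization")
    return all(skipped)


# Hypothetical checkpoint config: ignored layers are listed by unfused name.
ignored = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
]
mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}
fused_prefix = "model.layers.0.self_attn.qkv_proj"

without_mapping = sketch_is_layer_skipped(fused_prefix, ignored)
with_mapping = sketch_is_layer_skipped(fused_prefix, ignored, mapping)
```

Under these assumptions, `without_mapping` is `False` (the fused layer would be quantized despite the ignore list) and `with_mapping` is `True`, which is the behavior the change above opts into by supplying `fused_mapping=self.packed_modules_mapping`.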