cherry-pick [Bugfix] Restore prepare_fp8_layer_for_marlin removed by merge conflict resolution

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: vadiklyutiy <vgimpelson@nvidia.com>
#38398

Commit 7624525b authored Mar 27, 2026 by Vadim Gimpelson, committed by khluu on Mar 27, 2026.

1 changed file with 8 additions and 0 deletions.

vllm/model_executor/layers/quantization/fp8.py +8 -0

--- a/vllm/model_executor/layers/quantization/fp8.py
+++ b/vllm/model_executor/layers/quantization/fp8.py
@@ -437,6 +437,14 @@ class Fp8LinearMethod(LinearMethodBase):
        else:
            layer.input_scale = None
+        if self.use_marlin:
+            prepare_fp8_layer_for_marlin(
+                layer, size_k_first, input_dtype=self.marlin_input_dtype
+            )
+            # Activations not quantized for marlin.
+            del layer.input_scale
+            return
        if self.block_quant and self.use_deep_gemm:
            maybe_post_process_fp8_weight_block(layer)
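To make the control flow of the restored branch concrete, here is a minimal, self-contained sketch of the ordering the diff enforces: when `use_marlin` is set, the layer is prepared for Marlin, `input_scale` is deleted, and the method returns before the block-quant / DeepGEMM post-processing check. This is not vLLM code and does not import vLLM. The attribute names `use_marlin`, `block_quant`, `use_deep_gemm`, `marlin_input_dtype`, and `input_scale` come from the diff; `finish_weight_processing` and the two `*_stub` helpers are hypothetical stand-ins that only record whether they were called. They do not repack weights.

```python
from types import SimpleNamespace


def prepare_fp8_layer_for_marlin_stub(layer, size_k_first, input_dtype=None):
    """Hypothetical stand-in for prepare_fp8_layer_for_marlin.

    It only records that it was called; it does not touch any weights.
    """
    layer.marlin_prepared = True
    layer.marlin_size_k_first = size_k_first
    layer.marlin_input_dtype = input_dtype


def maybe_post_process_fp8_weight_block_stub(layer):
    """Hypothetical stand-in for the DeepGEMM block post-processing step."""
    layer.deep_gemm_processed = True


def finish_weight_processing(method, layer, size_k_first=True):
    """Hypothetical function mirroring the branch order shown in the diff."""
    if method.use_marlin:
        prepare_fp8_layer_for_marlin_stub(
            layer, size_k_first, input_dtype=method.marlin_input_dtype
        )
        # Per the diff's comment, activations are not quantized for Marlin,
        # so the activation scale attribute is removed from the layer.
        del layer.input_scale
        # Early return: the block-quant / DeepGEMM check below never runs.
        return
    if method.block_quant and method.use_deep_gemm:
        maybe_post_process_fp8_weight_block_stub(layer)


# Usage: a Marlin-configured layer skips the DeepGEMM step.
marlin_method = SimpleNamespace(
    use_marlin=True, block_quant=True, use_deep_gemm=True, marlin_input_dtype=None
)
marlin_layer = SimpleNamespace(input_scale=None)
finish_weight_processing(marlin_method, marlin_layer)
print(hasattr(marlin_layer, "input_scale"))          # False
print(hasattr(marlin_layer, "deep_gemm_processed"))  # False
```

Because of the `return`, the Marlin branch and the later DeepGEMM branch are mutually exclusive even when `block_quant` and `use_deep_gemm` are both true. Without the restored block, a Marlin-configured layer would never reach the Marlin preparation call and would fall through to the later checks instead.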