[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
47539cfd · Isotr0py · GitHub · 2afcec4d · 47539cfd
Unverified Commit 47539cfd authored Nov 30, 2025 by Isotr0py Committed by GitHub Nov 30, 2025
Show whitespace changes
Inline Side-by-side

Showing with 1 addition and 1 deletion

vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py ...mpressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py +1 -1

No files found.
--- a/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py
+++ b/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w4a4_nvfp4.py
@@ -184,7 +184,7 @@ class CompressedTensorsW4A4Fp4(CompressedTensorsScheme):
            return out

        output_dtype = x.dtype
-        output_shape = [x.shape[0], layer.weight_packed.shape[0]]
+        output_shape = [*x.shape[:-1], layer.weight_packed.shape[0]]

        # quantize BF16 or FP16 to (FP4 and interleaved block scale)
        x_fp4, x_blockscale = scaled_fp4_quant(x, layer.input_global_scale)