[bugfix][quantization] Fix fp8 per_tensor scale shape (#30257)

Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>

Commit 03416ead (unverified), authored Dec 09, 2025 by haoyangli-amd, committed by GitHub Dec 09, 2025

Showing with 1 addition and 1 deletion

vllm/_custom_ops.py +1 -1

--- a/vllm/_custom_ops.py
+++ b/vllm/_custom_ops.py
@@ -1726,7 +1726,7 @@ def scaled_fp8_quant(
                output, input, scale, scale_ub
            )
        else:
-            scale = torch.empty((1, 1), device=input.device, dtype=torch.float32)
+            scale = torch.empty(1, device=input.device, dtype=torch.float32)
            torch.ops._C.dynamic_scaled_fp8_quant(output, input, scale)
    else:
        assert scale.numel() == 1, f"{scale.shape}"
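
The one-line change above swaps a 2-D `(1, 1)` scale allocation for a 1-D `(1,)` one. The sketch below is not part of the commit. It only illustrates the shape difference between the two allocations, using standard PyTorch calls on CPU (the `device=input.device` argument is omitted so it runs standalone). The helper `describe` is a hypothetical name introduced for illustration.

```python
import torch

# Allocation before the fix: a 2-D tensor holding a single scale value.
old_scale = torch.empty((1, 1), dtype=torch.float32)

# Allocation after the fix: a 1-D tensor holding a single scale value.
new_scale = torch.empty(1, dtype=torch.float32)


def describe(t: torch.Tensor) -> dict:
    """Hypothetical helper: summarize a tensor's shape, rank, and element count."""
    return {"shape": tuple(t.shape), "ndim": t.dim(), "numel": t.numel()}


old_info = describe(old_scale)
new_info = describe(new_scale)

print("before:", old_info)
print("after: ", new_info)
```

Both tensors hold exactly one element, so either would pass a check like the `scale.numel() == 1` assertion in the static-scale branch. They differ only in rank and shape, which matters to any caller that inspects `scale.shape` or `scale.dim()`.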