Unverified commit a29e62ea, authored by Michael Goin, committed by GitHub

Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188)


Signed-off-by: mgoin <mgoin64@gmail.com>
parent e53be6f0
@@ -1274,8 +1274,7 @@ def scaled_fp8_quant(
             scale = torch.zeros(1, device=input.device, dtype=torch.float32)
             torch.ops._C.dynamic_scaled_fp8_quant(output, input, scale)
     else:
-        # num_token_padding not implemented for this case
-        assert (scale.numel() == 1 and num_token_padding is None)
+        assert scale.numel() == 1
         torch.ops._C.static_scaled_fp8_quant(output, input, scale)
     return output, scale
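
The effect of dropping the `num_token_padding is None` check can be illustrated with a small pure-Python sketch. It is not the vLLM implementation: the helper names `padded_output_shape` and `static_quant_sketch` are hypothetical, the rule that the output has `max(num_token_padding, num_tokens)` rows is an assumption about code outside this hunk, the clamp limit of 448 assumes the FP8 E4M3 format, and no actual FP8 rounding is performed.

```python
from typing import List, Optional, Tuple

# Assumption: largest finite value of the FP8 E4M3 format.
FP8_E4M3_MAX = 448.0


def padded_output_shape(
    input_shape: Tuple[int, int],
    num_token_padding: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper: shape of the output buffer after row padding."""
    num_tokens, hidden = input_shape
    if num_token_padding:
        # Assumed rule: pad the token (first) dimension up to at least
        # num_token_padding rows, never shrinking the input.
        return (max(num_token_padding, num_tokens), hidden)
    return (num_tokens, hidden)


def static_quant_sketch(
    rows: List[List[float]],
    scale: float,
    num_token_padding: Optional[int] = None,
) -> Tuple[List[List[float]], float]:
    """Static per-tensor scaling with one caller-supplied scale.

    Values are divided by the scale and clamped; rounding to FP8 is omitted.
    Padding rows are zero-filled here only for readability; a real buffer
    may leave them uninitialized.
    """
    n_rows, hidden = padded_output_shape((len(rows), len(rows[0])), num_token_padding)
    out = [[0.0] * hidden for _ in range(n_rows)]
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            out[i][j] = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale))
    return out, scale


if __name__ == "__main__":
    out, scale = static_quant_sketch([[1.0, -2.0], [1000.0, 0.5]], 2.0, num_token_padding=4)
    print(len(out), out[:2], scale)
```

Under these assumptions, the static path only needs a single-element scale; the padding is handled by how the output buffer is sized, which is why the assertion no longer needs to reject `num_token_padding`.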