Unverified Commit c32fb7a2 authored by sogalin, Committed by GitHub

[ROCm] Fix fp8 quantization accuracy issue. (#10558)

parent 1ba137e9
@@ -732,7 +732,7 @@ def apply_fp8_linear(
     # final solution should be: 1. add support to per-tensor activation scaling.
     # 2. solve the torch.compile error from weight_scale.numel() == 1 and x_scale.numel() > 1 (below line#308)
     if _is_hip and weight_scale.numel() == 1:
-        qinput, x_scale = ops.scaled_fp8_quant(
+        qinput, x_scale = scaled_fp8_quant(
             input_2d,
             input_scale,
             use_per_token_if_dynamic=use_per_token_if_dynamic,
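For context, the comment above distinguishes per-tensor activation scaling, where the scale has a single element (`weight_scale.numel() == 1`), from dynamic per-token scaling, where quantization produces one scale per activation row (`x_scale.numel() > 1`). Below is a minimal sketch of the two modes in plain PyTorch; the helper names and the fp8 max constant are illustrative and are not vLLM's actual API:

```python
import torch

# Max representable value of torch.float8_e4m3fn; note that ROCm's native
# e4m3fnuz format tops out at 240 instead (illustrative constant here).
FP8_MAX = 448.0

def quant_per_tensor(x: torch.Tensor):
    # Per-tensor: one scale for the whole activation (scale.numel() == 1).
    scale = x.abs().amax().clamp(min=1e-12) / FP8_MAX
    q = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale.reshape(1)

def quant_per_token(x: torch.Tensor):
    # Per-token: one scale per row (scale.numel() == x.shape[0]), which
    # tracks activation outliers more closely than a single global scale.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / FP8_MAX
    q = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale

x = torch.randn(4, 8)
_, s_tensor = quant_per_tensor(x)
_, s_token = quant_per_token(x)
print(s_tensor.numel(), s_token.numel())  # 1 vs. 4
```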