Unverified Commit fc231d3d authored by gushiqiao, committed by GitHub

Add FP8 conversion for specific weight keys (#431)

Convert weights to FP8 format using FloatQuantizer.
parent f0f13701
@@ -49,6 +49,7 @@ def main():
 print(f"Converting {key} to FP8, dtype: {state_dict[key].dtype}")
 ## fp8
+weight = state_dict[key].to(torch.float32).cuda()
 w_quantizer = FloatQuantizer("e4m3", True, "per_channel")
 weight, weight_scale, _ = w_quantizer.real_quant_tensor(weight)
 weight = weight.to(torch.float8_e4m3fn)
......
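The diff calls FloatQuantizer("e4m3", True, "per_channel").real_quant_tensor without showing its internals. As a rough illustration only, the sketch below approximates what per-channel e4m3 real quantization could look like in plain PyTorch; the function name fp8_e4m3_per_channel_quant and its scale layout are hypothetical and are not part of this repository's API.

```python
# Hypothetical sketch of per-channel FP8 (e4m3) quantization; an assumption
# about the kind of transform applied here, not the actual FloatQuantizer code.
import torch

def fp8_e4m3_per_channel_quant(weight: torch.Tensor):
    """Quantize a 2-D weight to float8_e4m3fn with one scale per output channel."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn
    # Per-output-channel absolute maximum, clamped to avoid division by zero.
    amax = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = amax / fp8_max
    # Scale into the representable range, then cast to FP8.
    q = (weight / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return q, scale.squeeze(1)

# Usage sketch, mirroring the diff (assumed shapes: weight is [out, in]):
# w = state_dict[key].to(torch.float32).cuda()
# q, s = fp8_e4m3_per_channel_quant(w)
# dequantized = q.to(torch.float32) * s.unsqueeze(1)
```

Storing the quantized tensor as float8_e4m3fn alongside a per-channel scale keeps the weight at roughly half the memory of FP16 while allowing dequantization back to higher precision at load time.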