Commit 7624525b authored by Vadim Gimpelson, committed by khluu

cherry-pick [Bugfix] Restore prepare_fp8_layer_for_marlin removed by merge conflict resolution


Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: vadiklyutiy <vgimpelson@nvidia.com>
#38398
parent d1b4f10b
...
@@ -437,6 +437,14 @@ class Fp8LinearMethod(LinearMethodBase):
             else:
                 layer.input_scale = None
+        if self.use_marlin:
+            prepare_fp8_layer_for_marlin(
+                layer, size_k_first, input_dtype=self.marlin_input_dtype
+            )
+            # Activations not quantized for marlin.
+            del layer.input_scale
+            return
         if self.block_quant and self.use_deep_gemm:
             maybe_post_process_fp8_weight_block(layer)
...
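
The control flow restored by this hunk can be illustrated with a minimal, self-contained sketch. This is not the vLLM implementation: `finish_weight_processing`, `prepare_for_marlin_stub`, and `post_process_block_stub` are hypothetical stand-ins, and the flags are passed as plain arguments rather than read from `self`. It only shows that the Marlin branch prepares the layer, drops `input_scale`, and returns before the DeepGEMM post-processing is reached.

```python
from types import SimpleNamespace


def prepare_for_marlin_stub(layer, calls):
    # Hypothetical stand-in for prepare_fp8_layer_for_marlin; only records the call.
    calls.append("marlin")


def post_process_block_stub(layer, calls):
    # Hypothetical stand-in for maybe_post_process_fp8_weight_block.
    calls.append("deep_gemm")


def finish_weight_processing(layer, *, use_marlin, block_quant, use_deep_gemm):
    """Mirror the branch order in the hunk and return the list of steps taken."""
    calls = []
    if use_marlin:
        prepare_for_marlin_stub(layer, calls)
        # Activations are not quantized on the Marlin path, so the attribute is removed.
        del layer.input_scale
        return calls
    if block_quant and use_deep_gemm:
        post_process_block_stub(layer, calls)
    return calls


# Marlin path: early return, so the DeepGEMM step is never reached.
marlin_layer = SimpleNamespace(input_scale=None)
marlin_calls = finish_weight_processing(
    marlin_layer, use_marlin=True, block_quant=True, use_deep_gemm=True
)

# Non-Marlin path: input_scale stays on the layer and the DeepGEMM step runs.
other_layer = SimpleNamespace(input_scale=None)
other_calls = finish_weight_processing(
    other_layer, use_marlin=False, block_quant=True, use_deep_gemm=True
)
```

In this sketch, leaving out the `use_marlin` branch would let a Marlin-configured layer fall through to the block post-processing step with `input_scale` still attached, which appears to be the situation the commit title describes.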