Commit daa2784b authored by Michael Goin, committed by Robert Shaw
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
parent e4bf6ed9
@@ -69,9 +69,14 @@ def _supports_routing_method(
             RoutingMethodType.RenormalizeNaive,
         ]
     elif (weight_key, activation_key) == (kFp8StaticTensorSym, kFp8StaticTensorSym):
-        # NOTE(rob): kernel requires Llama4.
-        return routing_method == RoutingMethodType.Llama4
+        # NOTE(dbari): as above, potentially allow others here.
+        return routing_method in [
+            RoutingMethodType.Llama4,
+            # NOTE(mgoin): Disabled to investigate accuracy issues.
+            # See https://github.com/vllm-project/vllm/issues/33532
+            # RoutingMethodType.Renormalize,
+            # RoutingMethodType.RenormalizeNaive,
+        ]
     else:
         raise ValueError("Unsupported quantization scheme.")
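The effect of the per-tensor FP8 branch can be sketched in isolation. This is a minimal, self-contained illustration and not the vLLM source: the `RoutingMethodType` enum below is a stand-in containing only the members named in the diff, and `PER_TENSOR_FP8_ROUTING` and `supports_per_tensor_fp8_routing` are hypothetical names introduced for this example.

```python
from enum import Enum, auto


class RoutingMethodType(Enum):
    # Stand-in enum (assumption): only the members that appear in the diff.
    Llama4 = auto()
    Renormalize = auto()
    RenormalizeNaive = auto()


# Hypothetical allow-list mirroring the new branch: a list rather than an
# equality check, so entries can be re-enabled by uncommenting them.
PER_TENSOR_FP8_ROUTING = [
    RoutingMethodType.Llama4,
    # Disabled while accuracy issues are investigated (vllm issue 33532):
    # RoutingMethodType.Renormalize,
    # RoutingMethodType.RenormalizeNaive,
]


def supports_per_tensor_fp8_routing(routing_method: RoutingMethodType) -> bool:
    """Return True if the routing method is on the per-tensor FP8 allow-list."""
    return routing_method in PER_TENSOR_FP8_ROUTING
```

With the two Renormalize entries commented out, only `RoutingMethodType.Llama4` passes the check, which matches the result of the equality test on the removed line.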