[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM...

[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)

Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

Commit daa2784b, authored Feb 03, 2026 by Michael Goin; committed by Robert Shaw, Feb 03, 2026

Showing with 8 additions and 3 deletions

vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py +8 -3

--- a/vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py
+++ b/vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py
@@ -69,9 +69,14 @@ def _supports_routing_method(
            RoutingMethodType.RenormalizeNaive,
        ]
    elif (weight_key, activation_key) == (kFp8StaticTensorSym, kFp8StaticTensorSym):
-        # NOTE(rob): kernel requires Llama4.
-        return routing_method == RoutingMethodType.Llama4
-
+        # NOTE(dbari): as above, potentially allow others here.
+        return routing_method in [
+            RoutingMethodType.Llama4,
+            # NOTE(mgoin): Disabled to investigate accuracy issues.
+            # See https://github.com/vllm-project/vllm/issues/33532
+            # RoutingMethodType.Renormalize,
+            # RoutingMethodType.RenormalizeNaive,
+        ]
    else:
        raise ValueError("Unsupported quantization scheme.")
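
The effect of the hunk above on the per-tensor FP8 branch can be sketched in isolation. This is a minimal, standalone illustration and not the vLLM code: the `RoutingMethodType` enum below is a hypothetical stand-in that models only the three members named in the diff, `K_FP8_STATIC_TENSOR_SYM` is a placeholder string for `kFp8StaticTensorSym`, and `supports_routing_method_sketch` is an invented name. The other quantization branches of `_supports_routing_method` are not visible in this hunk, so they are not modeled.

```python
from enum import Enum, auto


class RoutingMethodType(Enum):
    # Hypothetical stand-in: only the members named in the diff are modeled.
    Llama4 = auto()
    Renormalize = auto()
    RenormalizeNaive = auto()


# Hypothetical placeholder for the kFp8StaticTensorSym quantization key.
K_FP8_STATIC_TENSOR_SYM = "fp8_static_tensor_sym"


def supports_routing_method_sketch(weight_key, activation_key, routing_method):
    """Model only the per-tensor FP8 branch touched by this commit."""
    per_tensor_fp8 = (K_FP8_STATIC_TENSOR_SYM, K_FP8_STATIC_TENSOR_SYM)
    if (weight_key, activation_key) == per_tensor_fp8:
        return routing_method in [
            RoutingMethodType.Llama4,
            # Renormalize and RenormalizeNaive stay commented out,
            # mirroring the diff, so they are reported as unsupported.
        ]
    # Any scheme not modeled in this sketch is rejected.
    raise ValueError("Unsupported quantization scheme.")


if __name__ == "__main__":
    for method in RoutingMethodType:
        supported = supports_routing_method_sketch(
            K_FP8_STATIC_TENSOR_SYM, K_FP8_STATIC_TENSOR_SYM, method
        )
        print(f"{method.name}: {supported}")
```

In this sketch only `Llama4` is accepted for per-tensor FP8, which matches the behavior before the change. What the hunk changes is the shape of the check: an equality test becomes a list, so the two disabled methods can be re-enabled by uncommenting them once the accuracy issue referenced in the diff is resolved.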