[MoE] Add routing simulation override for MXFP4 quantized MoE (#33595)

Signed-off-by: Jaewon Lee <jaewon@meta.com>

Commit aaa3092f (unverified), authored Mar 12, 2026 by Jaewon; committed by GitHub Mar 13, 2026

Showing 1 changed file with 6 additions and 0 deletions

vllm/model_executor/layers/quantization/mxfp4.py +6 -0

--- a/vllm/model_executor/layers/quantization/mxfp4.py
+++ b/vllm/model_executor/layers/quantization/mxfp4.py
@@ -1109,6 +1109,12 @@ class Mxfp4MoEMethod(FusedMoEMethodBase):
            layer.eplb_state.logical_replica_count,
        ), "MXFP4 are not supported with this configuration."
+        # Apply routing simulation strategy if specified.
+        # This applies to all monolithic backends (SM100_FI and TRITON).
+        routing_strategy = envs.VLLM_MOE_ROUTING_SIMULATION_STRATEGY
+        if routing_strategy == "uniform_random":
+            router_logits = torch.rand_like(router_logits)
        if (
            self.mxfp4_backend == Mxfp4Backend.SM100_FI_MXFP4_MXFP8_TRTLLM
            or self.mxfp4_backend == Mxfp4Backend.SM100_FI_MXFP4_BF16
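
Note on the added lines: the override replaces the real router logits with uniform random values of the same shape before any backend-specific dispatch, so expert selection no longer depends on the input tokens. The sketch below illustrates that idea in plain Python so it runs without PyTorch or vLLM. It is not vLLM's code: `get_routing_strategy`, `maybe_override_router_logits`, and `top_k_experts` are hypothetical names, the empty-string default for the environment variable is an assumption, and nested lists stand in for the `router_logits` tensor.

```python
import os
import random


def get_routing_strategy() -> str:
    # Hypothetical stand-in for envs.VLLM_MOE_ROUTING_SIMULATION_STRATEGY.
    # The variable name comes from the diff; the "" default is an assumption.
    return os.environ.get("VLLM_MOE_ROUTING_SIMULATION_STRATEGY", "")


def maybe_override_router_logits(router_logits, strategy, rng=random):
    """Return the logits unchanged unless strategy is "uniform_random".

    router_logits is a list of per-token rows (num_tokens x num_experts).
    With "uniform_random", every entry is replaced by a draw from [0, 1),
    which is what torch.rand_like produces for a tensor of the same shape.
    """
    if strategy == "uniform_random":
        return [[rng.random() for _ in row] for row in router_logits]
    return router_logits


def top_k_experts(row, k):
    """Indices of the k largest logits in one token's row."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]


if __name__ == "__main__":
    # A router that strongly prefers expert 0 for every token.
    logits = [[9.0, 0.1, 0.2, 0.3] for _ in range(4)]
    simulated = maybe_override_router_logits(
        logits, "uniform_random", random.Random(0)
    )
    for real_row, sim_row in zip(logits, simulated):
        print(top_k_experts(real_row, 2), top_k_experts(sim_row, 2))
```

Because the random values ignore the original logits, the simulated top-k choices spread across experts even when the real router is heavily skewed, which is presumably the point of a routing simulation strategy for load or performance testing.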