[BugFix]fix Qwen3 MoE call gate twice (#40664)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
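
The comments in the diff below indicate that the gate is passed into FusedMoE and runs there, so calling self.gate in the block beforehand ran the router a second time. The following is a minimal, torch-free sketch of that double call and of the guarded version. CountingGate, ToyExperts, forward_before and forward_after are hypothetical names for illustration, not vLLM classes, and the toy arithmetic does not model real expert dispatch.

```python
class CountingGate:
    """Hypothetical stand-in for the router; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, hidden_states):
        self.calls += 1
        # Toy "logits": one score per token. Returns a 2-tuple to mirror
        # the `router_logits, _ = self.gate(...)` unpacking in the diff.
        return [sum(token) for token in hidden_states], None


class ToyExperts:
    """Hypothetical stand-in for an experts layer that may own the gate."""

    def __init__(self, gate=None):
        self.gate = gate

    @property
    def is_internal_router(self):
        # Assumption: the router is "internal" when a gate was passed in.
        return self.gate is not None

    def __call__(self, hidden_states, router_logits):
        if self.is_internal_router:
            # The passed-in router_logits are ignored; the gate runs here.
            router_logits, _ = self.gate(hidden_states)
        # Toy computation: scale each token by its score.
        return [[x * w for x in token]
                for token, w in zip(hidden_states, router_logits)]


def forward_before(gate, experts, hidden_states):
    # Pre-fix pattern: always call the gate, then call the experts.
    router_logits, _ = gate(hidden_states)
    return experts(hidden_states=hidden_states, router_logits=router_logits)


def forward_after(gate, experts, hidden_states):
    # Post-fix pattern: skip the external gate call when the experts
    # layer runs the router itself.
    if experts.is_internal_router:
        return experts(hidden_states=hidden_states, router_logits=hidden_states)
    router_logits, _ = gate(hidden_states)
    return experts(hidden_states=hidden_states, router_logits=router_logits)


tokens = [[1.0, 2.0], [3.0, 4.0]]

gate_a = CountingGate()
out_before = forward_before(gate_a, ToyExperts(gate_a), tokens)

gate_b = CountingGate()
out_after = forward_after(gate_b, ToyExperts(gate_b), tokens)

print(gate_a.calls, gate_b.calls)  # 2 1
print(out_before == out_after)     # True
```

In this toy setup both paths produce the same output, so the extra gate call is wasted work, not a change in results. Whether that also holds for every real FusedMoE configuration is not stated in the commit.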

Unverified commit 342c58bc, authored Apr 23, 2026 by Kunshang Ji; committed by GitHub Apr 23, 2026

Showing with 13 additions and 5 deletions

vllm/model_executor/models/qwen3_moe.py +13 -5

--- a/vllm/model_executor/models/qwen3_moe.py
+++ b/vllm/model_executor/models/qwen3_moe.py
@@ -231,11 +231,19 @@ class Qwen3MoeSparseMoeBlock(nn.Module):
        if self.is_sequence_parallel:
            hidden_states = sequence_parallel_chunk(hidden_states)
-        # router_logits: (num_tokens, n_experts)
-        router_logits, _ = self.gate(hidden_states)
-        final_hidden_states = self.experts(
-            hidden_states=hidden_states, router_logits=router_logits
-        )
+        if self.experts.is_internal_router:
+            # In this case, the gate/router runs inside the FusedMoE class
+            final_hidden_states = self.experts(
+                hidden_states=hidden_states, router_logits=hidden_states
+            )
+        else:
+            # Actually this will be dead code, since we always pass gate into
+            # FusedMoE in the current implementation. But we keep this code
+            # here for clarity and future flexibility.
+            router_logits, _ = self.gate(hidden_states)
+            final_hidden_states = self.experts(
+                hidden_states=hidden_states, router_logits=router_logits
+            )
        if self.is_sequence_parallel:
            final_hidden_states = tensor_model_parallel_all_gather(