[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740)

Signed-off-by: Alexander Matveev <amatveev@redhat.com>

Commit e5c78956, authored Nov 14, 2025 by Alexander Matveev, committed by GitHub Nov 14, 2025
Showing with 4 additions and 2 deletions

vllm/model_executor/layers/fused_moe/layer.py +4 -2

--- a/vllm/model_executor/layers/fused_moe/layer.py
+++ b/vllm/model_executor/layers/fused_moe/layer.py
@@ -1749,14 +1749,16 @@ class FusedMoE(CustomOp):

        with sp_ctx:
            if do_naive_dispatch_combine:
-                hidden_states, router_logits = get_ep_group().dispatch(
+                hidden_states_combined, router_logits = get_ep_group().dispatch(
                    hidden_states, router_logits, self.is_sequence_parallel
                )

            # Matrix multiply.
            final_hidden_states = self.quant_method.apply(
                layer=self,
-                x=hidden_states,
+                x=hidden_states_combined
+                if do_naive_dispatch_combine
+                else hidden_states,
                router_logits=router_logits,
                top_k=self.top_k,
                renormalize=self.renormalize,
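
The hunk above stops rebinding hidden_states to the dispatched result. The following is a minimal, self-contained sketch of why that rebinding is a problem, not vLLM code: fake_dispatch, fake_shared_experts, forward_buggy and forward_fixed are hypothetical stand-ins, and it assumes (as the commit title suggests) that the shared experts read hidden_states later in the same function and expect the local, pre-dispatch tokens.

```python
# Toy illustration only. None of these helpers are vLLM APIs.

def fake_dispatch(local_tokens, other_rank_tokens):
    """Stand-in for an all-gather style dispatch: returns tokens from every rank."""
    return local_tokens + other_rank_tokens


def fake_shared_experts(tokens):
    """Stand-in for shared experts: produces one output per input token."""
    return [t * 10 for t in tokens]


def forward_buggy(hidden_states, other_rank_tokens, do_naive_dispatch_combine):
    if do_naive_dispatch_combine:
        # Rebinding the name hides the local tokens from all later code.
        hidden_states = fake_dispatch(hidden_states, other_rank_tokens)
    routed_input = hidden_states
    # Assumed later use: shared experts now see the dispatched tokens.
    shared_out = fake_shared_experts(hidden_states)
    return routed_input, shared_out


def forward_fixed(hidden_states, other_rank_tokens, do_naive_dispatch_combine):
    hidden_states_combined = None
    if do_naive_dispatch_combine:
        # A separate name keeps the local tokens intact.
        hidden_states_combined = fake_dispatch(hidden_states, other_rank_tokens)
    routed_input = (
        hidden_states_combined if do_naive_dispatch_combine else hidden_states
    )
    # Shared experts still receive the local, pre-dispatch tokens.
    shared_out = fake_shared_experts(hidden_states)
    return routed_input, shared_out


local = [1, 2]
remote = [3, 4]
buggy_routed, buggy_shared = forward_buggy(local, remote, True)
fixed_routed, fixed_shared = forward_fixed(local, remote, True)
```

In this sketch both versions pass the same dispatched tokens to the routed path, but only the fixed version gives the shared experts the two local tokens. With do_naive_dispatch_combine set to False the two functions behave identically.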