[Bugfix] Fix DeepSeek V2-Lite Accuracy drop (#40673)

Signed-off-by: Bill Nell <bnell@redhat.com>

Commit 4a6dd1c3, authored Apr 23, 2026 by bnellnm; committed by GitHub Apr 23, 2026

Showing 1 changed file with 9 additions and 4 deletions

vllm/model_executor/layers/fused_moe/runner/moe_runner.py +9 -4

--- a/vllm/model_executor/layers/fused_moe/runner/moe_runner.py
+++ b/vllm/model_executor/layers/fused_moe/runner/moe_runner.py
@@ -335,11 +335,16 @@ class MoERunner(MoERunnerInterface):
        """All-reduce shared expert output when the combine kernel already
        reduced fused output.
-        This is the "early" all-reduce path. When the combine kernel produces
-        already-reduced fused output, shared output must be reduced separately
-        to match.
+        * If the combine kernel does the reduction for fused_output, reduce
+          shared_output separately. O.w, reduce fused_output+shared_output later.
+        * If we have SP (TP=N, DP=M, EP), there is a separate AG step handled
+          in the model.
        """
-        if shared_output is not None and self._fused_output_is_reduced:
+        if (
+            shared_output is not None
+            and not self.moe_config.is_sequence_parallel
+            and self._fused_output_is_reduced
+        ):
            shared_output = tensor_model_parallel_all_reduce(shared_output)
        return shared_output
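
To see exactly which configuration the new condition affects, the old and new predicates can be compared over every flag combination. This is only an illustrative sketch, not code from vLLM: `RunnerState`, `reduces_shared_before`, and `reduces_shared_after` are hypothetical names, and the three booleans stand in for `shared_output is not None`, `self.moe_config.is_sequence_parallel`, and `self._fused_output_is_reduced` from the diff.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunnerState:
    """Hypothetical stand-in for the three inputs the condition reads."""

    has_shared_output: bool        # models `shared_output is not None`
    is_sequence_parallel: bool     # models `moe_config.is_sequence_parallel`
    fused_output_is_reduced: bool  # models `_fused_output_is_reduced`


def reduces_shared_before(s: RunnerState) -> bool:
    # Condition on the removed line of the diff.
    return s.has_shared_output and s.fused_output_is_reduced


def reduces_shared_after(s: RunnerState) -> bool:
    # Condition on the added lines of the diff.
    return (
        s.has_shared_output
        and not s.is_sequence_parallel
        and s.fused_output_is_reduced
    )


# Enumerate all eight flag combinations and keep those where behavior changed.
changed = [
    s
    for s in (RunnerState(*flags) for flags in product([False, True], repeat=3))
    if reduces_shared_before(s) != reduces_shared_after(s)
]

for s in changed:
    print(s)
```

Under these assumptions the two predicates differ in a single case: a shared output is present, sequence parallelism is enabled, and the fused output is already reduced. There the early all-reduce of `shared_output` used to run and is now skipped, which matches the docstring's note that with SP a separate AG step is handled in the model.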