Commit 4a6dd1c3 authored by bnellnm, committed by GitHub

[Bugfix] Fix DeepSeek V2-Lite Accuracy drop (#40673)


Signed-off-by: Bill Nell <bnell@redhat.com>
parent 7ff65b19
@@ -335,11 +335,16 @@ class MoERunner(MoERunnerInterface):
         """All-reduce shared expert output when the combine kernel already
         reduced fused output.
-        This is the "early" all-reduce path. When the combine kernel produces
-        already-reduced fused output, shared output must be reduced separately
-        to match.
+        * If the combine kernel does the reduction for fused_output, reduce
+          shared_output separately. O.w, reduce fused_output+shared_output later.
+        * If we have SP (TP=N, DP=M, EP), there is a separate AG step handled
+          in the model.
         """
-        if shared_output is not None and self._fused_output_is_reduced:
+        if (
+            shared_output is not None
+            and not self.moe_config.is_sequence_parallel
+            and self._fused_output_is_reduced
+        ):
             shared_output = tensor_model_parallel_all_reduce(shared_output)
         return shared_output
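
The guard changed by this commit can be read as a three-way condition. The sketch below is only an illustration of that condition, not code from the repository: `should_all_reduce_shared_output` and its boolean parameters are hypothetical names standing in for `shared_output is not None`, `self.moe_config.is_sequence_parallel`, and `self._fused_output_is_reduced`. It assumes nothing about how those flags are computed.

```python
def should_all_reduce_shared_output(
    has_shared_output: bool,
    is_sequence_parallel: bool,
    fused_output_is_reduced: bool,
) -> bool:
    """Hypothetical helper mirroring the guard in the patched method.

    The early all-reduce of the shared expert output runs only when a
    shared output exists, sequence parallelism is off, and the combine
    kernel has already reduced the fused output.
    """
    return (
        has_shared_output
        and not is_sequence_parallel
        and fused_output_is_reduced
    )


# Enumerate the cases in which a shared output exists.
for sp in (False, True):
    for reduced in (False, True):
        decision = should_all_reduce_shared_output(True, sp, reduced)
        print(f"sequence_parallel={sp} fused_reduced={reduced} -> early all-reduce: {decision}")
```

Before the change, the sequence-parallel flag was not part of the condition, so the case with sequence parallelism on and an already-reduced fused output would also have taken the early all-reduce path.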