Commit 4a6dd1c3 authored by bnellnm, committed by GitHub

[Bugfix] Fix DeepSeek V2-Lite Accuracy drop (#40673)


Signed-off-by: Bill Nell <bnell@redhat.com>
parent 7ff65b19
@@ -335,11 +335,16 @@ class MoERunner(MoERunnerInterface):
         """All-reduce shared expert output when the combine kernel already
         reduced fused output.
 
-        This is the "early" all-reduce path. When the combine kernel produces
-        already-reduced fused output, shared output must be reduced separately
-        to match.
+        * If the combine kernel does the reduction for fused_output, reduce
+          shared_output separately. O.w, reduce fused_output+shared_output later.
+        * If we have SP (TP=N, DP=M, EP), there is a separate AG step handled
+          in the model.
         """
-        if shared_output is not None and self._fused_output_is_reduced:
+        if (
+            shared_output is not None
+            and not self.moe_config.is_sequence_parallel
+            and self._fused_output_is_reduced
+        ):
             shared_output = tensor_model_parallel_all_reduce(shared_output)
         return shared_output
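
The guard changed in this hunk can be read as a three-way predicate. The following is a minimal sketch, assuming plain booleans in place of the runner's state. The function `needs_early_all_reduce` and its parameter names are hypothetical and exist only for illustration. They are not part of the patched code.

```python
def needs_early_all_reduce(
    has_shared_output: bool,
    is_sequence_parallel: bool,
    fused_output_is_reduced: bool,
) -> bool:
    """Hypothetical helper mirroring the condition in the patched `if`.

    The shared expert output is all-reduced early only when:
      * a shared output exists,
      * sequence parallelism is off (with SP, the docstring says a separate
        all-gather step is handled in the model), and
      * the combine kernel already reduced the fused output.
    """
    return (
        has_shared_output
        and not is_sequence_parallel
        and fused_output_is_reduced
    )


def needs_early_all_reduce_before_fix(
    has_shared_output: bool,
    fused_output_is_reduced: bool,
) -> bool:
    """Hypothetical helper mirroring the pre-patch condition (no SP check)."""
    return has_shared_output and fused_output_is_reduced


if __name__ == "__main__":
    # Enumerate every combination to show where the two conditions differ.
    for shared in (False, True):
        for sp in (False, True):
            for reduced in (False, True):
                old = needs_early_all_reduce_before_fix(shared, reduced)
                new = needs_early_all_reduce(shared, sp, reduced)
                marker = "  <- changed by the patch" if old != new else ""
                print(f"shared={shared} sp={sp} reduced={reduced}: "
                      f"old={old} new={new}{marker}")
```

In this sketch the two predicates disagree in exactly one case: a shared output is present, the fused output is already reduced, and sequence parallelism is enabled. That is the case in which the patched code now skips the early all-reduce.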