Commit 40d33264 (unverified), authored by Sage Moore, committed by GitHub

[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377)


Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
parent 9c84ca82
@@ -28,13 +28,18 @@ class SharedFusedMoE(FusedMoE):
         super().__init__(**kwargs)
         self._shared_experts = shared_experts
-        # Disable shared expert overlap if we are not using
-        # flashinfer + DP since there is nothing to be gained in this case.
-        # Disabling the overlap optimization also prevents the shared experts
-        # from being hidden from torch.compile.
+        # Disable shared expert overlap if we are using eplb, because of
+        # correctness issues, or if using flashinfer with DP, since there
+        # is nothing to be gained in this case. Disabling the overlap
+        # optimization also prevents the shared experts from being hidden
+        # from torch.compile.
         self.use_overlapped = (
             use_overlapped
-            and not (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            and not (
+                # TODO(wentao): find the root cause and remove this condition
+                self.enable_eplb
+                or (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            )
             and self._shared_experts is not None
         )
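The new condition in this hunk can be checked in isolation. The following is a minimal sketch, not code from the commit: `resolve_use_overlapped` and its `has_shared_experts` parameter are hypothetical names introduced here for illustration, standing in for the attributes that `SharedFusedMoE.__init__` reads from `self`.

```python
def resolve_use_overlapped(
    use_overlapped: bool,
    enable_eplb: bool,
    use_flashinfer_cutlass_kernels: bool,
    dp_size: int,
    has_shared_experts: bool,
) -> bool:
    """Hypothetical standalone mirror of the condition added in this hunk.

    Overlap stays enabled only when it was requested, EPLB is off,
    the flashinfer CUTLASS kernels are not combined with DP > 1,
    and shared experts are actually present.
    """
    return (
        use_overlapped
        and not (
            enable_eplb
            or (use_flashinfer_cutlass_kernels and dp_size > 1)
        )
        and has_shared_experts
    )


# Enabling EPLB turns the overlap off even when everything else allows it.
print(resolve_use_overlapped(True, False, False, 1, True))
print(resolve_use_overlapped(True, True, False, 1, True))
```

Under these assumptions the first call reports that overlap is kept and the second that it is disabled, which is the behavior change the commit title describes.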