[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Sage Moore <sagemoore@utexas.edu>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

40d33264 · Sage Moore · GitHub · 9c84ca82 · 40d33264
Commit 40d33264 authored Nov 10, 2025 by Sage Moore, committed by GitHub Nov 10, 2025

Showing with 10 additions and 5 deletions

vllm/model_executor/layers/fused_moe/shared_fused_moe.py +10 -5

--- a/vllm/model_executor/layers/fused_moe/shared_fused_moe.py
+++ b/vllm/model_executor/layers/fused_moe/shared_fused_moe.py
@@ -28,13 +28,18 @@ class SharedFusedMoE(FusedMoE):
        super().__init__(**kwargs)
        self._shared_experts = shared_experts

-        # Disable shared expert overlap if we are not using
-        # flashinfer + DP since there is nothing to be gained in this case.
-        # Disabling the overlap optimization also prevents the shared experts
-        # from being hidden from torch.compile.
+        # Disable shared expert overlap if we are using eplb, because of
+        # correctness issues, or if using flashinfer with DP, since there
+        # is nothing to be gained in this case. Disabling the overlap
+        # optimization also prevents the shared experts from being hidden
+        # from torch.compile.
        self.use_overlapped = (
            use_overlapped
-            and not (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            and not (
+                # TODO(wentao): find the root cause and remove this condition
+                self.enable_eplb
+                or (self.use_flashinfer_cutlass_kernels and self.dp_size > 1)
+            )
            and self._shared_experts is not None
        )
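
The gating condition in the hunk above can be exercised in isolation. The following is a minimal sketch, not the vLLM implementation: `resolve_use_overlapped` is a hypothetical standalone helper, and its parameters stand in for the attributes the diff reads from `self` (`has_shared_experts` replaces the `self._shared_experts is not None` check).

```python
def resolve_use_overlapped(
    use_overlapped: bool,
    enable_eplb: bool,
    use_flashinfer_cutlass_kernels: bool,
    dp_size: int,
    has_shared_experts: bool,
) -> bool:
    """Hypothetical helper mirroring the boolean expression in the diff."""
    return (
        use_overlapped
        and not (
            # EPLB disables the overlap (correctness workaround per the diff).
            enable_eplb
            # FlashInfer CUTLASS kernels with DP > 1 gain nothing from it.
            or (use_flashinfer_cutlass_kernels and dp_size > 1)
        )
        and has_shared_experts
    )


if __name__ == "__main__":
    # Overlap requested, no EPLB, no FlashInfer + DP: overlap stays on.
    print(resolve_use_overlapped(True, False, False, 1, True))
    # Same request with EPLB enabled: overlap is turned off by this change.
    print(resolve_use_overlapped(True, True, False, 1, True))
```

Before this change the EPLB flag was not part of the condition, so the second call would have kept the overlap enabled.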