Unverified Commit 34c286b8 authored by b8zhong, committed by GitHub

Fix the shared expert & routed expert overlap in Llama 4 (#12405)


Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
parent 9416ee60
@@ -148,7 +148,7 @@ class Llama4MoE(nn.Module):
         return out_aD

     def _forward_core(self, hidden_states, forward_mode: ForwardMode):
-        if hidden_states.shape[0] < 4 and _is_cuda:
+        if _is_cuda:
             return self._forward_core_shared_routed_overlap(hidden_states)
         else:
             return self._forward_core_normal(hidden_states)
...
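For context, the dispatch patched above chooses between a CUDA path that overlaps the shared-expert and routed-expert computation and a plain sequential path; the change makes the overlapped path apply to all batch sizes on CUDA instead of only batches smaller than 4 tokens. The snippet below is a minimal sketch of the sequential shared-plus-routed combination that the overlapped kernel replaces. All module names, sizes, and the top-1 routing here are illustrative assumptions for a Llama 4-style MoE block, not the actual sglang implementation.

# Minimal sketch (assumptions, not the sglang code): every token goes through
# one always-on shared expert and one routed expert, and the two outputs are summed.
import torch
import torch.nn as nn

class ToyLlama4MoE(nn.Module):
    def __init__(self, hidden: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts, bias=False)  # token -> expert scores
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))
        self.shared_expert = nn.Linear(hidden, hidden)             # runs for every token

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Top-1 routing: each token picks its highest-scoring expert.
        scores = self.router(hidden_states)                        # [tokens, num_experts]
        weights, idx = scores.softmax(dim=-1).max(dim=-1)          # both [tokens]
        routed = torch.zeros_like(hidden_states)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                routed[mask] = weights[mask, None] * expert(hidden_states[mask])
        # Sequential combination: the overlapped CUDA path in the patch instead
        # computes the shared expert concurrently with the routed experts.
        return routed + self.shared_expert(hidden_states)

if __name__ == "__main__":
    x = torch.randn(8, 64)
    print(ToyLlama4MoE()(x).shape)  # torch.Size([8, 64])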