Unverified Commit 34c286b8 authored by b8zhong, committed by GitHub

Fix the shared expert & routed expert overlap in Llama 4 (#12405)


Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
parent 9416ee60
@@ -148,7 +148,7 @@ class Llama4MoE(nn.Module):
         return out_aD

     def _forward_core(self, hidden_states, forward_mode: ForwardMode):
-        if hidden_states.shape[0] < 4 and _is_cuda:
+        if _is_cuda:
             return self._forward_core_shared_routed_overlap(hidden_states)
         else:
             return self._forward_core_normal(hidden_states)
...
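For context, the dispatch patched above chooses between a CUDA path that overlaps the shared-expert and routed-expert computation and a plain sequential path; the change makes the overlapped path apply to all batch sizes on CUDA instead of only batches smaller than 4 tokens. The snippet below is a minimal sketch of the sequential shared-plus-routed combination that the overlapped kernel replaces. All module names, sizes, and the top-1 routing here are illustrative assumptions for a Llama 4-style MoE block, not the actual sglang implementation.

# Minimal sketch (assumptions, not the sglang code): every token goes through
# one always-on shared expert and one routed expert, and the two outputs are summed.
import torch
import torch.nn as nn

class ToyLlama4MoE(nn.Module):
    def __init__(self, hidden: int = 64, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts, bias=False)  # token -> expert scores
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))
        self.shared_expert = nn.Linear(hidden, hidden)             # runs for every token

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Top-1 routing: each token picks its highest-scoring expert.
        scores = self.router(hidden_states)                        # [tokens, num_experts]
        weights, idx = scores.softmax(dim=-1).max(dim=-1)          # both [tokens]
        routed = torch.zeros_like(hidden_states)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                routed[mask] = weights[mask, None] * expert(hidden_states[mask])
        # Sequential combination: the overlapped CUDA path in the patch instead
        # computes the shared expert concurrently with the routed experts.
        return routed + self.shared_expert(hidden_states)

if __name__ == "__main__":
    x = torch.randn(8, 64)
    print(ToyLlama4MoE()(x).shape)  # torch.Size([8, 64])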