[TPU][Bugfix] fix the MoE OOM issue (#20339)

Signed-off-by: Chengji Yao <chengjiyao@google.com>

Commit 4548c03c, authored Jul 05, 2025 by Chengji Yao, committed by GitHub Jul 05, 2025

Showing 1 changed file with 7 additions and 2 deletions

vllm/model_executor/layers/fused_moe/layer.py +7 -2

--- a/vllm/model_executor/layers/fused_moe/layer.py
+++ b/vllm/model_executor/layers/fused_moe/layer.py
@@ -1320,8 +1320,13 @@ class FusedMoE(torch.nn.Module):
    def forward(self, hidden_states: torch.Tensor,
                router_logits: torch.Tensor):
-        return torch.ops.vllm.moe_forward(hidden_states, router_logits,
-                                          self.layer_name)
+        # TODO: Once the OOM issue for the TPU backend is resolved, we will
+        # switch to using the moe_forward custom op.
+        if current_platform.is_tpu():
+            return self.forward_impl(hidden_states, router_logits)
+        else:
+            return torch.ops.vllm.moe_forward(hidden_states, router_logits,
+                                              self.layer_name)
    def forward_impl_chunked(self, full_hidden_states: torch.Tensor,
                             full_router_logits: torch.Tensor):
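
The dispatch pattern in this hunk can be illustrated with a minimal, torch-free sketch. Everything below is hypothetical and only mirrors the shape of the change: `ToyPlatform` stands in for `current_platform`, `toy_moe_forward_op` stands in for `torch.ops.vllm.moe_forward`, and the name-based registry is an assumption about how such an op might locate its layer, which the diff itself does not show.

```python
from typing import Dict, List

# Hypothetical registry so the stand-in op can find a layer by name.
_LAYERS: Dict[str, "ToyMoE"] = {}
# Records which code paths ran, purely for illustration.
CALLS: List[str] = []


class ToyPlatform:
    """Hypothetical stand-in for vLLM's current_platform object."""

    def __init__(self, name: str) -> None:
        self.name = name

    def is_tpu(self) -> bool:
        return self.name == "tpu"


def toy_moe_forward_op(hidden_states: List[float],
                       router_logits: List[float],
                       layer_name: str) -> List[float]:
    """Hypothetical stand-in for the moe_forward custom op."""
    CALLS.append("custom_op")
    return _LAYERS[layer_name].forward_impl(hidden_states, router_logits)


class ToyMoE:
    def __init__(self, layer_name: str, platform: ToyPlatform) -> None:
        self.layer_name = layer_name
        self.platform = platform
        _LAYERS[layer_name] = self

    def forward_impl(self, hidden_states: List[float],
                     router_logits: List[float]) -> List[float]:
        # Placeholder computation; the real layer runs the expert MLPs.
        CALLS.append("forward_impl")
        return [h * r for h, r in zip(hidden_states, router_logits)]

    def forward(self, hidden_states: List[float],
                router_logits: List[float]) -> List[float]:
        # Same branch structure as the patched FusedMoE.forward:
        # on TPU, call the implementation directly and skip the op.
        if self.platform.is_tpu():
            return self.forward_impl(hidden_states, router_logits)
        else:
            return toy_moe_forward_op(hidden_states, router_logits,
                                      self.layer_name)
```

In this sketch both branches produce the same result; only the route taken differs, which matches the intent stated in the TODO comment that the TPU branch is a temporary bypass of the custom op.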