[BugFix] Workspace allocation during profile run : DeepEPHighThroughput + DeepGEMM (#30899)

(cherry picked from commit e3fc374a)

Commit 17f39880, authored Dec 17, 2025 by Varun Sundar Rabindranath; committed by Kevin H. Luu Dec 17, 2025

Showing with 4 additions and 1 deletion

vllm/model_executor/layers/fused_moe/modular_kernel.py +4 -1

--- a/vllm/model_executor/layers/fused_moe/modular_kernel.py
+++ b/vllm/model_executor/layers/fused_moe/modular_kernel.py
@@ -795,7 +795,10 @@ class FusedMoEModularKernel(torch.nn.Module):
                    top_k,
                    global_num_experts,
                    local_num_experts,
-                    expert_tokens_meta,
+                    # expert_tokens_meta helps allocate the optimal/minimal
+                    # amount of workspace. Pass None so that we allocate for
+                    # the worst-case scenario.
+                    expert_tokens_meta=None,
                )
            )
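
Note on the change: the sketch below illustrates why passing expert_tokens_meta=None leads to a worst-case workspace size. It is not vLLM code. `ExpertTokensMeta`, `workspace_rows`, and `block_m` are hypothetical stand-ins, and the padding rule (each expert's rows rounded up to a block multiple) is an assumption made only for illustration. The premise that the routing seen during a profile run may differ from real traffic is a reading of the commit title, not something the diff states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpertTokensMeta:
    # Hypothetical stand-in: number of token slots routed to each local expert.
    tokens_per_expert: list


def round_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def workspace_rows(
    num_tokens: int,
    top_k: int,
    local_num_experts: int,
    block_m: int,
    expert_tokens_meta: Optional[ExpertTokensMeta] = None,
) -> int:
    """Rows of workspace needed (illustrative sizing rule, not vLLM's)."""
    if expert_tokens_meta is not None:
        # Tight size: pad only the rows each expert actually received.
        return sum(
            round_up(n, block_m) for n in expert_tokens_meta.tokens_per_expert
        )
    # Worst case: every (token, top_k) slot is used, and every expert
    # wastes up to block_m - 1 rows on padding.
    worst = num_tokens * top_k + local_num_experts * (block_m - 1)
    return round_up(worst, block_m)


# Sizing from one routing pattern can undershoot another with the same shape.
skewed = ExpertTokensMeta(tokens_per_expert=[16, 0, 0, 0])
spread = ExpertTokensMeta(tokens_per_expert=[5, 5, 5, 1])

tight_skewed = workspace_rows(8, 2, 4, 4, skewed)  # all slots on one expert
tight_spread = workspace_rows(8, 2, 4, 4, spread)  # slots spread across experts
worst_case = workspace_rows(8, 2, 4, 4, None)      # metadata withheld
```

Under these assumptions, a workspace sized from the skewed routing would be too small for the spread routing, whereas the size computed without metadata covers both. That is the trade-off the new comment describes: some extra memory in exchange for an allocation that holds for any routing of the same shape.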