Commit 4adc66f6 authored by ElizaWszola, committed by GitHub

[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121)


Signed-off-by: ElizaWszola <ewszola@redhat.com>
parent 55ad6487
...
@@ -283,8 +283,8 @@ class CutlassExpertsFp8(mk.FusedMoEPermuteExpertsUnpermute):
                          (N // 2))
             output = (self.max_experts_per_worker, padded_M, K)
         else:
-            workspace1 = (M * topk, max(2 * N, K))
-            workspace2 = (M * topk, N)
+            workspace1 = (M * topk, max(N, K))
+            workspace2 = (M * topk, N // 2)
             output = (M * topk, K)
         return (workspace1, workspace2, output,
                 self.out_dtype if self.out_dtype is not None else a.dtype)
...
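
To make the size of the saving concrete, the following minimal sketch compares the element counts of the two workspaces before and after this change in the non-batched branch. The helper names (`workspace_shapes_old`, `workspace_shapes_new`, `total_elements`) and the sample values of `M`, `topk`, `N`, and `K` are hypothetical and chosen only for illustration. They are not taken from the vLLM code base, and the sketch counts elements rather than bytes because the buffer dtype is not shown in the diff.

```python
from math import prod


def workspace_shapes_old(M, topk, N, K):
    """Non-batched workspace shapes before the change (left side of the diff)."""
    workspace1 = (M * topk, max(2 * N, K))
    workspace2 = (M * topk, N)
    return workspace1, workspace2


def workspace_shapes_new(M, topk, N, K):
    """Non-batched workspace shapes after the change (right side of the diff)."""
    workspace1 = (M * topk, max(N, K))
    workspace2 = (M * topk, N // 2)
    return workspace1, workspace2


def total_elements(shapes):
    """Sum the element counts of a collection of shape tuples."""
    return sum(prod(shape) for shape in shapes)


# Hypothetical problem sizes, for illustration only.
M, topk, N, K = 16, 2, 4096, 2048

old_elems = total_elements(workspace_shapes_old(M, topk, N, K))
new_elems = total_elements(workspace_shapes_new(M, topk, N, K))

print(f"old workspace elements: {old_elems}")
print(f"new workspace elements: {new_elems}")
print(f"ratio new/old: {new_elems / old_elems:.2f}")
```

Under these assumed sizes, where `N` exceeds `K`, both workspaces shrink by half. When `K` is larger than `2 * N`, `workspace1` is unchanged and only `workspace2` shrinks.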