Jinzhen Lin authored
[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
750f4cab