csrc/moe/moe_align_sum_kernels.cu · 750f4cabfac4bfed679d95074d9550b043e3f8d5 · OpenDAS / vllm_cscc

Jinzhen Lin authored Jan 21, 2025


[Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

moe_align_sum_kernels.cu 12.9 KB
