[Kernel] fix moe_align_block_size error condition (#12239)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

Commit 1e60f87b, authored Jan 22, 2025 by Jinzhen Lin, committed by GitHub Jan 21, 2025

Showing with 6 additions and 4 deletions

csrc/moe/moe_align_sum_kernels.cu +6 -4

--- a/csrc/moe/moe_align_sum_kernels.cu
+++ b/csrc/moe/moe_align_sum_kernels.cu
@@ -234,14 +234,16 @@ void moe_align_block_size(torch::Tensor topk_ids, int64_t num_experts,

  bool use_global_memory = false;
  bool use_i16 = false;  // Use uint16_t for shared memory token counts
-  if (shared_mem_i16 > device_max_shared_mem) {
-    use_global_memory = true;
-  } else if (shared_mem_i32 > device_max_shared_mem &&
+  if (shared_mem_i32 < device_max_shared_mem) {
+    // Do nothing in this case. We're all set to use int32_t token counts
+  } else if (shared_mem_i16 < device_max_shared_mem &&
             topk_ids.numel() <= 65535) {
    // when nelements of topk_ids is smaller than 65535 (max value of uint16),
    // element value of token_cnts would also smaller than 65535,
    // so we can use uint16 as dtype of token_cnts
    use_i16 = true;
+  } else {
+    use_global_memory = true;
  }

  if (use_global_memory) {
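
Reading the hunk above, the old branch order appears to leave one case uncovered: when the uint16_t buffer fits in shared memory, the int32_t buffer does not, and topk_ids has more than 65535 elements, neither flag is set and the int32_t shared-memory path is taken anyway. The new order checks the int32_t case first and falls back to global memory last. The following is a minimal host-side sketch that contrasts the two orderings. The names `TokenCountStorage`, `select_before`, and `select_after` are hypothetical helpers for illustration and do not exist in the source file; only the comparisons are taken from the diff.

```cpp
#include <cstdint>

// Hypothetical enum for illustration: which buffer holds the token counts.
enum class TokenCountStorage { SharedInt32, SharedUint16, Global };

// Largest value representable in uint16_t, as used in the diff.
constexpr int64_t kMaxUint16 = 65535;

// Branch order before the commit (removed lines of the hunk).
TokenCountStorage select_before(int64_t shared_mem_i32, int64_t shared_mem_i16,
                                int64_t device_max_shared_mem, int64_t numel) {
  if (shared_mem_i16 > device_max_shared_mem) {
    return TokenCountStorage::Global;
  } else if (shared_mem_i32 > device_max_shared_mem && numel <= kMaxUint16) {
    return TokenCountStorage::SharedUint16;
  }
  // Falls through here even when the int32_t buffer does not fit
  // and numel is too large for uint16_t counts.
  return TokenCountStorage::SharedInt32;
}

// Branch order after the commit (added lines of the hunk).
TokenCountStorage select_after(int64_t shared_mem_i32, int64_t shared_mem_i16,
                               int64_t device_max_shared_mem, int64_t numel) {
  if (shared_mem_i32 < device_max_shared_mem) {
    return TokenCountStorage::SharedInt32;   // int32_t counts fit as-is
  } else if (shared_mem_i16 < device_max_shared_mem && numel <= kMaxUint16) {
    return TokenCountStorage::SharedUint16;  // counts cannot overflow uint16_t
  }
  return TokenCountStorage::Global;          // everything else
}
```

With made-up sizes such as shared_mem_i32 = 200, shared_mem_i16 = 100, device_max_shared_mem = 150, and numel = 100000, `select_before` picks the int32_t shared-memory path while `select_after` picks global memory.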