fix: fix illegal cuda memory access at fused_moe_kernel (#4727)

Co-authored-by: yuethe <yuethe@tencent.com>

Commit e41549c3, authored Apr 03, 2025 by saltyfish66, committed by GitHub Apr 03, 2025

Showing with 1 addition and 0 deletions

python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py +1 -0

--- a/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
+++ b/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
@@ -152,6 +152,7 @@ def fused_moe_kernel(
        return
    offs_token_id = pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)
    offs_token = tl.load(sorted_token_ids_ptr + offs_token_id)
+    offs_token = offs_token.to(tl.int64)
    token_mask = offs_token < num_valid_tokens
    offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)) % N
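The commit message does not spell out the root cause. One plausible reading is that `offs_token` is loaded as a 32-bit integer and later multiplied by a row stride to form a pointer offset, so a large enough product wraps around and points outside the tensor. The sketch below only illustrates that arithmetic in plain Python; `wrap_int32`, the stride value, and the token index are hypothetical and are not taken from the kernel.

```python
def wrap_int32(value: int) -> int:
    """Reduce a Python int to the value a signed 32-bit integer would hold."""
    return (value + 2**31) % 2**32 - 2**31


# Assumed values, chosen only so that the product exceeds 2**31 - 1.
stride_cm = 7168       # hypothetical row stride, in elements
offs_token = 400_000   # hypothetical token index from sorted_token_ids

# 32-bit arithmetic: the product wraps around and becomes negative,
# which would turn into an out-of-bounds address inside a kernel.
offset_i32 = wrap_int32(offs_token * stride_cm)

# 64-bit arithmetic, as after offs_token.to(tl.int64): the product fits.
offset_i64 = offs_token * stride_cm

print(offset_i32, offset_i64)
```

Under these assumptions the 32-bit offset is negative while the 64-bit offset is the intended value, which is consistent with the one-line cast removing the illegal memory access.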