Unverified commit e41549c3 authored by saltyfish66, committed by GitHub

fix: fix illegal cuda memory access at fused_moe_kernel (#4727)


Co-authored-by: yuethe <yuethe@tencent.com>
parent cccfc10e
@@ -152,6 +152,7 @@ def fused_moe_kernel(
         return
     offs_token_id = pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)
     offs_token = tl.load(sorted_token_ids_ptr + offs_token_id)
+    offs_token = offs_token.to(tl.int64)
     token_mask = offs_token < num_valid_tokens
     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)) % N
......
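Why widening offs_token can prevent an illegal memory access, sketched under assumptions: the excerpt only shows offs_token being cast to tl.int64 right after it is loaded. In the full kernel, offs_token presumably also feeds pointer offsets of the form offs_token * stride (those lines are not visible in this excerpt). If both operands are 32-bit, that product can exceed 2**31 - 1 and wrap to a negative value, which turns into an out-of-bounds address. The pure-Python sketch below only emulates 32-bit wraparound and does not run Triton; the helper names and the sizes are hypothetical.

```python
# Emulate two's-complement 32-bit arithmetic to show why an int32 row offset
# can wrap negative, while a 64-bit offset stays correct.

def wrap_int32(x: int) -> int:
    """Reduce an arbitrary Python int to a signed 32-bit value."""
    return (x + 2**31) % 2**32 - 2**31

def row_offset_int32(token: int, stride: int) -> int:
    # Hypothetical helper: token * stride computed in int32, wrapping on overflow.
    return wrap_int32(token * stride)

def row_offset_int64(token: int, stride: int) -> int:
    # Hypothetical helper: Python ints do not overflow, and these sizes
    # fit comfortably in int64, so this models the widened computation.
    return token * stride

# Hypothetical sizes, chosen for illustration (not taken from the commit).
token = 100_000   # a sorted token index
stride = 28_672   # elements per row of a buffer

print(row_offset_int32(token, stride))  # -1427767296 (wrapped, invalid offset)
print(row_offset_int64(token, stride))  # 2867200000 (correct offset)
```

The negative int32 result is what an out-of-range address looks like after wraparound; performing the multiply in int64, which is what the added .to(tl.int64) is meant to enable, keeps the offset exact.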