sglang, commit e41549c3 (unverified)
Authored Apr 03, 2025 by saltyfish66; committed by GitHub, Apr 03, 2025

fix: fix illegal cuda memory access at fused_moe_kernel (#4727)

Co-authored-by: yuethe <yuethe@tencent.com>

Parent: cccfc10e
Changes: 1 changed file with 1 addition and 0 deletions
python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py (+1 -0)

@@ -152,6 +152,7 @@ def fused_moe_kernel(
         return
     offs_token_id = pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)
     offs_token = tl.load(sorted_token_ids_ptr + offs_token_id)
+    offs_token = offs_token.to(tl.int64)
     token_mask = offs_token < num_valid_tokens
     offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)) % N
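The commit message does not say why the cast removes the illegal access. One plausible reading is that `offs_token` is loaded as a 32-bit integer and later multiplied by a stride to form a pointer offset, so for large enough inputs the product wraps around and points outside the buffer. The pure-Python sketch below only emulates that wraparound; the token index, the stride, and the helper names are hypothetical and are not taken from the kernel.

```python
# Hypothetical illustration of 32-bit offset overflow.
# Nothing here is taken from fused_moe.py; all names and values are assumptions.

INT32_MIN = -2**31


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def element_offset_int32(token: int, stride: int) -> int:
    """Offset as it would come out if the product were kept in 32 bits."""
    return wrap_int32(token * stride)


def element_offset_int64(token: int, stride: int) -> int:
    """Offset with wide arithmetic (Python ints do not overflow at this size)."""
    return token * stride


token = 400_000  # hypothetical token index
stride = 8_192   # hypothetical row stride, in elements

wide = element_offset_int64(token, stride)
narrow = element_offset_int32(token, stride)

print("64-bit offset:", wide)
print("32-bit offset:", narrow)
```

With these assumed values the 32-bit product is negative, which as a pointer offset would address memory before the start of the buffer. Widening the index before the multiplication, as the added line does with `offs_token.to(tl.int64)`, keeps the offset in range.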