sglang
Commit 64480df4 (Unverified)
Authored Feb 08, 2025 by yiakwy-xpu-ml-framework-team
Committed by GitHub, Feb 08, 2025
[BUG] fix moe benchmark when bs*seq is small (#3382)
parent 4530136e
Showing 1 changed file with 2 additions and 2 deletions
benchmark/kernels/fused_moe_triton/benchmark_deepseekv3_moe_align_blocks.py
@@ -157,7 +157,7 @@ def calculate_diff(batch_size, seq_len):
     )
     sorted_ids_cuda.fill_(topk_ids.numel())
     max_num_m_blocks = max_num_tokens_padded // block_size
-    expert_ids_cuda = torch.empty(
+    expert_ids_cuda = torch.zeros(
         (max_num_m_blocks,), dtype=torch.int32, device=topk_ids.device
     )
     num_tokens_post_pad_cuda = torch.empty(
@@ -172,7 +172,7 @@ def calculate_diff(batch_size, seq_len):
     sorted_ids_triton = torch.empty_like(sorted_ids_cuda)
     sorted_ids_triton.fill_(topk_ids.numel())
-    expert_ids_triton = torch.empty_like(expert_ids_cuda)
+    expert_ids_triton = torch.zeros_like(expert_ids_cuda)
     num_tokens_post_pad_triton = torch.empty_like(num_tokens_post_pad_cuda)
     # compare the performance of cuda and triton implementation
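
The commit does not spell out why zero-initialization fixes the small bs*seq case. One plausible reading, offered here as an assumption, is that `expert_ids` is sized for the padded worst case (`max_num_m_blocks`), and with few tokens the kernels write only the leading entries. The untouched tail of a `torch.empty` buffer then holds arbitrary values, so the CUDA and Triton outputs can differ even when both kernels are correct. The sketch below imitates that effect in plain Python; `alloc_empty`, `alloc_zeros`, and `fake_align_kernel` are hypothetical stand-ins and not part of sglang or PyTorch.

```python
import itertools

# Fake "stale memory": every allocation sees different leftover values.
_stale = itertools.count(1000)


def alloc_empty(n):
    """Stand-in for torch.empty: contents are whatever was there before."""
    return [next(_stale) for _ in range(n)]


def alloc_zeros(n):
    """Stand-in for torch.zeros: contents are defined."""
    return [0] * n


def fake_align_kernel(expert_ids, used_blocks):
    """Hypothetical kernel that writes only the blocks it actually uses."""
    for block in range(used_blocks):
        expert_ids[block] = block % 4  # arbitrary expert id for illustration
    return expert_ids


max_num_m_blocks = 8  # buffer sized for the padded worst case
used_blocks = 3       # small bs*seq: only a few blocks get written

# Two implementations, each writing into its own uninitialized buffer.
empty_match = (
    fake_align_kernel(alloc_empty(max_num_m_blocks), used_blocks)
    == fake_align_kernel(alloc_empty(max_num_m_blocks), used_blocks)
)

# Same comparison with zero-initialized buffers.
zeros_match = (
    fake_align_kernel(alloc_zeros(max_num_m_blocks), used_blocks)
    == fake_align_kernel(alloc_zeros(max_num_m_blocks), used_blocks)
)

print(empty_match, zeros_match)
```

Under this reading, the zero-filled buffers compare equal because the unwritten tail is identical on both sides, which is what an element-wise comparison in `calculate_diff` would need.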