sglang / Commits / e132cba2
Unverified commit e132cba2, authored Apr 29, 2025 by Xiaoyu Zhang, committed by GitHub Apr 28, 2025

fused moe triton tuning script support qwen3 (#5842)
parent 0045f4b2

Showing 3 changed files with 18 additions and 1 deletion:

- benchmark/kernels/fused_moe_triton/README.md (+7, -0)
- benchmark/kernels/fused_moe_triton/benchmark_torch_compile_fused_moe.py (+6, -1)
- benchmark/kernels/fused_moe_triton/benchmark_vllm_vs_sglang_fused_moe_triton.py (+5, -0)
benchmark/kernels/fused_moe_triton/README.md

```diff
@@ -20,6 +20,13 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
     --dtype fp8_w8a8 \
     --tune

+# Tune Qwen3-235B-A22B-FP8 and TP=4
+python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
+    --model Qwen/Qwen3-235B-A22B-FP8 \
+    --tp-size 4 \
+    --dtype fp8_w8a8 \
+    --tune
+
 # Tune DeepSeek-V3 with FP8, TP=8 and n_share_experts_fusion=8
 python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
     --model deepseek-ai/DeepSeek-V3-0324 \
```
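The added README example tunes a single TP size. When comparing deployment configurations, the same command can be swept over several TP sizes; a dry-run sketch (the loop itself is illustrative and not part of this commit — drop the `echo` to actually run the tuner):

```shell
# Dry-run sweep over TP sizes for the new Qwen3 tuning command.
# Remove the leading "echo" to execute instead of printing.
for tp in 4 8; do
  echo python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
    --model Qwen/Qwen3-235B-A22B-FP8 \
    --tp-size "$tp" \
    --dtype fp8_w8a8 \
    --tune
done
```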
benchmark/kernels/fused_moe_triton/benchmark_torch_compile_fused_moe.py

```diff
@@ -30,10 +30,15 @@ def get_model_config(model_name: str, tp_size: int):
         topk = config.num_experts_per_tok
         intermediate_size = config.moe_intermediate_size
         shard_intermediate_size = 2 * intermediate_size // tp_size
+    elif config.architectures[0] == "Qwen3MoeForCausalLM":
+        E = config.num_experts
+        topk = config.num_experts_per_tok
+        intermediate_size = config.moe_intermediate_size
+        shard_intermediate_size = 2 * intermediate_size // tp_size
     elif config.architectures[0] in ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]:
         E = config.n_routed_experts
         topk = config.num_experts_per_tok
-        intermediate_size = config.intermediate_size
+        intermediate_size = config.moe_intermediate_size
         shard_intermediate_size = 2 * intermediate_size // tp_size
     elif config.architectures[0] in [
         "Grok1ForCausalLM",
```
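The new `Qwen3MoeForCausalLM` branch mirrors the existing MoE handling: it reads `num_experts`, `num_experts_per_tok`, and `moe_intermediate_size` from the HuggingFace config and derives the per-rank shard width. A minimal sketch of that computation, using illustrative Qwen3-235B-A22B-style values (assumed here, not taken from this commit):

```python
from dataclasses import dataclass


@dataclass
class MoeConfig:
    # Illustrative values in the style of Qwen3-235B-A22B (assumed, not from this commit).
    num_experts: int = 128
    num_experts_per_tok: int = 8
    moe_intermediate_size: int = 1536


def shard_intermediate_size(cfg: MoeConfig, tp_size: int) -> int:
    # Factor of 2 because the gate and up projections are fused into one weight,
    # and the fused dimension is split evenly across tensor-parallel ranks.
    return 2 * cfg.moe_intermediate_size // tp_size


cfg = MoeConfig()
print(shard_intermediate_size(cfg, tp_size=4))  # 2 * 1536 // 4 = 768
```

The DeepSeek branch differs only in where the expert count lives (`n_routed_experts` instead of `num_experts`); the commit also fixes it to read `moe_intermediate_size` rather than the dense `intermediate_size`.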
benchmark/kernels/fused_moe_triton/benchmark_vllm_vs_sglang_fused_moe_triton.py

```diff
@@ -30,6 +30,11 @@ def get_model_config(model_name: str, tp_size: int):
         topk = config.num_experts_per_tok
         intermediate_size = config.moe_intermediate_size
         shard_intermediate_size = 2 * intermediate_size // tp_size
+    elif config.architectures[0] == "Qwen3MoeForCausalLM":
+        E = config.num_experts
+        topk = config.num_experts_per_tok
+        intermediate_size = config.moe_intermediate_size
+        shard_intermediate_size = 2 * intermediate_size // tp_size
     elif config.architectures[0] in ["DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM"]:
         E = config.n_routed_experts
         topk = config.num_experts_per_tok
```