Unverified Commit 3e4794aa authored by Xiaoyu Zhang, committed by GitHub

refine fused_moe tuning docs (#5294)

parent 690ec205
@@ -8,6 +8,11 @@ This directory contains benchmarking tools for MoE (Mixture of Experts) kernels.
 
 Example usage:
 ```bash
+# Tune Mixtral-8x7B with default settings
+python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
+    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
+    --tune
+
 # Tune Qwen2-57B with FP8 and TP=4
 python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
     --model Qwen/Qwen2-57B-A14B-Instruct \
@@ -15,9 +20,12 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
     --dtype fp8_w8a8 \
     --tune
 
-# Tune Mixtral-8x7B with default settings
+# Tune DeepSeek-V3 with FP8, TP=8 and n_share_experts_fusion=8
 python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
-    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
+    --model deepseek-ai/DeepSeek-V3-0324 \
+    --tp-size 8 \
+    --n-share-experts-fusion 8 \
+    --dtype fp8_w8a8 \
     --tune
 ```
 
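Note (not part of the commit): the `--tune` runs above produce a per-device JSON config that the fused MoE kernel later loads. Below is a minimal sketch of what such a config looks like and how a batch size maps to an entry; the key names follow the usual fused_moe_triton config convention, while the values and the nearest-batch-size selection rule are illustrative assumptions.

```python
# Illustrative only: a tuned fused-MoE config maps a batch size (string key)
# to Triton launch parameters. The values here are made up, not tuned output.
example_config = {
    "1":  {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}

# At runtime, a config is chosen for the tuned batch size nearest the actual one.
batch_size = 48
best = min(example_config, key=lambda k: abs(int(k) - batch_size))
print(f"batch {batch_size} -> config for {best}: {example_config[best]}")
```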
@@ -956,7 +956,7 @@ def get_moe_configs(
         logger.warning(
             (
                 "Using default MoE config. Performance might be sub-optimal! "
-                "Config file not found at %s"
+                "Config file not found at %s, you can tune the config with https://github.com/sgl-project/sglang/blob/main/benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py."
             ),
             config_file_path,
         )
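For context, here is a minimal sketch of the lookup that emits this warning. The filename convention, directory layout, and device-name handling below are assumptions for illustration, not the exact sglang implementation of `get_moe_configs`:

```python
import json
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


def get_moe_configs_sketch(E: int, N: int, dtype: Optional[str]) -> Optional[dict]:
    # Assumed filename convention: tuned configs are keyed by expert count,
    # shard intermediate size, device name, and (optionally) dtype.
    device_name = "NVIDIA_H200"  # placeholder; real code queries the GPU
    dtype_part = f",dtype={dtype}" if dtype is not None else ""
    json_file_name = f"E={E},N={N},device_name={device_name}{dtype_part}.json"

    config_file_path = os.path.join(
        os.path.dirname(os.path.realpath(__file__)), "configs", json_file_name
    )
    if os.path.exists(config_file_path):
        with open(config_file_path) as f:
            # Keys are batch sizes; values are Triton launch parameters.
            return {int(key): val for key, val in json.load(f).items()}

    # No tuned config found: fall back to defaults and point the user at the
    # tuning script, as the warning changed in this commit now does.
    logger.warning(
        (
            "Using default MoE config. Performance might be sub-optimal! "
            "Config file not found at %s, you can tune the config with "
            "https://github.com/sgl-project/sglang/blob/main/benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py."
        ),
        config_file_path,
    )
    return None
```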