Unverified commit fa3592cf, authored by Xiaoyu Zhang, committed by GitHub

rebase h20 fused_moe config (#6966)

parent 608668e1
...
@@ -42,7 +42,7 @@ python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py \
 --tune
 ```
-After tuning, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. You can move this file to `sglang/srt/layers/fused_moe_triton/configs/` to use it in `sglang`.
+After tuning, a configuration file (e.g., `E=64,N=640,device_name=NVIDIA_GeForce_RTX_4090,dtype=fp8_w8a8.json`) will be generated in the current directory. You can move this file to `sglang/srt/layers/fused_moe_triton/configs/triton_version` dir to use it in `sglang`.
 ### Performance Comparison Tool
......
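
The changed line says the tuned file now belongs in a per-Triton-version subdirectory of `configs/`. The following is a minimal sketch of that move, not code from this commit. The helper name `install_tuned_config` is hypothetical, and the subdirectory naming (`triton_` followed by the version with dots replaced by underscores) is an assumption; check the actual directory names in your `sglang` checkout before relying on it.

```python
import shutil
from pathlib import Path


def install_tuned_config(config_file: Path, configs_root: Path, triton_version: str) -> Path:
    """Move a tuned fused_moe config into a version-specific subdirectory.

    Hypothetical helper for illustration only.
    """
    # ASSUMPTION: the subdirectory is named like "triton_3_1_0" for Triton 3.1.0.
    subdir = "triton_" + triton_version.replace(".", "_")
    target_dir = configs_root / subdir
    # Create the version directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move returns the destination path as a string.
    moved = shutil.move(str(config_file), str(target_dir / config_file.name))
    return Path(moved)
```

In practice the version string could be taken from `triton.__version__`, and `configs_root` would point at `sglang/srt/layers/fused_moe_triton/configs/` in the installed package or source tree.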