[2/2] Use moe_sum_reduce cuda kernel (#10654)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>

Commit 813bd6f8, authored Oct 28, 2025 by Yuan Luo; committed by GitHub Oct 28, 2025

Showing with 3 additions and 2 deletions

python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py +3 -2

--- a/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
+++ b/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
@@ -36,7 +36,7 @@ _is_cpu = is_cpu()
 _use_aiter = get_bool_env_var("SGLANG_USE_AITER") and _is_hip
 if _is_cuda:
-    from sgl_kernel import gelu_and_mul, silu_and_mul
+    from sgl_kernel import gelu_and_mul, moe_sum_reduce, silu_and_mul
 elif _is_cpu and _is_cpu_amx_available:
    pass
 elif _is_hip:
@@ -569,11 +569,12 @@ def fused_experts_impl(
                        routed_scaling_factor,
                    )
                else:
-                    moe_sum_reduce_triton(
+                    moe_sum_reduce(
                        intermediate_cache3.view(*intermediate_cache3.shape),
                        out_hidden_states[begin_chunk_idx:end_chunk_idx],
                        routed_scaling_factor,
                    )
        elif _is_hip:
            if _use_aiter:
                moe_sum(
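
Note on semantics: the diff only swaps the callee, so the CUDA `moe_sum_reduce` is presumably expected to behave like the `moe_sum_reduce_triton` call it replaces. A minimal pure-Python sketch of that behavior follows, assuming the input is laid out as `[num_tokens][top_k][hidden]` and the output as `[num_tokens][hidden]`. The function name `moe_sum_reduce_reference` is hypothetical and exists only for illustration; this is not the `sgl_kernel` implementation.

```python
def moe_sum_reduce_reference(x, out, routed_scaling_factor):
    """Reference semantics only (assumed): sum over the top-k axis, then scale.

    x:   nested list shaped [num_tokens][top_k][hidden]
    out: nested list shaped [num_tokens][hidden], overwritten in place
    """
    for t, per_token in enumerate(x):
        hidden = len(out[t])
        for h in range(hidden):
            # Accumulate the contributions of all selected experts for token t.
            acc = 0.0
            for expert_out in per_token:
                acc += expert_out[h]
            # Apply the routed scaling factor once, after the reduction.
            out[t][h] = acc * routed_scaling_factor


# One token, two selected experts, hidden size 2.
x = [[[1.0, 2.0], [3.0, 4.0]]]
out = [[0.0, 0.0]]
moe_sum_reduce_reference(x, out, 0.5)
print(out)
```

A reference like this can serve as an oracle when comparing the Triton and CUDA paths on small inputs, within a floating-point tolerance.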