[moe] fix: correct the cache size in the last chunk (#3679)

Co-authored-by: Abatom <abzhonghua@gmail.com>

Commit 2f6bacee (unverified), authored Mar 13, 2025 by Cheng Wan; committed by GitHub Mar 12, 2025

Showing with 3 additions and 1 deletion

python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py +3 -1

--- a/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
+++ b/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py
@@ -1064,7 +1064,9 @@ def fused_experts_impl(
            # so the cache size and config are already set correctly and
            # do not need to be adjusted.
            intermediate_cache1 = intermediate_cache1[:tokens_in_chunk]
-            intermediate_cache2 = intermediate_cache2[:tokens_in_chunk]
+            intermediate_cache2 = intermediate_cache2[
+                : tokens_in_chunk * topk_ids.shape[1]
+            ]
            intermediate_cache3 = intermediate_cache3[:tokens_in_chunk]
            config = get_config_func(tokens_in_chunk)
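
Note on the fix: the sketch below illustrates why the slice bound for `intermediate_cache2` differs from the other two caches. It assumes, without the hunk confirming it, that `intermediate_cache1` and `intermediate_cache3` keep tokens on their leading dimension while `intermediate_cache2` is flattened to one row per (token, selected expert) pair, with `topk_ids.shape[1]` being the number of experts per token. Shapes are modeled as plain tuples, and the helper names `last_chunk_cache_shapes` and `old_cache2_rows` are hypothetical, not part of sglang.

```python
# Illustrative sketch only; the cache layouts below are assumptions about
# the surrounding function, and both helper names are hypothetical.

def last_chunk_cache_shapes(tokens_in_chunk, top_k, n, hidden):
    """Return (cache1, cache2, cache3) shapes after trimming for the last chunk.

    Assumed layouts:
      cache1: (tokens, top_k, n)        leading dim counts tokens
      cache2: (tokens * top_k, n // 2)  leading dim counts (token, expert) pairs
      cache3: (tokens, top_k, hidden)   leading dim counts tokens
    """
    cache1 = (tokens_in_chunk, top_k, n)
    cache2 = (tokens_in_chunk * top_k, n // 2)  # bound used by the fixed slice
    cache3 = (tokens_in_chunk, top_k, hidden)
    return cache1, cache2, cache3


def old_cache2_rows(tokens_in_chunk):
    """Rows kept by the pre-fix slice `intermediate_cache2[:tokens_in_chunk]`."""
    return tokens_in_chunk


# Example: a final chunk of 5 tokens with 2 experts selected per token.
c1, c2, c3 = last_chunk_cache_shapes(tokens_in_chunk=5, top_k=2, n=8, hidden=4)

# Flattening cache1 over (token, expert) gives the row count cache2 must match.
rows_needed = c1[0] * c1[1]
rows_before_fix = old_cache2_rows(5)
```

Under these assumptions, the old slice keeps 5 rows where 10 are needed, so `intermediate_cache2` would be too small by a factor of `top_k` in the last chunk.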