[Fix] Fix a bug for flashmla to run R1 model (#5875)

Co-authored-by: pengcuo <dgpengcuo@gmail.com>

Commit 8e5a6d34, authored Apr 29, 2025 by pengcuo; committed by GitHub Apr 29, 2025

Showing with 3 additions and 0 deletions

python/sglang/srt/layers/attention/flashmla_backend.py +3 -0

--- a/python/sglang/srt/layers/attention/flashmla_backend.py
+++ b/python/sglang/srt/layers/attention/flashmla_backend.py
@@ -241,6 +241,9 @@ class FlashMLABackend(FlashInferMLAAttnBackend):
                seq_lens_cpu,
            )

+    def get_cuda_graph_seq_len_fill_value(self):
+        return 1024
+
    def forward_decode(
        self,
        q: torch.Tensor,
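
Note on the change: the hunk above only shows the new method returning 1024; the commit does not say why that value was chosen. The method name suggests it supplies the value used to fill unused sequence-length slots when a batch is padded up to a CUDA-graph-captured batch size. The sketch below illustrates that padding idea in plain Python, under that assumption. `pad_seq_lens` and `SEQ_LEN_FILL_VALUE` are hypothetical names for illustration only and are not part of sglang.

```python
from typing import List

# Assumption: mirrors the constant returned by the new
# FlashMLABackend.get_cuda_graph_seq_len_fill_value() in this commit.
SEQ_LEN_FILL_VALUE = 1024


def pad_seq_lens(
    seq_lens: List[int],
    captured_bs: int,
    fill_value: int = SEQ_LEN_FILL_VALUE,
) -> List[int]:
    """Hypothetical helper: pad per-request sequence lengths to a fixed batch size.

    A captured CUDA graph replays with a fixed batch size, so a smaller real
    batch needs placeholder entries in the unused slots.
    """
    if len(seq_lens) > captured_bs:
        raise ValueError("real batch is larger than the captured batch size")
    # Real requests keep their lengths; padded slots get the fill value.
    return list(seq_lens) + [fill_value] * (captured_bs - len(seq_lens))


# Three real requests replayed through a graph captured for batch size 8.
padded = pad_seq_lens([17, 230, 5], captured_bs=8)
print(padded)
```

In this sketch the fill value only affects the padded slots; the real requests' lengths are passed through unchanged.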