[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#30632)

Signed-off-by: drslark <slarksblood@qq.com>

Commit add1b9d3 (unverified), authored Dec 14, 2025 by drslark; committed by GitHub Dec 14, 2025.

Showing 1 addition and 1 deletion.

vllm/v1/attention/backends/gdn_attn.py +1 -1

--- a/vllm/v1/attention/backends/gdn_attn.py
+++ b/vllm/v1/attention/backends/gdn_attn.py
@@ -211,7 +211,7 @@ class GDNAttentionMetadataBuilder(AttentionMetadataBuilder[GDNAttentionMetadata]
                spec_token_masks = torch.repeat_interleave(
                    spec_sequence_masks, query_lens
                )
-                index = torch.argsort(spec_token_masks)
+                index = torch.argsort(spec_token_masks, stable=True)
                num_non_spec_tokens = num_prefill_tokens + num_decode_tokens
                non_spec_token_indx = index[:num_non_spec_tokens]
                spec_token_indx = index[num_non_spec_tokens:]
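Why `stable=True` matters here: `spec_token_masks` is a per-token boolean mask (False for prefill/decode tokens, True for speculative tokens), so `argsort` only has two distinct key values and almost every comparison is a tie. By default `torch.argsort` does not guarantee the relative order of equal elements, so the non-spec and spec index groups can come out internally permuted, and tokens gathered through `non_spec_token_indx` / `spec_token_indx` may no longer line up with their requests in a batch. A stable sort keeps each group in original token order. The minimal sketch below assumes plain-Python stand-ins for `torch.repeat_interleave` and `torch.argsort(..., stable=True)` (the helper names `repeat_interleave`, `stable_argsort`, and `split_token_indices` are hypothetical, not vLLM APIs) and, as a simplifying assumption, derives the non-spec token count from the mask instead of from `num_prefill_tokens + num_decode_tokens`.

```python
# Minimal sketch (assumption): plain-Python stand-ins for torch.repeat_interleave
# and torch.argsort(..., stable=True), showing the token split from the diff.

def repeat_interleave(values, repeats):
    """Repeat values[i] exactly repeats[i] times, like torch.repeat_interleave."""
    out = []
    for value, count in zip(values, repeats):
        out.extend([value] * count)
    return out


def stable_argsort(keys):
    """Return indices that sort keys; ties keep their original order,
    because Python's built-in sort is guaranteed to be stable."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def split_token_indices(spec_sequence_masks, query_lens):
    """Hypothetical helper: return (non_spec_token_indx, spec_token_indx),
    each listed in original token order."""
    spec_token_masks = repeat_interleave(spec_sequence_masks, query_lens)
    index = stable_argsort(spec_token_masks)  # False (non-spec) sorts before True (spec)
    # Assumption: count non-spec tokens from the mask itself.
    num_non_spec_tokens = spec_token_masks.count(False)
    return index[:num_non_spec_tokens], index[num_non_spec_tokens:]


# Hypothetical batch: requests 0 and 2 are non-spec, requests 1 and 3 are speculative.
spec_sequence_masks = [False, True, False, True]
query_lens = [3, 2, 1, 2]
non_spec, spec = split_token_indices(spec_sequence_masks, query_lens)
print(non_spec)  # [0, 1, 2, 5]
print(spec)      # [3, 4, 6, 7]
```

With an unstable sort, the same call could legally return something like `[2, 0, 5, 1]` for the non-spec group, which still selects the right set of tokens but in the wrong order; the stable sort rules that out.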