Unverified commit add1b9d3, authored by drslark, committed by GitHub

[main][BugFix] Fix an accuracy bug in Qwen3-next-MTP during batched inference (#30632)


Signed-off-by: drslark <slarksblood@qq.com>
parent dcb31196
@@ -211,7 +211,7 @@ class GDNAttentionMetadataBuilder(AttentionMetadataBuilder[GDNAttentionMetadata]
             spec_token_masks = torch.repeat_interleave(
                 spec_sequence_masks, query_lens
             )
-            index = torch.argsort(spec_token_masks)
+            index = torch.argsort(spec_token_masks, stable=True)
             num_non_spec_tokens = num_prefill_tokens + num_decode_tokens
             non_spec_token_indx = index[:num_non_spec_tokens]
             spec_token_indx = index[num_non_spec_tokens:]
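
Why `stable=True` matters here: without it, `torch.argsort` does not guarantee the relative order of equal elements, so the indices inside the non-spec group and inside the spec group may come back permuted, which presumably is what scrambled token order in batched runs. The sketch below is only an illustration, not the vLLM code: it uses a hypothetical toy batch and pure-Python stand-ins (`sorted`, which is guaranteed stable, in place of `torch.argsort(..., stable=True)`; a list comprehension in place of `torch.repeat_interleave`), and it derives `num_non_spec_tokens` by counting instead of from `num_prefill_tokens + num_decode_tokens`.

```python
# Hypothetical toy batch: three requests with 2, 3 and 2 query tokens.
# Only the middle request is a speculative-decoding sequence.
spec_sequence_masks = [False, True, False]
query_lens = [2, 3, 2]

# Stand-in for torch.repeat_interleave: one mask entry per token.
spec_token_masks = [
    flag for flag, n in zip(spec_sequence_masks, query_lens) for _ in range(n)
]

# Stand-in for torch.argsort(spec_token_masks, stable=True).
# False sorts before True, and ties keep their original relative order
# because Python's sorted() is a stable sort.
index = sorted(range(len(spec_token_masks)), key=lambda i: spec_token_masks[i])

# Simplification: count the non-spec tokens directly from the mask.
num_non_spec_tokens = spec_token_masks.count(False)
non_spec_token_indx = index[:num_non_spec_tokens]
spec_token_indx = index[num_non_spec_tokens:]

print(non_spec_token_indx)  # [0, 1, 5, 6]
print(spec_token_indx)      # [2, 3, 4]
```

With a stable sort, both index groups stay in ascending order, so each request's tokens remain contiguous and in their original sequence after the split. An unstable sort would still put every non-spec index before every spec index, but could return, for example, `[5, 0, 6, 1]` for the first group.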