Unverified commit ccd0d1d9, authored by Wentao Ye, committed by GitHub

[Bug] Fix rocm sparse attn indexer issue (#39225)


Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
parent d8ddb316
......
@@ -532,6 +532,11 @@ def rocm_aiter_sparse_attn_indexer(
    has_prefill = layer_attn_metadata.num_prefills > 0
    num_decode_tokens = layer_attn_metadata.num_decode_tokens
    # during speculative decoding, k may be padded to the CUDA graph batch
    # size while slot_mapping only covers actual tokens.
    num_tokens = slot_mapping.shape[0]
    k = k[:num_tokens]
    ops.indexer_k_quant_and_cache(
        k,
        kv_cache,
......
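
The trimming step in the hunk above can be illustrated with a minimal, self-contained sketch. It uses plain Python lists as stand-ins for tensors, so `len(slot_mapping)` plays the role of `slot_mapping.shape[0]`. The helper names `trim_to_slot_mapping` and `write_to_cache`, the batch sizes, and the slot values are all hypothetical. In particular, `write_to_cache` is not `ops.indexer_k_quant_and_cache`, and its strict length check is an assumption made only to show why mismatched lengths are a problem.

```python
def trim_to_slot_mapping(k_rows, slot_mapping):
    """Drop trailing padded rows of k so it lines up with slot_mapping."""
    num_tokens = len(slot_mapping)  # stands in for slot_mapping.shape[0]
    if len(k_rows) < num_tokens:
        raise ValueError("k has fewer rows than slot_mapping has entries")
    return k_rows[:num_tokens]


def write_to_cache(k_rows, slot_mapping, cache):
    """Hypothetical cache write: stores one row of k per slot index."""
    if len(k_rows) != len(slot_mapping):
        raise ValueError("k and slot_mapping must cover the same tokens")
    for row, slot in zip(k_rows, slot_mapping):
        cache[slot] = row


# Three real tokens padded up to an assumed graph batch size of 8.
real_rows = [[float(i)] * 4 for i in range(3)]
padding_rows = [[0.0] * 4 for _ in range(5)]
padded_k = real_rows + padding_rows

slot_mapping = [10, 11, 12]  # covers only the three real tokens
cache = {}

trimmed_k = trim_to_slot_mapping(padded_k, slot_mapping)
write_to_cache(trimmed_k, slot_mapping, cache)
```

In this sketch, passing `padded_k` directly to `write_to_cache` raises `ValueError`, while the trimmed input writes exactly one row per slot. How the real kernel behaves on a length mismatch is not shown in the diff, so the error here should be read as illustrative only.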