Commit 5f2cacdb (unverified), authored by Sage Moore, committed by GitHub

Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983)


Signed-off-by: Sage Moore <sage@neuralmagic.com>
parent aa5053e3
@@ -83,6 +83,14 @@ class RocmAttentionMetadataBuilder(
        # max_model_len will cause graph capture to be extremely
        # slow, so here we set it to 1.
        attn_metadata.seq_lens.fill_(1)

        if envs.VLLM_V1_USE_PREFILL_DECODE_ATTENTION:
            # Here we set the query start locs to 0. This is to
            # cover up an invalid memory access in the prefix_prefill kernel
            # that we run into during graph capture (#25985)
            common_attn_metadata.query_start_loc.zero_()
            common_attn_metadata.query_start_loc_cpu.zero_()

        return attn_metadata

    def build(self,
......
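
The commit does not explain why zeroing the query start locations hides the invalid memory access. One plausible reading, sketched below in plain Python, assumes the kernel derives each sequence's query length from consecutive entries of `query_start_loc`. Under that assumption, all-zero offsets give every sequence a length of zero, so nothing is read during capture. The helpers `query_spans` and `out_of_bounds_reads` and the sample offsets are hypothetical and only illustrate the idea; they are not vLLM code.

```python
def query_spans(query_start_loc):
    """Return (start, length) per sequence from cumulative start offsets."""
    return [
        (query_start_loc[i], query_start_loc[i + 1] - query_start_loc[i])
        for i in range(len(query_start_loc) - 1)
    ]


def out_of_bounds_reads(query_start_loc, num_tokens):
    """Count token indices a naive kernel would read outside the buffer."""
    bad = 0
    for start, length in query_spans(query_start_loc):
        for idx in range(start, start + length):
            if idx < 0 or idx >= num_tokens:
                bad += 1
    return bad


# Hypothetical capture-time state: a 4-token dummy buffer, but offsets
# that describe more tokens than the buffer holds.
num_tokens = 4
stale_offsets = [0, 3, 9, 12]
# Stand-in for query_start_loc.zero_() on a list instead of a tensor.
zeroed_offsets = [0] * len(stale_offsets)

stale_bad = out_of_bounds_reads(stale_offsets, num_tokens)
zeroed_bad = out_of_bounds_reads(zeroed_offsets, num_tokens)
print(stale_bad, zeroed_bad)
```

With zeroed offsets every span has length zero, so the out-of-bounds count drops to zero. This matches the commit's description of the change as a cover-up rather than a fix of the underlying kernel issue.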