Unverified commit 95aca283, authored by Divakar Verma, committed by GitHub

[rocm][V0] fix selection logic for custom PA in V0 (#16426)


Signed-off-by: Divakar Verma <divakar.verma@amd.com>
parent 2b05b8ce
@@ -109,8 +109,11 @@ def use_rocm_custom_paged_attention(qtype: torch.dtype, head_size: int,
     ON_MI250_MI300 = any(arch in GPU_ARCH for arch in ["gfx90a", "gfx942"])
     # rocm custom page attention not support on navi (gfx1*)
+    # custom paged attn always supported on V0. On V1, requires sliding window
+    # disabled due to observed numerical discrepancy.
     return (ON_MI250_MI300 and not ON_NAVI
-            and (sliding_window == 0 or sliding_window == (-1, -1))
+            and (not envs.VLLM_USE_V1 or sliding_window == 0
+                 or sliding_window == (-1, -1))
             and (qtype == torch.half or qtype == torch.bfloat16)
             and (head_size == 64 or head_size == 128)
             and (block_size == 16 or block_size == 32)
...
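
The effect of the changed clause can be illustrated in isolation. The following is a minimal sketch, not the vLLM implementation: the helper names `sliding_window_ok_before` and `sliding_window_ok_after` are hypothetical, and the `use_v1` parameter stands in for `envs.VLLM_USE_V1`. It covers only the sliding-window condition and ignores the GPU architecture, dtype, head size, and block size checks that the full function also applies.

```python
def sliding_window_ok_before(sliding_window):
    """Clause before this commit: sliding window must be disabled (0 or (-1, -1))."""
    return sliding_window == 0 or sliding_window == (-1, -1)


def sliding_window_ok_after(sliding_window, use_v1):
    """Clause after this commit: the restriction applies only when V1 is in use.

    `use_v1` is a stand-in for envs.VLLM_USE_V1 (an assumption for this sketch).
    """
    return (not use_v1) or sliding_window == 0 or sliding_window == (-1, -1)


# Compare both clauses for a disabled window (0), the (-1, -1) sentinel,
# and an enabled window of 4096 tokens.
for window in (0, (-1, -1), 4096):
    for use_v1 in (False, True):
        print(window, use_v1,
              sliding_window_ok_before(window),
              sliding_window_ok_after(window, use_v1))
```

Under these assumptions, the only inputs whose result changes are those with an enabled sliding window on V0: the old clause rejected them, and the new clause accepts them.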