[rocm][V0] fix selection logic for custom PA in V0 (#16426)

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

Commit 95aca283, authored Apr 16, 2025 by Divakar Verma, committed by GitHub Apr 16, 2025

Showing 1 changed file with 4 additions and 1 deletion

vllm/platforms/rocm.py +4 -1

--- a/vllm/platforms/rocm.py
+++ b/vllm/platforms/rocm.py
@@ -109,8 +109,11 @@ def use_rocm_custom_paged_attention(qtype: torch.dtype, head_size: int,
    ON_MI250_MI300 = any(arch in GPU_ARCH for arch in ["gfx90a", "gfx942"])
    # rocm custom page attention not support on navi (gfx1*)
+    # custom paged attn always supported on V0. On V1, requires sliding window
+    # disabled due to observed numerical discrepancy.
    return (ON_MI250_MI300 and not ON_NAVI
-            and (sliding_window == 0 or sliding_window == (-1, -1))
+            and (not envs.VLLM_USE_V1 or sliding_window == 0
+                 or sliding_window == (-1, -1))
            and (qtype == torch.half or qtype == torch.bfloat16)
            and (head_size == 64 or head_size == 128)
            and (block_size == 16 or block_size == 32)
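
The effect of the changed clause can be checked in isolation. The following is a minimal sketch, not the vLLM implementation: the helper names `sliding_window_ok_before` and `sliding_window_ok_after` are hypothetical, and the `use_v1` parameter stands in for `envs.VLLM_USE_V1`. It models only the sliding-window condition shown in the hunk above; the remaining conditions of `use_rocm_custom_paged_attention` (GPU architecture, dtype, head size, block size, and anything past the end of the hunk) are left out.

```python
from typing import Tuple, Union

# Assumption: the sliding window arrives either as an int or as a pair,
# matching the two "disabled" values compared in the diff: 0 and (-1, -1).
SlidingWindow = Union[int, Tuple[int, int]]


def sliding_window_ok_before(sliding_window: SlidingWindow) -> bool:
    """Clause before the patch: sliding window must be disabled on V0 and V1."""
    return sliding_window == 0 or sliding_window == (-1, -1)


def sliding_window_ok_after(use_v1: bool, sliding_window: SlidingWindow) -> bool:
    """Clause after the patch: V0 always passes; V1 still requires it disabled.

    `use_v1` is a stand-in for envs.VLLM_USE_V1 (hypothetical parameter).
    """
    return (not use_v1) or sliding_window == 0 or sliding_window == (-1, -1)


if __name__ == "__main__":
    # Print a small truth table comparing the two clauses.
    for use_v1 in (False, True):
        for window in (0, (-1, -1), 4096):
            print(
                f"use_v1={use_v1!s:5} sliding_window={window!s:9} "
                f"before={sliding_window_ok_before(window)!s:5} "
                f"after={sliding_window_ok_after(use_v1, window)}"
            )
```

The only case whose result changes is V0 with a sliding window enabled: the old clause rejected it, the new one accepts it. On V1 the two clauses agree for every input.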