[ROCm][Hardware][AMD] Adding Navi21 to fallback to naive attention if Triton is not used (#4658)

Commit c0724fc9, authored May 18, 2024 by alexeykondrat, committed by GitHub May 18, 2024
Showing 1 changed file with 3 additions and 2 deletions

vllm/attention/backends/rocm_flash_attn.py +3 -2

--- a/vllm/attention/backends/rocm_flash_attn.py
+++ b/vllm/attention/backends/rocm_flash_attn.py
@@ -231,8 +231,9 @@ class ROCmFlashAttentionImpl(AttentionImpl):
            self.attn_func = triton_attention
            logger.debug("Using Triton FA in ROCmBackend")
        else:
-            # if not using triton, navi3x not use flash-attn either
-            if torch.cuda.get_device_capability()[0] == 11:
+            # if not using triton, navi3x/navi21/navi10 do not use flash-attn
+            # either
+            if torch.cuda.get_device_capability()[0] != 9:
                self.use_naive_attn = True
            else:
                try:
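
The effect of the changed condition can be sketched as a small standalone function. This is an illustrative sketch only, not code from the commit: `select_attention_path` and its string return values are hypothetical names, and the mapping of Navi parts to capability major versions 10 and 11 is an assumption inferred from the old `== 11` check and the new comment. The body of the `try:` branch is not shown in the hunk, so the sketch only records that the branch is reached.

```python
def select_attention_path(use_triton: bool, capability_major: int) -> str:
    """Hypothetical helper mirroring the branch order in the hunk above.

    capability_major stands in for torch.cuda.get_device_capability()[0].
    """
    if use_triton:
        # Triton path is chosen first and is unaffected by this commit.
        return "triton"
    if capability_major != 9:
        # New behaviour: every non-9 major version falls back to naive
        # attention (before the commit, only major version 11 did).
        return "naive"
    # Major version 9: the original code continues into the try block.
    return "try_flash_attn"


# Assumed major versions: 11 for navi3x, 10 for navi21/navi10.
for major in (9, 10, 11):
    print(major, select_attention_path(use_triton=False, capability_major=major))
```

Under the old `== 11` check, a device reporting major version 10 would have proceeded to the `try:` branch; with `!= 9` it takes the naive path instead.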