[Bugfix] use flash attn on sm90 (#22933)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

Commit 39cd09dc authored Aug 14, 2025 by Yongye Zhu; committed by GitHub Aug 14, 2025

Showing with 1 addition and 1 deletion

vllm/platforms/cuda.py +1 -1

--- a/vllm/platforms/cuda.py
+++ b/vllm/platforms/cuda.py
@@ -316,7 +316,7 @@ class CudaPlatformBase(Platform):
            # FlashAttention is the default for SM 8.0+ GPUs
            if cls.has_device_capability(80):
-                if has_sink:
+                if has_sink and not cls.is_device_capability(90):
                    logger.info_once("Using Triton backend on V1 engine.")
                    return TRITON_ATTN_VLLM_V1
                if is_default_backend_supported := is_attn_backend_supported(
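
Note on the changed condition: the sketch below is a hypothetical stand-in, not vLLM code. It models only the one branch touched by this commit, and it assumes that has_device_capability(80) means "compute capability 8.0 or newer" and that is_device_capability(90) means "exactly 9.0". The function name picks_triton_for_sink and its integer capability argument are invented for illustration.

```python
# Hypothetical model of the branch changed in this commit (not vLLM code).
# Assumption: capability is an integer such as 80 or 90, as in the diff.
def picks_triton_for_sink(capability: int, has_sink: bool) -> bool:
    """Return True if the sink branch would return the Triton backend."""
    if capability < 80:
        # The outer SM 8.0+ check fails, so this branch is never reached.
        return False
    # After the change, SM 9.0 is excluded and falls through to the
    # FlashAttention support check that follows in the real code.
    return has_sink and capability != 90


if __name__ == "__main__":
    for cap in (80, 89, 90, 100):
        print(cap, picks_triton_for_sink(cap, has_sink=True))
```

Under these assumptions, a model with attention sinks still takes the Triton path on other SM 8.0+ devices, while on SM 9.0 the function continues to the default-backend check, which matches the commit title.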