[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
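
Note on the gate added in the hunk below: Turing GPUs report compute capability 7.5, so a check of `cc.major >= 8` skips the FlashAttention attempt on them and lets resolution fall through to another backend. The following is a minimal sketch of that decision only, not the code from `vllm/platforms/cuda.py`. `DeviceCapability`, `pick_vit_backend`, `flash_attn_ok`, and the `"FALLBACK"` label are hypothetical names for illustration; the real fallback backend is not visible in this hunk.

```python
from typing import NamedTuple, Optional


class DeviceCapability(NamedTuple):
    # Hypothetical stand-in for the value returned by get_device_capability().
    major: int
    minor: int


def pick_vit_backend(cc: Optional[DeviceCapability], flash_attn_ok: bool) -> str:
    """Return "FLASH_ATTN" only if the capability gate and the support check pass."""
    # Same shape as the guard in the diff: capability must be known and major >= 8.
    if cc is not None and cc.major >= 8 and flash_attn_ok:
        return "FLASH_ATTN"
    # Placeholder label; the actual fallback backend is an assumption here.
    return "FALLBACK"
```

With this sketch, a Turing device (7, 5) never reaches the FlashAttention branch, while an Ampere device (8, 0) does, provided the head size and dtype checks succeed.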

Commit 38658ec6, authored Nov 28, 2025 by Isotr0py, committed by GitHub Nov 27, 2025

Showing with 9 additions and 8 deletions

vllm/platforms/cuda.py +9 -8

--- a/vllm/platforms/cuda.py
+++ b/vllm/platforms/cuda.py
@@ -264,6 +264,7 @@ class CudaPlatformBase(Platform):
        cls, head_size: int, dtype: torch.dtype
    ) -> "AttentionBackendEnum":
        # Try FlashAttention first
+        if (cc := cls.get_device_capability()) and cc.major >= 8:
            try:
                backend_class = AttentionBackendEnum.FLASH_ATTN.get_class()
                if backend_class.supports_head_size(