OpenDAS / vllm_cscc
Commit 38658ec6 (Unverified)
Authored Nov 28, 2025 by Isotr0py
Committed by GitHub, Nov 27, 2025
[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Parent: a24ea541
Showing 1 changed file with 9 additions and 8 deletions
vllm/platforms/cuda.py
@@ -264,14 +264,15 @@ class CudaPlatformBase(Platform):
         cls, head_size: int, dtype: torch.dtype
     ) -> "AttentionBackendEnum":
         # Try FlashAttention first
-        try:
-            backend_class = AttentionBackendEnum.FLASH_ATTN.get_class()
-            if backend_class.supports_head_size(
-                head_size
-            ) and backend_class.supports_dtype(dtype):
-                return AttentionBackendEnum.FLASH_ATTN
-        except ImportError:
-            pass
+        if (cc := cls.get_device_capability()) and cc.major >= 8:
+            try:
+                backend_class = AttentionBackendEnum.FLASH_ATTN.get_class()
+                if backend_class.supports_head_size(
+                    head_size
+                ) and backend_class.supports_dtype(dtype):
+                    return AttentionBackendEnum.FLASH_ATTN
+            except ImportError:
+                pass

         return AttentionBackendEnum.TORCH_SDPA
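The gating logic in this change can be tried in isolation, without a vLLM checkout or a GPU. The sketch below is a minimal stand-in, not the vLLM implementation: `Backend`, `Capability`, `resolve_vit_backend`, and the `flash_attn_supports` callback are hypothetical names introduced for illustration. The callback stands in for the `supports_head_size` and `supports_dtype` checks, and the sketch tests `capability is not None` where the commit relies on truthiness. Turing GPUs report compute capability 7.5, so the `major >= 8` check skips the FlashAttention probe on them and falls through to SDPA.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Backend(Enum):
    """Hypothetical stand-in for vLLM's AttentionBackendEnum."""
    FLASH_ATTN = "flash_attn"
    TORCH_SDPA = "torch_sdpa"


@dataclass(frozen=True)
class Capability:
    """Hypothetical stand-in for the device capability object."""
    major: int
    minor: int


def resolve_vit_backend(
    capability: Optional[Capability],
    flash_attn_supports: Callable[[], bool],
) -> Backend:
    # Only probe FlashAttention on compute capability 8.x or newer.
    # An unknown capability (None) is treated like an unsupported device.
    if capability is not None and capability.major >= 8:
        try:
            # Assumed callback: True when head size and dtype are supported.
            if flash_attn_supports():
                return Backend.FLASH_ATTN
        except ImportError:
            # FlashAttention is not installed; fall through to SDPA.
            pass
    return Backend.TORCH_SDPA


if __name__ == "__main__":
    # Turing (7.5): the probe is never called.
    print(resolve_vit_backend(Capability(7, 5), lambda: True))
    # Ampere (8.0): the probe runs and succeeds.
    print(resolve_vit_backend(Capability(8, 0), lambda: True))
```

Passing the probe as a callback keeps the sketch free of any real backend import, so the ordering of the capability check and the probe can be verified directly.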