OpenDAS / vllm_cscc

Commit fa9ee081 (Unverified)
Authored Jan 22, 2025 by wangxiyuan
Committed by GitHub, Jan 21, 2025

[Misc] Set default backend to SDPA for get_vit_attn_backend (#12235)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Parent: 347eeebe

Showing 1 changed file with 16 additions and 14 deletions

vllm/model_executor/models/vision.py (+16 -14)

@@ -82,23 +82,25 @@ def get_vit_attn_backend(support_fa: bool = False) -> _Backend:
     if backend_by_env_var is not None:
         selected_backend = backend_name_to_enum(backend_by_env_var)
     if selected_backend is None:
-        # For Volta and Turing GPUs, use xformers instead.
-        device_available = current_platform.has_device_capability(80)
-        if device_available and support_fa:
-            from transformers.utils import is_flash_attn_2_available
-            if is_flash_attn_2_available():
-                selected_backend = _Backend.FLASH_ATTN
+        if current_platform.is_cuda():
+            device_available = current_platform.has_device_capability(80)
+            if device_available and support_fa:
+                from transformers.utils import is_flash_attn_2_available
+                if is_flash_attn_2_available():
+                    selected_backend = _Backend.FLASH_ATTN
+                else:
+                    logger.warning_once(
+                        "Current `vllm-flash-attn` has a bug inside vision "
+                        "module, so we use xformers backend instead. You can "
+                        "run `pip install flash-attn` to use flash-attention "
+                        "backend.")
+                    selected_backend = _Backend.XFORMERS
             else:
-                logger.warning_once(
-                    "Current `vllm-flash-attn` has a bug inside vision module, "
-                    "so we use xformers backend instead. You can run "
-                    "`pip install flash-attn` to use flash-attention backend.")
+                # For Volta and Turing GPUs, use xformers instead.
                 selected_backend = _Backend.XFORMERS
-        elif current_platform.is_cpu() or current_platform.is_rocm():
-            # ROCM doesn't support xformers
-            selected_backend = _Backend.TORCH_SDPA
         else:
-            selected_backend = _Backend.XFORMERS
+            # Default to torch SDPA for other non-GPU platforms.
+            selected_backend = _Backend.TORCH_SDPA
     return selected_backend
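
To make the new decision tree easier to check at a glance, here is a minimal standalone sketch of the post-commit fallback logic. It is not the vLLM implementation: `Backend` is a stand-in for `_Backend`, `pick_vit_backend` is a hypothetical helper, and the platform queries (`current_platform.is_cuda()`, `has_device_capability(80)`, `is_flash_attn_2_available()`) are replaced by plain boolean parameters so the sketch runs without vLLM or a GPU. The environment-variable override that precedes this logic in the diff is deliberately left out.

```python
from enum import Enum, auto


class Backend(Enum):
    """Stand-in for vLLM's `_Backend` enum (only the members used in the diff)."""
    FLASH_ATTN = auto()
    XFORMERS = auto()
    TORCH_SDPA = auto()


def pick_vit_backend(is_cuda: bool,
                     has_capability_80: bool,
                     support_fa: bool,
                     flash_attn_installed: bool) -> Backend:
    """Hypothetical pure-function mirror of the post-commit fallback logic.

    Each argument replaces one runtime query made in the real function:
      is_cuda              -> current_platform.is_cuda()
      has_capability_80    -> current_platform.has_device_capability(80)
      support_fa           -> the `support_fa` argument
      flash_attn_installed -> is_flash_attn_2_available()
    """
    if not is_cuda:
        # After the commit, every non-CUDA platform defaults to torch SDPA.
        return Backend.TORCH_SDPA
    if has_capability_80 and support_fa:
        # FlashAttention is chosen only when the availability check passes;
        # otherwise the code warns and falls back to xformers.
        return Backend.FLASH_ATTN if flash_attn_installed else Backend.XFORMERS
    # Volta/Turing GPUs (capability below 8.0), or a caller that did not
    # opt in to FlashAttention, use xformers.
    return Backend.XFORMERS


if __name__ == "__main__":
    cases = {
        "CUDA, sm80+, FA supported and installed": (True, True, True, True),
        "CUDA, sm80+, FA supported, not installed": (True, True, True, False),
        "CUDA, older GPU": (True, False, True, True),
        "non-CUDA platform": (False, False, False, False),
    }
    for label, args in cases.items():
        print(f"{label}: {pick_vit_backend(*args).name}")
```

The behavioral change visible in the diff is in the final branch: the old code returned `TORCH_SDPA` only for CPU and ROCm and sent every other platform to `XFORMERS`, whereas the new code reserves `XFORMERS` and `FLASH_ATTN` for CUDA and returns `TORCH_SDPA` everywhere else.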