OpenDAS / vllm_cscc
Commit 7f417161
Authored Oct 20, 2025 by zhuwenwen
switching to the implementation of MHA in FA
Parent: f3731273
Showing 1 changed file with 11 additions and 4 deletions
vllm/attention/layer.py (+11 -4)
@@ -416,11 +416,14 @@ class MultiHeadAttention(nn.Module):
             backend = _Backend.FLASH_ATTN
             use_upstream_fa = True
 
-        if current_platform.is_rocm() or current_platform.is_xpu():
+        if current_platform.is_xpu():
             # currently, only torch_sdpa is supported on rocm/xpu
             self.attn_backend = _Backend.TORCH_SDPA
+        elif current_platform.is_rocm():
+            self.attn_backend = backend if backend in {
+                _Backend.FLASH_ATTN,
+            } else _Backend.TORCH_SDPA
         else:
             self.attn_backend = backend if backend in {
                 _Backend.TORCH_SDPA,
                 _Backend.XFORMERS,
@@ -437,6 +440,10 @@ class MultiHeadAttention(nn.Module):
             if use_upstream_fa:
                 from flash_attn import flash_attn_varlen_func
                 self._flash_attn_varlen_func = flash_attn_varlen_func
             else:
+                if current_platform.is_rocm():
+                    from flash_attn import flash_attn_varlen_func
+                    self._flash_attn_varlen_func = flash_attn_varlen_func
+                else:
                     from vllm.vllm_flash_attn import flash_attn_varlen_func
                     self._flash_attn_varlen_func = flash_attn_varlen_func
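The effect of the two hunks can be summarised in a small standalone sketch. This is not the code in vllm/attention/layer.py: the `Backend` enum, the boolean platform flags, and the function names `select_attn_backend` and `flash_attn_source` are hypothetical stand-ins for `_Backend` and `current_platform`, and the allow-list for other platforms is passed in by the caller because the diff truncates it after `_Backend.XFORMERS`.

```python
from enum import Enum, auto


class Backend(Enum):
    """Hypothetical stand-in for vLLM's _Backend enum."""
    FLASH_ATTN = auto()
    TORCH_SDPA = auto()
    XFORMERS = auto()


def select_attn_backend(backend, other_allowed, *, is_xpu=False, is_rocm=False):
    """Mirror the branch structure of the first hunk.

    other_allowed is the allow-list for non-XPU, non-ROCm platforms. It is a
    parameter here because the diff does not show the full set.
    """
    if is_xpu:
        # XPU still always falls back to torch SDPA.
        return Backend.TORCH_SDPA
    if is_rocm:
        # New in this commit: ROCm keeps FLASH_ATTN instead of forcing SDPA.
        return backend if backend is Backend.FLASH_ATTN else Backend.TORCH_SDPA
    return backend if backend in other_allowed else Backend.TORCH_SDPA


def flash_attn_source(use_upstream_fa, *, is_rocm=False):
    """Mirror the second hunk: name the module that supplies
    flash_attn_varlen_func for a given configuration."""
    if use_upstream_fa or is_rocm:
        return "flash_attn"
    return "vllm.vllm_flash_attn"


if __name__ == "__main__":
    # Illustrative allow-list only; the real set is elided in the diff.
    allowed = {Backend.TORCH_SDPA, Backend.XFORMERS}
    print(select_attn_backend(Backend.FLASH_ATTN, allowed, is_rocm=True))
    print(flash_attn_source(False, is_rocm=True))
```

Read this way, the commit makes ROCm behave like the upstream FlashAttention path in both places: backend selection no longer forces `TORCH_SDPA`, and `flash_attn_varlen_func` is imported from `flash_attn` even when `use_upstream_fa` is false.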