Commit e64b39ea, authored Apr 14, 2026 by Andrew Barnes
Committed by GitHub, Apr 14, 2026
[ROCm] Align AiterFlashAttentionImpl attn_type check with backend (#39119)
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
parent 2faad083
Showing 1 changed file with 6 additions and 2 deletions
vllm/v1/attention/backends/rocm_aiter_fa.py

@@ -844,9 +844,13 @@ class AiterFlashAttentionImpl(AttentionImpl):
         assert self.num_heads % self.num_kv_heads == 0
         self.num_queries_per_kv = self.num_heads // self.num_kv_heads

-        if attn_type not in [AttentionType.DECODER, AttentionType.ENCODER_DECODER]:
+        if attn_type != AttentionType.DECODER:
             raise NotImplementedError(
-                "Encoder self-attention is not implemented for AiterFlashAttentionImpl"
+                "Only decoder self-attention is supported for "
+                "AiterFlashAttentionImpl. ENCODER_DECODER is not supported "
+                "because the prefill path uses cu_seqlens_k set to decoder "
+                "query_start_loc with causal=True, which is incorrect for "
+                "cross-attention."
             )

     def extend_for_sliding_window(
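The behavioral change in this hunk can be sketched in isolation. The snippet below is an illustration only, not code from the repository: `AttentionType` is a stand-in enum whose string values are assumptions, and `old_check`, `new_check`, `is_rejected`, and `cu_seqlens` are hypothetical helpers. It shows that the old guard let `ENCODER_DECODER` through while the new one rejects it, and, with made-up lengths, why cumulative offsets built from decoder query lengths generally do not match offsets built from encoder key lengths, which is the mismatch the new error message describes.

```python
from enum import Enum
from itertools import accumulate


class AttentionType(str, Enum):
    """Stand-in for vLLM's AttentionType; the values are assumptions."""
    DECODER = "decoder"
    ENCODER = "encoder"
    ENCODER_ONLY = "encoder_only"
    ENCODER_DECODER = "encoder_decoder"


def old_check(attn_type):
    """Guard as it read before this commit: ENCODER_DECODER is accepted."""
    if attn_type not in [AttentionType.DECODER, AttentionType.ENCODER_DECODER]:
        raise NotImplementedError("Encoder self-attention is not implemented")


def new_check(attn_type):
    """Guard as it reads after this commit: only DECODER is accepted."""
    if attn_type != AttentionType.DECODER:
        raise NotImplementedError("Only decoder self-attention is supported")


def is_rejected(check, attn_type):
    """Return True if the given guard raises NotImplementedError."""
    try:
        check(attn_type)
    except NotImplementedError:
        return True
    return False


def cu_seqlens(lengths):
    """Cumulative sequence offsets with a leading zero."""
    return [0, *accumulate(lengths)]


# Hypothetical batch: decoder query lengths vs. encoder (key) lengths.
decoder_query_lens = [2, 3]
encoder_key_lens = [7, 4]

if __name__ == "__main__":
    for t in AttentionType:
        print(t.value, is_rejected(old_check, t), is_rejected(new_check, t))
    # Offsets derived from decoder queries differ from those the encoder
    # keys would need, so reusing the former for cross-attention is wrong.
    print(cu_seqlens(decoder_query_lens), cu_seqlens(encoder_key_lens))
```

Under these assumptions, failing fast in the constructor means an unsupported `attn_type` surfaces when the layer is built, not as silently wrong attention output during prefill.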