OpenDAS / vllm_cscc
Commit 1f3a2c29 (Unverified)
Authored Jan 27, 2026 by Nicolò Lucchesi
Committed by GitHub, Jan 27, 2026

[Bugfix] Disable CG for Whisper+FA2 (#33164)

Signed-off-by: NickLucche <nlucches@redhat.com>

Parent: 7227d061

Changes: 1 changed file with 20 additions and 0 deletions

vllm/v1/attention/backends/flash_attn.py (+20, -0)
@@ -257,6 +257,26 @@ class FlashAttentionMetadataBuilder(AttentionMetadataBuilder[FlashAttentionMetad
     )
     supports_update_block_table: bool = True
 
+    @classmethod
+    def get_cudagraph_support(
+        cls,
+        vllm_config: "VllmConfig",
+        kv_cache_spec: "AttentionSpec",
+    ) -> AttentionCGSupport:
+        # FA2 does not support CUDA graphs with encoder-decoder models due to
+        # accuracy issues reported in https://github.com/vllm-project/vllm/issues/33091
+        if (
+            vllm_config.model_config.is_encoder_decoder
+            and get_flash_attn_version() == 2
+        ):
+            logger.warning_once(
+                "FlashAttention2 does not support CUDA graphs with "
+                "encoder-decoder models due to accuracy issues reported in #33091. "
+                "Disabling CUDA graph."
+            )
+            return AttentionCGSupport.NEVER
+        return cls._cudagraph_support
+
     def __init__(
         self,
         kv_cache_spec: AttentionSpec,
...
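
The decision logic in this change can be exercised in isolation. The following is a minimal sketch, not the vLLM implementation: `CGSupport`, `ModelConfig`, `Config`, and `MetadataBuilder` are hypothetical stand-ins for the real `AttentionCGSupport`, `VllmConfig`, and `FlashAttentionMetadataBuilder`, the `ALWAYS` member is an assumed default, and the FlashAttention version is passed in as a parameter instead of being read from `get_flash_attn_version()`.

```python
from dataclasses import dataclass
from enum import Enum


class CGSupport(Enum):
    """Hypothetical stand-in for vLLM's AttentionCGSupport."""

    NEVER = 0
    ALWAYS = 1  # assumed default for this sketch only


@dataclass
class ModelConfig:
    """Hypothetical stand-in carrying only the flag the check reads."""

    is_encoder_decoder: bool


@dataclass
class Config:
    """Hypothetical stand-in for the top-level config object."""

    model_config: ModelConfig


class MetadataBuilder:
    # Class-level default, returned when no special case applies.
    _cudagraph_support = CGSupport.ALWAYS

    @classmethod
    def get_cudagraph_support(cls, config: Config, fa_version: int) -> CGSupport:
        # Encoder-decoder model on FlashAttention 2: opt out of CUDA graphs.
        if config.model_config.is_encoder_decoder and fa_version == 2:
            return CGSupport.NEVER
        # Every other combination keeps the class default.
        return cls._cudagraph_support


whisper_like = Config(ModelConfig(is_encoder_decoder=True))
decoder_only = Config(ModelConfig(is_encoder_decoder=False))

print(MetadataBuilder.get_cudagraph_support(whisper_like, 2))
print(MetadataBuilder.get_cudagraph_support(whisper_like, 3))
print(MetadataBuilder.get_cudagraph_support(decoder_only, 2))
```

Because the override is a classmethod that falls back to `cls._cudagraph_support`, only the encoder-decoder plus FA2 combination is affected; decoder-only models and other FlashAttention versions keep whatever default the class declares.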