OpenDAS / vllm_cscc

Commit 468e2400 (unverified)
Authored Jul 19, 2025 by Lucas Wilkinson; committed by GitHub on Jul 18, 2025

[BugFix][CPU] Fix `TorchSDPABackendImpl` doesn't have `use_irope` (#21200)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Parent: dcc6cfb9

Changes: 1 changed file with 2 additions and 1 deletion

vllm/v1/worker/gpu_model_runner.py  +2 -1
@@ -2668,7 +2668,8 @@ class GPUModelRunner(LoRAModelRunnerMixin):
             # TODO: Support other attention modules, e.g., cross-attention
             if attn_module.attn_type == AttentionType.DECODER:
                 use_local_attention = (self.attention_chunk_size is not None
-                                       and attn_module.impl.use_irope)
+                                       and getattr(attn_module.impl,
+                                                   "use_irope", False))
                 if attn_module.sliding_window is not None:
                     kv_cache_spec[layer_name] = SlidingWindowSpec(
                         block_size=block_size,
 ...
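The change replaces direct attribute access (`attn_module.impl.use_irope`) with `getattr(attn_module.impl, "use_irope", False)`, so an attention backend implementation that never defines `use_irope` (per the commit title, `TorchSDPABackendImpl` on CPU) is treated as not using iRoPE instead of raising `AttributeError`. Because `and` short-circuits, the old expression only fails when `attention_chunk_size` is set. The sketch below isolates that logic; the stub classes and the two helper functions are hypothetical stand-ins written for illustration, not vLLM's real backend classes or API.

```python
from typing import Optional


class IropeImplStub:
    """Hypothetical backend impl that defines use_irope."""

    def __init__(self, use_irope: bool) -> None:
        self.use_irope = use_irope


class TorchSDPAImplStub:
    """Hypothetical backend impl that never sets use_irope,
    mimicking TorchSDPABackendImpl before this commit."""


def use_local_attention_old(attention_chunk_size: Optional[int],
                            impl: object) -> bool:
    # Pre-fix logic: direct attribute access raises AttributeError
    # when a chunk size is configured and the impl lacks use_irope.
    return attention_chunk_size is not None and impl.use_irope  # type: ignore[attr-defined]


def use_local_attention_new(attention_chunk_size: Optional[int],
                            impl: object) -> bool:
    # Post-fix logic: a missing use_irope attribute defaults to False.
    return attention_chunk_size is not None and getattr(impl, "use_irope", False)


if __name__ == "__main__":
    sdpa = TorchSDPAImplStub()
    try:
        use_local_attention_old(8192, sdpa)
    except AttributeError as exc:
        print("old logic raised:", exc)
    print("new logic:", use_local_attention_new(8192, sdpa))  # new logic: False
```

The `getattr` default keeps the check backend-agnostic: backends that support iRoPE still opt in by setting the attribute, and every other backend falls back to regular (non-chunked) attention without needing its own `use_irope = False`.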