OpenDAS / vllm_cscc
Commit 70af44fd (Unverified)
Authored Nov 07, 2025 by gnovack. Committed by GitHub, Nov 08, 2025.
[bugfix] support eagle with lora cudagraph specialization (#28318)
Signed-off-by: gnovack <gnovack@amazon.com>
Parent: 781f5ebf
Showing 1 changed file with 12 additions and 1 deletion
vllm/v1/worker/gpu_model_runner.py (+12, -1)
@@ -3602,7 +3602,18 @@ class GPUModelRunner(LoRAModelRunnerMixin, KVConnectorModelRunnerMixin):
                     cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE
                     and not self.speculative_config.enforce_eager
                 )
-                self.drafter.dummy_run(num_tokens, use_cudagraphs=use_cudagraphs)
+                # Note(gnovack) - We need to disable cudagraphs for one of the two
+                # lora cases when cudagraph_specialize_lora is enabled. This is a
+                # short term mitigation for issue mentioned in
+                # https://github.com/vllm-project/vllm/issues/28334
+                if self.compilation_config.cudagraph_specialize_lora and activate_lora:
+                    use_cudagraphs = False
+                self.drafter.dummy_run(
+                    num_tokens,
+                    use_cudagraphs=use_cudagraphs,
+                )
             # This is necessary to avoid blocking DP.
             # For dummy runs, we typically skip EPLB since we don't have any real
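The decision this hunk makes for the drafter's dummy run can be read as a small truth table. The sketch below is a standalone illustration, not vLLM code: `drafter_use_cudagraphs` is a hypothetical helper, `CUDAGraphMode` is a simplified stand-in enum, and it assumes the parenthesized expression shown as context at the top of the hunk is what gets assigned to `use_cudagraphs`.

```python
from enum import Enum


class CUDAGraphMode(Enum):
    # Simplified stand-in for vLLM's enum (assumption: only these members
    # are needed to illustrate the check in the hunk).
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def drafter_use_cudagraphs(
    cudagraph_runtime_mode: CUDAGraphMode,
    enforce_eager: bool,
    cudagraph_specialize_lora: bool,
    activate_lora: bool,
) -> bool:
    """Hypothetical helper mirroring the logic visible in the diff."""
    # Pre-existing condition (context lines): piecewise mode, and the
    # speculative config does not force eager execution.
    use_cudagraphs = (
        cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE and not enforce_eager
    )
    # Added by this commit: with LoRA-specialized cudagraphs enabled, the
    # LoRA-active case falls back to eager for the drafter.
    if cudagraph_specialize_lora and activate_lora:
        use_cudagraphs = False
    return use_cudagraphs


if __name__ == "__main__":
    for specialize in (False, True):
        for lora in (False, True):
            result = drafter_use_cudagraphs(
                CUDAGraphMode.PIECEWISE, False, specialize, lora
            )
            print(f"specialize_lora={specialize} activate_lora={lora} -> {result}")
```

Under these assumptions, only the combination of `cudagraph_specialize_lora` and `activate_lora` both being true changes the outcome. The other LoRA case keeps whatever the piecewise/eager check decided, which matches the comment's "one of the two lora cases".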