OpenDAS / vllm_cscc
Commit 03b5f940 (unverified)
Authored Dec 10, 2025 by dongbo910220; committed by GitHub, Dec 10, 2025

[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync (#29723)

Signed-off-by: dongbo910220 <1275604947@qq.com>

Parent: 2e7054da

Changes: 1 changed file, with 6 additions and 6 deletions

vllm/v1/spec_decode/medusa.py (+6 -6), view file @ 03b5f940

@@ -38,16 +38,16 @@ class MedusaProposer:
         self,
         target_hidden_states: torch.Tensor,
         sampling_metadata: SamplingMetadata,
-    ) -> list[list[int]]:
+    ) -> torch.Tensor:
         # Generate blocks and compute logits
         blocks = self.model(target_hidden_states)
         logits = self.model.compute_logits(blocks)

-        # Get draft tokens and transpose the result
-        # TODO(woosuk): OPTIMIZATION: Return GPU tensor without GPU-CPU
-        # synchronization.
-        draft_tokens = [logit.argmax(dim=-1).tolist() for logit in logits]
-        return [list(row) for row in zip(*draft_tokens)]
+        # Compute argmax for each Medusa head and stack into a single tensor
+        # Shape: [batch_size, num_heads]
+        draft_tokens = torch.stack([logit.argmax(dim=-1) for logit in logits], dim=1)
+
+        return draft_tokens

     def load_model(self, target_model: nn.Module) -> None:
         from vllm.compilation.backends import set_model_tag
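
The change above replaces a per-head `.tolist()` call plus a Python-side transpose with a single `torch.stack` along `dim=1`. The following is a minimal sketch, not code from the commit, that checks the two forms produce the same draft tokens. The function names `propose_as_lists` and `propose_as_tensor`, the sizes, and the random logits are assumptions made for illustration; in the real proposer the logits come from `self.model.compute_logits(blocks)`.

```python
import torch


def propose_as_lists(logits: list[torch.Tensor]) -> list[list[int]]:
    """Old form: convert each head's argmax to a Python list, then transpose."""
    # .tolist() copies data to the host; on a CUDA tensor this waits for the GPU.
    per_head = [logit.argmax(dim=-1).tolist() for logit in logits]
    return [list(row) for row in zip(*per_head)]


def propose_as_tensor(logits: list[torch.Tensor]) -> torch.Tensor:
    """New form: keep the result as one tensor of shape [batch_size, num_heads]."""
    return torch.stack([logit.argmax(dim=-1) for logit in logits], dim=1)


# Hypothetical sizes; each list entry stands in for one Medusa head's logits
# with shape [batch_size, vocab_size].
torch.manual_seed(0)
batch_size, num_heads, vocab_size = 4, 3, 32
logits = [torch.randn(batch_size, vocab_size) for _ in range(num_heads)]

old = propose_as_lists(logits)
new = propose_as_tensor(logits)

# Same values, different container: row i holds the draft tokens for request i.
assert new.shape == (batch_size, num_heads)
assert new.tolist() == old
```

Stacking along `dim=1` performs the transpose that `zip(*draft_tokens)` did in Python, so a caller that still needs nested lists can call `.tolist()` on the returned tensor at the point where host data is actually required.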