OpenDAS / vllm_cscc
Commit faa02757 (Unverified)
Authored Mar 16, 2025 by Woosuk Kwon
Committed by GitHub, Mar 16, 2025
[V1] Optimize the overhead of rewinding (#14905)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Parent: 8a5a9b70
Showing 1 changed file with 5 additions and 6 deletions
vllm/v1/worker/gpu_model_runner.py (+5 -6)
...
@@ -1032,15 +1032,14 @@ class GPUModelRunner(LoRAModelRunnerMixin):
         # TODO(woosuk): The following loop can be slow since it iterates over
         # the requests one by one. Optimize.
-        for i, req_id in enumerate(self.input_batch.req_ids):
+        for i, generator in self.input_batch.generators.items():
+            req_id = self.input_batch.req_ids[i]
             req_state = self.requests[req_id]
             seq_len = (req_state.num_computed_tokens +
                        scheduler_output.num_scheduled_tokens[req_id])
             if seq_len < req_state.num_tokens:
-                # Ignore the sampled token.
+                # Ignore the sampled token for partial prefills.
                 # Rewind the generator state as if the token was not sampled.
-                generator = self.input_batch.generators.get(i)
-                if generator is not None:
-                    # This relies on cuda-specific torch-internal impl details
-                    generator.set_offset(generator.get_offset() - 4)
+                # This relies on cuda-specific torch-internal impl details
+                generator.set_offset(generator.get_offset() - 4)
...
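The change above appears to cut the loop's cost by visiting only the batch rows that own a per-request generator, rather than enumerating every request and then looking up a generator for each. The following is a minimal sketch of that pattern, not the vLLM implementation: `FakeGenerator`, `FakeRequest`, and `rewind_partial_prefills` are hypothetical stand-ins, and the offset step of 4 is simply copied from the diff.

```python
from dataclasses import dataclass


@dataclass
class FakeGenerator:
    """Hypothetical stand-in for a CUDA generator; tracks an offset only."""
    offset: int = 0

    def get_offset(self) -> int:
        return self.offset

    def set_offset(self, offset: int) -> None:
        self.offset = offset


@dataclass
class FakeRequest:
    """Hypothetical request state with only the fields the loop reads."""
    num_computed_tokens: int
    num_tokens: int


def rewind_partial_prefills(req_ids, requests, generators, num_scheduled_tokens):
    """Rewind generators of requests still in prefill; return rows visited."""
    visited = 0
    # Iterate over the sparse {batch index: generator} dict only, so rows
    # without a generator cost nothing.
    for i, generator in generators.items():
        visited += 1
        req_id = req_ids[i]
        req_state = requests[req_id]
        seq_len = req_state.num_computed_tokens + num_scheduled_tokens[req_id]
        if seq_len < req_state.num_tokens:
            # Partial prefill: the sampled token is discarded, so undo the
            # offset advance (step of 4 taken from the diff).
            generator.set_offset(generator.get_offset() - 4)
    return visited


req_ids = ["a", "b", "c", "d"]
requests = {
    "a": FakeRequest(num_computed_tokens=0, num_tokens=10),
    "b": FakeRequest(num_computed_tokens=0, num_tokens=10),  # partial prefill
    "c": FakeRequest(num_computed_tokens=8, num_tokens=10),  # prefill completes
    "d": FakeRequest(num_computed_tokens=0, num_tokens=10),
}
num_scheduled_tokens = {"a": 4, "b": 4, "c": 2, "d": 4}
# Only rows 1 and 2 have per-request generators.
generators = {1: FakeGenerator(offset=8), 2: FakeGenerator(offset=8)}

visited = rewind_partial_prefills(req_ids, requests, generators,
                                  num_scheduled_tokens)
```

In this toy batch the loop touches two rows instead of four. Only row 1 is rewound, because its sequence length after this step is still below `num_tokens`.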