OpenDAS / vllm_cscc
Commit 624a1e47
Authored Jan 27, 2025 by Woosuk Kwon; committed by GitHub, Jan 27, 2025
[V1][Minor] Minor optimizations for update_from_output (#12454)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
parent 372bf089
Showing 1 changed file with 13 additions and 7 deletions

vllm/v1/core/scheduler.py (+13, -7)
@@ -411,6 +411,10 @@ class Scheduler:
         num_scheduled_tokens = scheduler_output.num_scheduled_tokens
         new_running: List[Request] = []
         outputs: List[EngineCoreOutput] = []
+
+        # NOTE(woosuk): As len(self.running) can be up to 1K or more, the below
+        # loop can be a performance bottleneck. We should do our best to avoid
+        # expensive operations inside the loop.
         for request in self.running:
             req_id = request.request_id
             request.num_computed_tokens += num_scheduled_tokens[req_id]
...
@@ -421,13 +425,15 @@ class Scheduler:

             cached_encoder_input_ids = (
                 self.encoder_cache_manager.get_cached_input_ids(request))
-            for input_id in list(cached_encoder_input_ids):
-                start_pos = request.mm_positions[input_id]["offset"]
-                num_tokens = request.mm_positions[input_id]["length"]
-                if start_pos + num_tokens <= request.num_computed_tokens:
-                    # The encoder output is already processed and stored
-                    # in the decoder's KV cache.
-                    self.encoder_cache_manager.free(request, input_id)
+            # OPTIMIZATION: Avoid list(set) if the set is empty.
+            if cached_encoder_input_ids:
+                for input_id in list(cached_encoder_input_ids):
+                    start_pos = request.mm_positions[input_id]["offset"]
+                    num_tokens = request.mm_positions[input_id]["length"]
+                    if start_pos + num_tokens <= request.num_computed_tokens:
+                        # The encoder output is already processed and stored
+                        # in the decoder's KV cache.
+                        self.encoder_cache_manager.free(request, input_id)

             if request.num_computed_tokens == request.num_tokens:
                 req_index = model_runner_output.req_id_to_index[req_id]
...
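
The guarded loop in the second hunk can be tried in isolation with a minimal sketch. This is not the vLLM scheduler: `free_processed_inputs` and its parameters are hypothetical stand-ins, and a plain `set` plays the role of the encoder cache so that "freeing" is just removing an id. It assumes, as the `list(...)` copy in the diff suggests, that the cached ids form a set that is mutated during iteration.

```python
from typing import Dict, List, Set


def free_processed_inputs(
    cached_input_ids: Set[int],
    mm_positions: List[Dict[str, int]],
    num_computed_tokens: int,
) -> List[int]:
    """Remove and return ids whose encoder output is fully consumed.

    Hypothetical stand-in for the loop in the diff: ids are discarded
    from ``cached_input_ids`` instead of calling a real cache manager.
    """
    freed: List[int] = []
    # An empty set is falsy, so requests with no cached encoder inputs
    # skip the list() copy and the loop setup entirely.
    if cached_input_ids:
        # Copy before iterating because the set is mutated in the loop.
        for input_id in list(cached_input_ids):
            start_pos = mm_positions[input_id]["offset"]
            num_tokens = mm_positions[input_id]["length"]
            # The whole input lies within the tokens computed so far.
            if start_pos + num_tokens <= num_computed_tokens:
                cached_input_ids.discard(input_id)
                freed.append(input_id)
    return freed


positions = [{"offset": 0, "length": 4}, {"offset": 10, "length": 4}]
cached = {0, 1}
freed = free_processed_inputs(cached, positions, num_computed_tokens=8)
```

With 8 tokens computed, only the first input (offset 0, length 4) is fully covered, so `freed` contains id 0 and id 1 stays cached. The guard does not change behavior; it only avoids allocating an empty list on every iteration of a loop that, per the NOTE in the first hunk, may run over 1K or more requests.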