OpenDAS / vllm_cscc
Commit 624a1e47 (Unverified)
Authored Jan 27, 2025 by Woosuk Kwon; committed by GitHub on Jan 27, 2025
[V1][Minor] Minor optimizations for update_from_output (#12454)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
parent 372bf089
Showing 1 changed file with 13 additions and 7 deletions
vllm/v1/core/scheduler.py
...
@@ -411,6 +411,10 @@ class Scheduler:
        num_scheduled_tokens = scheduler_output.num_scheduled_tokens
        new_running: List[Request] = []
        outputs: List[EngineCoreOutput] = []
        # NOTE(woosuk): As len(self.running) can be up to 1K or more, the below
        # loop can be a performance bottleneck. We should do our best to avoid
        # expensive operations inside the loop.
        for request in self.running:
            req_id = request.request_id
            request.num_computed_tokens += num_scheduled_tokens[req_id]
...
@@ -421,6 +425,8 @@ class Scheduler:
            cached_encoder_input_ids = (
                self.encoder_cache_manager.get_cached_input_ids(request))
            # OPTIMIZATION: Avoid list(set) if the set is empty.
            if cached_encoder_input_ids:
                for input_id in list(cached_encoder_input_ids):
                    start_pos = request.mm_positions[input_id]["offset"]
                    num_tokens = request.mm_positions[input_id]["length"]
...
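The second hunk guards the `list(...)` copy with a truthiness check, so a request with no cached encoder inputs pays for one cheap test instead of allocating an empty list on every iteration of the hot loop. The sketch below illustrates that pattern in isolation. The function name `drain_cached_inputs` and the dictionary layout are hypothetical stand-ins, not part of vLLM. It also assumes the copy exists because the loop body may remove entries from the set; the diff does not show that part of the body.

```python
from typing import Dict, List, Set


def drain_cached_inputs(cached: Dict[str, Set[int]],
                        request_ids: List[str]) -> int:
    """Remove every cached input id for the given requests.

    Returns how many list copies were made, to make the effect of the
    guard visible. All names here are illustrative, not vLLM APIs.
    """
    copies = 0
    for req_id in request_ids:
        cached_ids = cached.get(req_id)
        # Guard: an empty set (or None) is falsy, so the common case
        # costs one truth test and allocates nothing.
        if cached_ids:
            # Copy before iterating because the body mutates the set;
            # iterating the set directly would raise RuntimeError.
            for input_id in list(cached_ids):
                cached_ids.discard(input_id)
            copies += 1
    return copies


cached = {"a": {1, 2}, "b": set(), "c": {3}}
made = drain_cached_inputs(cached, ["a", "b", "c", "d"])
print(made)
```

Only the requests that actually hold cached ids trigger a copy; the empty set for `"b"` and the missing entry for `"d"` are skipped by the guard.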