OpenDAS / vllm_cscc

[Bugfix] Add warmup for prefix caching example (#5235)

Commit bd0e7802 (unverified), authored Jun 03, 2024 by Zhuohan Li, committed by GitHub on Jun 03, 2024.
Parent: 06b2550c
Changes: 1 changed file with 4 additions and 2 deletions.

examples/offline_inference_with_prefix.py (+4, -2)
@@ -51,8 +51,10 @@ for output in outputs:
print("-" * 80)

# The llm.generate call will batch all prompts and send the batch at once
# if resources allow.
# Warmup so that the shared prompt's KV cache is computed.
prefix_cached_llm.generate(generating_prompts[0], sampling_params)

# Generate with prefix caching.
start_time_cached = time()
outputs = prefix_cached_llm.generate(generating_prompts, sampling_params)
duration_cached = time() - start_time_cached
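Why one warmup request helps: per the comment in the diff, the warmup ensures the shared prompt's KV cache is computed before the timed call. Under a simplifying assumption that requests sent together in one batch cannot reuse each other's not-yet-computed prefix blocks, an un-warmed timed batch would recompute the shared prefix for every prompt and show little benefit from caching. The toy model below is a sketch of that effect, not vLLM's implementation. It assumes fixed-size blocks keyed by the full token prefix up to each block (loosely modeled on block-level automatic prefix caching), counts computed tokens as a stand-in for time, and ToyPrefixCache, BLOCK_SIZE, PROMPTS, and timed_batch_cost are hypothetical names.

```python
from typing import List, Set, Tuple

BLOCK_SIZE = 4  # hypothetical block size, in tokens


class ToyPrefixCache:
    """Toy model of block-level prefix caching (not vLLM's implementation).

    A full block is identified by every token from the start of the sequence
    up to the end of that block, so it is reused only when the whole prefix
    before it also matches.
    """

    def __init__(self) -> None:
        self.cached: Set[Tuple[str, ...]] = set()

    def _block_keys(self, tokens: List[str]) -> List[Tuple[str, ...]]:
        n_full = len(tokens) // BLOCK_SIZE
        return [tuple(tokens[: (i + 1) * BLOCK_SIZE]) for i in range(n_full)]

    def run_batch(self, batch: List[List[str]]) -> int:
        """Return how many tokens' KV entries had to be computed for the batch.

        Simplifying assumption: requests in one batch are prefilled together,
        so they only hit blocks cached by earlier batches, not by each other.
        """
        computed = 0
        produced: List[Tuple[str, ...]] = []
        for tokens in batch:
            keys = self._block_keys(tokens)
            hits = 0
            for key in keys:  # stop at the first block that is not cached
                if key not in self.cached:
                    break
                hits += 1
            computed += len(tokens) - hits * BLOCK_SIZE
            produced.extend(keys)
        self.cached.update(produced)  # blocks become reusable after the batch
        return computed


# A shared 8-token prefix (two full blocks) plus a 2-token suffix per prompt.
PREFIX = list("ABCDEFGH")
PROMPTS = [PREFIX + [s, s] for s in "wxyz"]


def timed_batch_cost(warmup: bool) -> int:
    """Tokens computed in the batch that the example would time."""
    cache = ToyPrefixCache()
    if warmup:
        cache.run_batch([PROMPTS[0]])  # mirrors generate(generating_prompts[0], ...)
    return cache.run_batch(PROMPTS)


print(timed_batch_cost(warmup=False))  # 40: every prompt recomputes the prefix
print(timed_batch_cost(warmup=True))   # 8: only the 2-token suffixes are computed
```

In this toy, the warmup moves the one-time cost of computing the shared prefix out of the timed batch, so the timed batch only pays for the per-prompt suffixes. That is the effect the example's duration_cached measurement is meant to capture.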