OpenDAS / vllm_cscc
Commit 05c531be (Unverified)
Authored Oct 04, 2024 by Andy Dai; committed by GitHub on Oct 04, 2024
[Misc] Improved prefix cache example (#9077)
Parent: fbb74420
Showing 1 changed file with 3 additions and 9 deletions
examples/offline_inference_with_prefix.py (+3 -9)
-from time import time
 from vllm import LLM, SamplingParams
+# NOTE: This is just a running example. For benchmarking purpose,
+# please see benchmarks/benchmark_prefix_caching.py
 # Common prefix.
 prefix = (
     "You are an expert school principal, skilled in effectively managing "
...
@@ -37,9 +38,7 @@ print("Results without `enable_prefix_caching`")
 # Generate texts from the prompts. The output is a list of RequestOutput objects
 # that contain the prompt, generated text, and other information.
-start_time_regular = time()
 outputs = regular_llm.generate(generating_prompts, sampling_params)
-duration_regular = time() - start_time_regular
 regular_generated_texts = []
 # Print the outputs.
...
@@ -55,9 +54,7 @@ print("-" * 80)
 prefix_cached_llm.generate(generating_prompts[0], sampling_params)
 # Generate with prefix caching.
-start_time_cached = time()
 outputs = prefix_cached_llm.generate(generating_prompts, sampling_params)
-duration_cached = time() - start_time_cached
 print("Results with `enable_prefix_caching`")
...
@@ -77,6 +74,3 @@ generated_same = all([
     for i in range(len(prompts))
 ])
 print(f"Generated answers are the same: {generated_same}")
-speedup = round(duration_regular / duration_cached, 2)
-print(f"Speed up of cached generation compared to the regular is: {speedup}")
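
The final hunk keeps the check that both runs produced the same text, but the diff elides the body of the list comprehension. As a minimal sketch of that kind of check, assuming two equal-length lists of generated strings (the function name `outputs_match` and the sample data are hypothetical and not taken from the commit):

```python
def outputs_match(regular_texts, cached_texts):
    """Return True when both runs produced identical text for every prompt."""
    # A length mismatch means the runs are not comparable, so treat it as a failure.
    if len(regular_texts) != len(cached_texts):
        return False
    # zip pairs the i-th output of each run; all() stops at the first mismatch.
    return all(r == c for r, c in zip(regular_texts, cached_texts))


# Hypothetical sample data standing in for the generated texts of each run.
regular = ["Hello, Ana.", "The capital is Paris."]
cached = ["Hello, Ana.", "The capital is Paris."]
print(f"Generated answers are the same: {outputs_match(regular, cached)}")
```

Exact string equality is only a meaningful check when sampling is deterministic (for example, greedy decoding); the sampling settings used by the example are not visible in this diff.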