norm / vllm

Unverified commit 5d80a917, authored Jan 18, 2024 by Jason Zhu; committed by GitHub, Jan 18, 2024
Minor fix in prefill cache example (#2494)
parent 8a25d3a7
Showing 1 changed file with 10 additions and 2 deletions.
examples/offline_inference_with_prefix.py

@@ -40,8 +40,16 @@ print("-" * 80)
 # -1 since the last token can change when concatenating prompts.
 prefix_pos = len(llm.llm_engine.tokenizer.encode(prefix)) - 1
 
-# Generate with prefix
-outputs = llm.generate(generating_prompts, sampling_params,
+# The llm.generate call will batch all prompts and send the batch at once if resources allow.
+# The prefix will only be cached after the first batch is processed, so we need to call generate once
+# to calculate the prefix and cache it.
+outputs = llm.generate(generating_prompts[0],
+                       sampling_params,
+                       prefix_pos=[prefix_pos])
+
+# Subsequent batches can leverage the cached prefix
+outputs = llm.generate(generating_prompts,
+                       sampling_params,
                        prefix_pos=[prefix_pos] * len(generating_prompts))
 
 # Print the outputs. You should see the same outputs as before
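For context, here is a minimal self-contained sketch of the pattern this commit documents: warm the prefix cache with a single prompt, then send the full batch. The model, prefix, and questions below are placeholders, not from the commit; the `prefix_pos` argument matches the vLLM API as of this commit (January 2024).

```python
from vllm import LLM, SamplingParams

# Placeholder shared prefix and prompts; any common leading text works.
prefix = ("You are an expert assistant. Answer each question concisely.\n"
          "Question: ")
questions = ["What is the capital of France?",
             "Who wrote The Old Man and the Sea?"]
generating_prompts = [prefix + q for q in questions]

llm = LLM(model="facebook/opt-125m")  # small placeholder model
sampling_params = SamplingParams(temperature=0.0)

# -1 since the last token can change when concatenating prompts.
prefix_pos = len(llm.llm_engine.tokenizer.encode(prefix)) - 1

# Warm-up call: processes one prompt so the shared prefix is computed and
# cached (the point this commit's added comments explain).
llm.generate(generating_prompts[0], sampling_params, prefix_pos=[prefix_pos])

# Subsequent batches reuse the cached prefix computation.
outputs = llm.generate(generating_prompts, sampling_params,
                       prefix_pos=[prefix_pos] * len(generating_prompts))

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> {output.outputs[0].text!r}")
```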
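Why the `- 1` when computing `prefix_pos`: BPE tokenizers can merge tokens across the seam where the prefix meets the continuation, so the last prefix token is not guaranteed to survive concatenation, and only the first `len(ids) - 1` token ids are safe to treat as a stable cached prefix. A small illustration of the effect (the tokenizer and strings are assumptions, not part of the commit):

```python
from transformers import AutoTokenizer

# Assumed tokenizer; the effect is a general property of BPE vocabularies.
tok = AutoTokenizer.from_pretrained("facebook/opt-125m")

prefix = "The result was unbeliev"
full = prefix + "able"  # continuation glued directly onto the prefix

prefix_ids = tok.encode(prefix)
full_ids = tok.encode(full)

# The last id of prefix_ids may differ from the id at the same position in
# full_ids, because the tokenizer can merge across the prefix boundary.
print(prefix_ids)
print(full_ids[:len(prefix_ids)])
```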