Unverified commit bd0e7802, authored by Zhuohan Li, committed by GitHub

[Bugfix] Add warmup for prefix caching example (#5235)

parent 06b2550c
@@ -51,8 +51,10 @@ for output in outputs:
 print("-" * 80)
-# The llm.generate call will batch all prompts and send the batch at once
-# if resources allow.
+# Warmup so that the shared prompt's KV cache is computed.
+prefix_cached_llm.generate(generating_prompts[0], sampling_params)
+# Generate with prefix caching.
 start_time_cached = time()
 outputs = prefix_cached_llm.generate(generating_prompts, sampling_params)
 duration_cached = time() - start_time_cached
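
Note on the change: the warmup call matters because, without it, the timed region also pays for computing the shared prompt's KV cache, so the measurement understates the benefit of prefix caching. The sketch below illustrates that effect with a toy model only. `ToyPrefixCachingEngine` and `measured_cost` are hypothetical names, not part of vLLM, and the "cost" is a deterministic count of whitespace-separated tokens rather than wall-clock time.

```python
class ToyPrefixCachingEngine:
    """Hypothetical stand-in for an engine with prefix caching (not vLLM's API)."""

    def __init__(self):
        self.cached_prefixes = set()
        self.tokens_computed = 0  # simulated work, counted in tokens

    def generate(self, prefix, suffixes):
        for suffix in suffixes:
            # The shared prefix is only computed the first time it is seen.
            if prefix not in self.cached_prefixes:
                self.tokens_computed += len(prefix.split())
                self.cached_prefixes.add(prefix)
            # The per-prompt suffix is always computed.
            self.tokens_computed += len(suffix.split())
        return [f"{prefix} {suffix}" for suffix in suffixes]


def measured_cost(engine, prefix, suffixes, warmup):
    """Return the simulated work done inside the measured region only."""
    if warmup:
        # Mirrors the commit: run one prompt first so the prefix is cached.
        engine.generate(prefix, suffixes[:1])
    before = engine.tokens_computed
    engine.generate(prefix, suffixes)
    return engine.tokens_computed - before


prefix = "shared system prompt with many tokens"
suffixes = ["question one", "question two", "question three"]

cold = measured_cost(ToyPrefixCachingEngine(), prefix, suffixes, warmup=False)
warm = measured_cost(ToyPrefixCachingEngine(), prefix, suffixes, warmup=True)
print(f"measured cost without warmup: {cold}, with warmup: {warm}")
```

In this toy model the measured region without warmup includes the one-time prefix cost, while the warmed-up run measures only the per-prompt work. That is the distinction the added warmup line aims to isolate in the real example.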