OpenDAS/dynamo, commit 1ba53f2c (unverified)
Authored Sep 16, 2025 by Ziqi Fan; committed by GitHub on Sep 16, 2025.

docs: add benchmark section to KVBM vLLM runbook (#3066)

Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
Parent: 960dc896
Changes: 1 changed file with 23 additions and 0 deletions

docs/guides/run_kvbm_in_vllm.md (+23, -0)
@@ -77,3 +77,26 @@ sudo ufw allow 6881/tcp
```
View Grafana metrics at http://localhost:3001 (default login: dynamo/dynamo) and look for the KVBM Dashboard.
## Benchmark KVBM
Once `vllm serve` is ready, follow the steps below to benchmark KVBM performance with LMBenchmark:
```bash
git clone https://github.com/LMCache/LMBenchmark.git

# Example: run the synthetic multi-turn chat dataset.
# The script takes the model, endpoint, output file prefix, and QPS as arguments.
cd LMBenchmark/synthetic-multi-round-qa
./long_input_short_output_run.sh \
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B" \
    "http://localhost:8000" \
    "benchmark_kvbm" \
    1

# Average TTFT and other performance numbers are printed in the output of the command above.
```
More details about how to use LMBenchmark can be found [here](https://github.com/LMCache/LMBenchmark).
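The steps above assume `vllm serve` has already finished starting up. If you script the benchmark, a minimal sketch like the following can wait for the server first. It assumes the endpoint is vLLM's OpenAI-compatible server, which exposes a `/health` route on the same port; `wait_for_server` is a hypothetical helper name, and the 5-second poll interval and 600-second default timeout are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the server's /health endpoint (assumed to be the
# vLLM OpenAI-compatible server) until it responds or the timeout expires.
wait_for_server() {
  local url="${1:-http://localhost:8000}"
  local timeout_s="${2:-600}"
  local waited=0
  # curl -f treats HTTP error statuses as failures, so only a successful
  # response ends the loop; -s silences progress output.
  until curl -sf "${url}/health" > /dev/null; do
    if (( waited >= timeout_s )); then
      echo "timed out after ${waited}s waiting for ${url}" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "server at ${url} is ready"
}
```

For example, call `wait_for_server http://localhost:8000` in a script before running `./long_input_short_output_run.sh`.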
`NOTE`: if metrics are enabled as described in the section above, you can observe KV offloading and KV onboarding in the Grafana dashboard.
To compare against a baseline with KVBM turned off, run `vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B` and repeat the benchmark.
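To keep the KVBM and baseline runs comparable, the following minimal sketch wraps the benchmark command in a hypothetical `run_lmbench` helper that reuses the same model, endpoint, and QPS and changes only the output file prefix. The environment variables `MODEL`, `ENDPOINT`, `QPS`, and `LMBENCH_DIR` and the `benchmark_<label>` prefix scheme are illustrative assumptions, not part of LMBenchmark or Dynamo.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LMBenchmark or Dynamo): run the synthetic
# multi-round QA workload against whichever server is listening on ENDPOINT,
# writing results under the output prefix "benchmark_<label>".
run_lmbench() {
  local label="$1"
  if [ -z "$label" ]; then
    echo "usage: run_lmbench <label>" >&2
    return 2
  fi
  # Defaults mirror the benchmark command in this guide; override via env vars.
  local model="${MODEL:-deepseek-ai/DeepSeek-R1-Distill-Llama-8B}"
  local endpoint="${ENDPOINT:-http://localhost:8000}"
  local qps="${QPS:-1}"
  local bench_dir="${LMBENCH_DIR:-LMBenchmark/synthetic-multi-round-qa}"

  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$bench_dir" &&
      ./long_input_short_output_run.sh "$model" "$endpoint" "benchmark_${label}" "$qps"
  )
}
```

For example, run `run_lmbench kvbm` while the KVBM-enabled server is up, then restart with plain `vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B` and run `run_lmbench baseline`, so both result sets come from identical workload settings.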