OpenDAS / dynamo
Commit b0605788 (Unverified)
Authored Nov 19, 2025 by Ziqi Fan; committed by GitHub on Nov 20, 2025

docs: add Developing Locally section to KVBM runbooks (#4488)

Signed-off-by: Ziqi Fan <ziqif@nvidia.com>

Parent: f3f764eb

Showing 2 changed files with 20 additions and 0 deletions:

docs/kvbm/trtllm-setup.md (+10 -0)
docs/kvbm/vllm-setup.md (+10 -0)

docs/kvbm/trtllm-setup.md
@@ -199,3 +199,13 @@ EOF

```
# Run trtllm-serve for the baseline for comparison
trtllm-serve Qwen/Qwen3-0.6B --host localhost --port 8000 --backend pytorch --extra_llm_api_options /tmp/llm_api_config.yaml &
```

## Developing Locally

Inside the Dynamo container, after changing KVBM-related code (Rust and/or Python), rebuild and reinstall the wheel to test or use your changes:

```bash
cd /workspace/lib/bindings/kvbm
uv pip install maturin[patchelf]
maturin build --release --out /workspace/dist
uv pip install --upgrade --force-reinstall --no-deps /workspace/dist/kvbm*.whl
```

docs/kvbm/vllm-setup.md
@@ -180,3 +180,13 @@ More details about how to use LMBenchmark could be found [here](https://github.c

`NOTE`: if metrics are enabled as described in the section above, you can observe KV offloading and KV onboarding in the Grafana dashboard.

To compare, you can run `vllm serve Qwen/Qwen3-0.6B` to turn KVBM off as the baseline.

## Developing Locally

Inside the Dynamo container, after changing KVBM-related code (Rust and/or Python), rebuild and reinstall the wheel to test or use your changes:

```bash
cd /workspace/lib/bindings/kvbm
uv pip install maturin[patchelf]
maturin build --release --out /workspace/dist
uv pip install --upgrade --force-reinstall --no-deps /workspace/dist/kvbm*.whl
```
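
The final install step in both runbooks relies on the glob `/workspace/dist/kvbm*.whl`. If earlier builds left more than one wheel in that directory, the glob may expand to several files. The following is a minimal sketch of one way to pick only the most recently built wheel. The helper name `newest_kvbm_wheel` is hypothetical and not part of this commit, and the sketch assumes wheel file names contain no newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the most recently
# modified kvbm wheel in a directory, or return non-zero if none exists.
newest_kvbm_wheel() {
  # Default to the output directory used by the runbook's maturin build step.
  local dist_dir="${1:-/workspace/dist}"
  local wheel
  # ls -t sorts by modification time, newest first; keep only the first entry.
  wheel="$(ls -t "$dist_dir"/kvbm*.whl 2>/dev/null | head -n 1)"
  if [ -z "$wheel" ]; then
    echo "no kvbm wheel found in $dist_dir" >&2
    return 1
  fi
  printf '%s\n' "$wheel"
}
```

With this helper sourced, the install step could be written as `uv pip install --upgrade --force-reinstall --no-deps "$(newest_kvbm_wheel)"`, which passes exactly one wheel path to the installer.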