Commit b0605788, authored by Ziqi Fan, committed by GitHub

docs: add Developing Locally section to KVBM runbooks (#4488)


Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent f3f764eb
@@ -199,3 +199,13 @@ EOF
# Run trtllm-serve for the baseline for comparison
trtllm-serve Qwen/Qwen3-0.6B --host localhost --port 8000 --backend pytorch --extra_llm_api_options /tmp/llm_api_config.yaml &
```
## Developing Locally
Inside the Dynamo container, after changing KVBM-related code (Rust and/or Python), rebuild and reinstall the wheel to test or use your changes:
```bash
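cd /workspace/lib/bindings/kvbm
# Quote the extra so the shell does not treat the brackets as a glob pattern
uv pip install "maturin[patchelf]"
# Build a release wheel into /workspace/dist
maturin build --release --out /workspace/dist
# Reinstall the freshly built wheel without touching its dependencies
uv pip install --upgrade --force-reinstall --no-deps /workspace/dist/kvbm*.whl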
```
@@ -180,3 +180,13 @@ More details about how to use LMBenchmark could be found [here](https://github.c
`NOTE`: if metrics are enabled as described in the section above, you can observe KV offloading and KV onboarding in the Grafana dashboard.
For a baseline comparison with KVBM turned off, run `vllm serve Qwen/Qwen3-0.6B`.
## Developing Locally
Inside the Dynamo container, after changing KVBM-related code (Rust and/or Python), rebuild and reinstall the wheel to test or use your changes:
```bash
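cd /workspace/lib/bindings/kvbm
# Quote the extra so the shell does not treat the brackets as a glob pattern
uv pip install "maturin[patchelf]"
# Build a release wheel into /workspace/dist
maturin build --release --out /workspace/dist
# Reinstall the freshly built wheel without touching its dependencies
uv pip install --upgrade --force-reinstall --no-deps /workspace/dist/kvbm*.whl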
```
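If `/workspace/dist` accumulates wheels from several rebuilds, the `kvbm*.whl` glob in the install step may match more than one file. A minimal sketch of one way to pick only the newest wheel is shown here; `latest_kvbm_wheel` is a hypothetical helper name introduced for illustration and is not part of Dynamo or KVBM.

```bash
# Hypothetical helper (an assumption, not shipped with Dynamo or KVBM):
# print the most recently modified kvbm wheel in a dist directory.
latest_kvbm_wheel() {
  local dist_dir="${1:-/workspace/dist}"
  # ls -t lists newest first; head keeps only the first entry.
  # Prints nothing if no wheel matches.
  ls -t "$dist_dir"/kvbm*.whl 2>/dev/null | head -n 1
}
```

It could then be used in place of the glob, for example `uv pip install --upgrade --force-reinstall --no-deps "$(latest_kvbm_wheel)"`.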