OpenDAS / dynamo
Commit aebf1686 (Unverified)
Authored Nov 06, 2025 by Ziqi Fan
Committed by GitHub, Nov 07, 2025
docs: update kvbm runbooks with a troubleshooting section (#4170)
Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
Parent: eedfc3d4
Changes: 2 changed files with 38 additions and 11 deletions
docs/kvbm/trtllm-setup.md (+19, -10)
docs/kvbm/vllm-setup.md (+19, -1)
docs/kvbm/trtllm-setup.md @ aebf1686
````diff
@@ -56,20 +56,11 @@ export DYN_KVBM_DISK_CACHE_GB=8
 # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 # NOTE: this option is only experimental and it might not give out the best performance.
-# NOTE: disk offload filtering is not support when using this option.
+# NOTE: disk offload filtering is not supported when using this option.
 export DYN_KVBM_DISK_CACHE_GB=8
 # Note: You can also use DYN_KVBM_CPU_CACHE_OVERRIDE_NUM_BLOCKS or
 # DYN_KVBM_DISK_CACHE_OVERRIDE_NUM_BLOCKS to specify exact block counts instead of GB
-# Allocating memory and disk storage can take some time.
-# We recommend setting a higher timeout for leader–worker initialization.
-# 1200 means 1200 seconds timeout
-export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
-# Enable disk zerofill fallback for KVBM
-# Set to true to enable fallback behavior when disk operations fail
-export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
 ```
 > [!NOTE]
````
````diff
@@ -121,6 +112,24 @@ Alternatively, can use "trtllm-serve" with KVBM by replacing the above two [DYNA
 trtllm-serve Qwen/Qwen3-0.6B --host localhost --port 8000 --backend pytorch --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
 ```
+## Troubleshooting
+1. Allocating large memory and disk storage can take some time and lead to KVBM worker initialization timeout.
+To avoid it, please set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When offloading to disk is enabled, KVBM could fail to start up if fallocate is not supported to create the files.
+To bypass the issue, please use disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
````
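Troubleshooting item 2 in this file says KVBM can fail to start when fallocate is not supported for creating the disk cache files. As a rough illustration only, the sketch below probes a directory with the util-linux `fallocate` command and turns on the zerofill fallback when the probe fails. `KVBM_DISK_DIR`, `probe_fallocate`, and `configure_zerofill_fallback` are hypothetical names introduced here. The runbook does not say which directory KVBM writes to or how it calls fallocate internally, so a passing probe is a hint rather than a guarantee.

```bash
#!/usr/bin/env bash
# Sketch only: check whether a directory's filesystem accepts fallocate and
# enable the KVBM zerofill fallback when it does not.
# ASSUMPTION: KVBM_DISK_DIR is a hypothetical variable, not a KVBM setting.

probe_fallocate() {
    local dir="$1"
    local probe
    # Create a scratch file inside the target directory; bail out if we cannot.
    probe="$(mktemp "${dir}/kvbm_fallocate_probe.XXXXXX" 2>/dev/null)" || return 2
    local rc=0
    # util-linux fallocate: try to preallocate 1 MiB for the scratch file.
    fallocate -l 1M "${probe}" 2>/dev/null || rc=1
    rm -f "${probe}"
    return "${rc}"
}

configure_zerofill_fallback() {
    local dir="$1"
    if probe_fallocate "${dir}"; then
        echo "fallocate probe succeeded on ${dir}; leaving the fallback untouched"
    else
        # The variable name comes from the runbook; the trigger condition is ours.
        export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
        echo "fallocate probe failed on ${dir}; set DYN_KVBM_DISK_ZEROFILL_FALLBACK=true"
    fi
}

configure_zerofill_fallback "${KVBM_DISK_DIR:-/tmp}"
```

Source the snippet in the shell that launches the server so the exported variable is inherited. A failed probe can also mean the directory is not writable or the `fallocate` command is not installed, which is why the message reports a failed probe rather than claiming the filesystem lacks support.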
docs/kvbm/vllm-setup.md @ aebf1686
````diff
@@ -70,7 +70,7 @@ cd $DYNAMO_HOME/examples/backends/vllm
 >
 > # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 > # NOTE: this option is only experimental and it might not give out the best performance.
-> # NOTE: disk offload filtering is not support when using this option.
+> # NOTE: disk offload filtering is not supported when using this option.
 > export DYN_KVBM_DISK_CACHE_GB=8
 > ```
 >
````
````diff
@@ -104,6 +104,24 @@ Alternatively, can use `vllm serve` directly to use KVBM for aggregated serving:
 vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "kvbm.vllm_integration.connector"}' Qwen/Qwen3-0.6B
 ```
+## Troubleshooting
+1. Allocating large memory and disk storage can take some time and lead to KVBM worker initialization timeout.
+To avoid it, please set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When offloading to disk is enabled, KVBM could fail to start up if fallocate is not supported to create the files.
+To bypass the issue, please use disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
````
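Troubleshooting item 1 in this file recommends a longer leader–worker initialization timeout. A launch wrapper could validate the value before exporting it, so that a typo is caught before the server starts. The sketch below assumes the variable expects a positive whole number of seconds, which is inferred only from the `1200` example in the runbook. `set_kvbm_init_timeout` is a hypothetical helper, not part of KVBM.

```bash
#!/usr/bin/env bash
# Sketch only: validate and export the KVBM leader-worker init timeout.
# ASSUMPTION: the value must be a positive integer number of seconds.

set_kvbm_init_timeout() {
    local secs="$1"
    # Reject empty strings and anything containing a non-digit character.
    case "${secs}" in
        ''|*[!0-9]*)
            echo "invalid timeout '${secs}': expected a positive integer (seconds)" >&2
            return 1
            ;;
    esac
    # Reject zero, which would leave no time for initialization.
    if [ "${secs}" -le 0 ]; then
        echo "invalid timeout '${secs}': must be greater than zero" >&2
        return 1
    fi
    export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS="${secs}"
}

# 1200 is the value used in the runbook.
set_kvbm_init_timeout 1200
```

After the helper succeeds, start the server from the same shell with the `vllm serve` command shown in the runbook so the exported timeout is inherited.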