docs: update kvbm runbooks with a troubleshooting section (#4170)

Signed-off-by: Ziqi Fan <ziqif@nvidia.com>

Commit aebf1686 (unverified), authored Nov 06, 2025 by Ziqi Fan; committed by GitHub Nov 07, 2025

Showing with 38 additions and 11 deletions

docs/kvbm/trtllm-setup.md +19 -10

docs/kvbm/vllm-setup.md +19 -1

--- a/docs/kvbm/trtllm-setup.md
+++ b/docs/kvbm/trtllm-setup.md
@@ -56,20 +56,11 @@ export DYN_KVBM_DISK_CACHE_GB=8
 # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 # NOTE: this option is only experimental and it might not give out the best performance.
-# NOTE: disk offload filtering is not support when using this option.
+# NOTE: disk offload filtering is not supported when using this option.
 export DYN_KVBM_DISK_CACHE_GB=8
 # Note: You can also use DYN_KVBM_CPU_CACHE_OVERRIDE_NUM_BLOCKS or
 # DYN_KVBM_DISK_CACHE_OVERRIDE_NUM_BLOCKS to specify exact block counts instead of GB
-# Allocating memory and disk storage can take some time.
-# We recommend setting a higher timeout for leader–worker initialization.
-# 1200 means 1200 seconds timeout
-export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
-# Enable disk zerofill fallback for KVBM
-# Set to true to enable fallback behavior when disk operations fail
-export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
 ```
 > [!NOTE]
@@ -121,6 +112,24 @@ Alternatively, can use "trtllm-serve" with KVBM by replacing the above two [DYNA
 trtllm-serve Qwen/Qwen3-0.6B --host localhost --port 8000 --backend pytorch --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
 ```
+## Troubleshooting
+1. Allocating large amounts of memory and disk storage can take some time and cause KVBM worker initialization to time out.
+To avoid this, set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When disk offloading is enabled, KVBM can fail to start if fallocate is not supported for creating its files.
+To work around this, enable the disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:

--- a/docs/kvbm/vllm-setup.md
+++ b/docs/kvbm/vllm-setup.md
@@ -70,7 +70,7 @@ cd $DYNAMO_HOME/examples/backends/vllm
 >
 > # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 > # NOTE: this option is only experimental and it might not give out the best performance.
-> # NOTE: disk offload filtering is not support when using this option.
+> # NOTE: disk offload filtering is not supported when using this option.
 > export DYN_KVBM_DISK_CACHE_GB=8
 > ```
 >
@@ -104,6 +104,24 @@ Alternatively, can use `vllm serve` directly to use KVBM for aggregated serving:
 vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "kvbm.vllm_integration.connector"}' Qwen/Qwen3-0.6B
 ```
+## Troubleshooting
+1. Allocating large amounts of memory and disk storage can take some time and cause KVBM worker initialization to time out.
+To avoid this, set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When disk offloading is enabled, KVBM can fail to start if fallocate is not supported for creating its files.
+To work around this, enable the disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
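
Returning to the two Troubleshooting entries added in this commit: they could be combined into a small pre-launch script along the following lines. This is only a sketch under stated assumptions. The helper `kvbm_probe_fallocate` and the `PROBE_DIR` variable are hypothetical names introduced here for illustration and are not KVBM settings; the script also assumes the util-linux `fallocate` command is a reasonable proxy for whether file preallocation works on the target filesystem. Only the two `DYN_KVBM_*` variables come from the runbooks.

```bash
#!/usr/bin/env bash
# Sketch only: kvbm_probe_fallocate and PROBE_DIR are illustrative names, not part of KVBM.

# Returns 0 if the `fallocate` command can preallocate a small file in the
# given directory, and non-zero otherwise (including when the command is missing).
kvbm_probe_fallocate() {
  local dir="$1" probe
  probe="$(mktemp "${dir%/}/kvbm_probe.XXXXXX" 2>/dev/null)" || return 2
  fallocate -l 4096 "$probe" 2>/dev/null
  local rc=$?
  rm -f "$probe"
  return "$rc"
}

# Assumption: PROBE_DIR is on the same filesystem KVBM will use for disk offloading.
PROBE_DIR="${PROBE_DIR:-${TMPDIR:-/tmp}}"

if ! kvbm_probe_fallocate "$PROBE_DIR"; then
  # Preallocation is unavailable here, so enable the zerofill fallback.
  export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
fi

# Large allocations can be slow; allow 1200 seconds for leader-worker initialization.
export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
```

Sourcing such a script before launching the vLLM or TensorRT-LLM commands from the runbooks would leave both variables in the environment. Setting `DYN_KVBM_DISK_ZEROFILL_FALLBACK=true` unconditionally, as the runbooks do, is the simpler choice when the filesystem is already known to lack fallocate support.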