Unverified commit aebf1686, authored by Ziqi Fan, committed by GitHub

docs: update kvbm runbooks with a troubleshooting section (#4170)


Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent eedfc3d4
@@ -56,20 +56,11 @@ export DYN_KVBM_DISK_CACHE_GB=8
 # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 # NOTE: this option is only experimental and it might not give out the best performance.
-# NOTE: disk offload filtering is not support when using this option.
+# NOTE: disk offload filtering is not supported when using this option.
 export DYN_KVBM_DISK_CACHE_GB=8
 # Note: You can also use DYN_KVBM_CPU_CACHE_OVERRIDE_NUM_BLOCKS or
 # DYN_KVBM_DISK_CACHE_OVERRIDE_NUM_BLOCKS to specify exact block counts instead of GB
-# Allocating memory and disk storage can take some time.
-# We recommend setting a higher timeout for leader–worker initialization.
-# 1200 means 1200 seconds timeout
-export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
-# Enable disk zerofill fallback for KVBM
-# Set to true to enable fallback behavior when disk operations fail
-export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
 ```
 > [!NOTE]
@@ -121,6 +112,24 @@ Alternatively, can use "trtllm-serve" with KVBM by replacing the above two [DYNA
 trtllm-serve Qwen/Qwen3-0.6B --host localhost --port 8000 --backend pytorch --extra_llm_api_options /tmp/kvbm_llm_api_config.yaml
 ```
+## Troubleshooting
+1. Allocating large memory and disk storage can take some time and lead to KVBM worker initialization timeout.
+To avoid it, please set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When offloading to disk is enabled, KVBM could fail to start up if fallocate is not supported to create the files.
+To bypass the issue, please use disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
......
@@ -70,7 +70,7 @@ cd $DYNAMO_HOME/examples/backends/vllm
 >
 > # [Experimental] Option 3: Disk cache only (GPU -> Disk direct offloading, bypassing CPU)
 > # NOTE: this option is only experimental and it might not give out the best performance.
-> # NOTE: disk offload filtering is not support when using this option.
+> # NOTE: disk offload filtering is not supported when using this option.
 > export DYN_KVBM_DISK_CACHE_GB=8
 > ```
 >
@@ -104,6 +104,24 @@ Alternatively, can use `vllm serve` directly to use KVBM for aggregated serving:
 vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv_both", "kv_connector_module_path": "kvbm.vllm_integration.connector"}' Qwen/Qwen3-0.6B
 ```
+## Troubleshooting
+1. Allocating large memory and disk storage can take some time and lead to KVBM worker initialization timeout.
+To avoid it, please set a longer timeout for leader–worker initialization.
+```bash
+# 1200 means 1200 seconds timeout
+export DYN_KVBM_LEADER_WORKER_INIT_TIMEOUT_SECS=1200
+```
+2. When offloading to disk is enabled, KVBM could fail to start up if fallocate is not supported to create the files.
+To bypass the issue, please use disk zerofill fallback.
+```bash
+# Set to true to enable fallback behavior when disk operations fail (e.g. fallocate not available)
+export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true
+```
 ## Enable and View KVBM Metrics
 Follow below steps to enable metrics collection and view via Grafana dashboard:
......
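Reviewer note (not part of the commit): the second troubleshooting item depends on whether the filesystem behind the disk cache supports `fallocate`. A minimal sketch of how one might check this before launching is shown here. The helper name `probe_fallocate`, the probe file name, and the default directory `/tmp` are assumptions for illustration, not anything KVBM provides; pass the directory that KVBM actually uses for its disk cache. It relies on the util-linux `fallocate` command, so a missing command is reported the same way as an unsupported filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Dynamo/KVBM): probe whether the
# filesystem behind a directory accepts fallocate.

# Returns 0 if fallocate works, 1 if it fails (or the command is missing),
# 2 if no probe file could be created in the directory.
probe_fallocate() {
    local dir="$1"
    local probe
    probe="$(mktemp "${dir}/kvbm_fallocate_probe.XXXXXX")" || return 2
    # util-linux: `fallocate -l LENGTH FILE` preallocates LENGTH bytes.
    if fallocate -l 1M "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}

# Assumption: the first argument is the directory KVBM uses for its disk cache.
cache_dir="${1:-/tmp}"

rc=0
probe_fallocate "$cache_dir" || rc=$?
case "$rc" in
    0) echo "fallocate works on ${cache_dir}" ;;
    1) echo "fallocate failed on ${cache_dir}; consider: export DYN_KVBM_DISK_ZEROFILL_FALLBACK=true" ;;
    *) echo "could not create a probe file in ${cache_dir}" ;;
esac
```

If the probe reports a failure, setting `DYN_KVBM_DISK_ZEROFILL_FALLBACK=true` as described in the Troubleshooting section is the workaround the runbook suggests.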