docs: updated KVBM docs to specify correct configurations for the write-through property (#6506)

Signed-off-by: Kyle McGill <kmcgill@nvidia.com>

Unverified Commit 5c64ffc3 authored Feb 24, 2026 by Kyle McGill Committed by GitHub Feb 24, 2026

Showing with 5 additions and 2 deletions

docs/pages/components/kvbm/README.md +1 -1

docs/pages/components/kvbm/kvbm-guide.md +4 -1

--- a/docs/pages/components/kvbm/README.md
+++ b/docs/pages/components/kvbm/README.md
@@ -6,7 +6,7 @@ title: KVBM

 # KV Block Manager (KVBM)

-The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer for frameworks like vLLM and TensorRT-LLM.
+The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer and write-through cache for frameworks like vLLM and TensorRT-LLM.

 KVBM offers:
 - A **unified memory API** spanning GPU memory, pinned host memory, remote RDMA-accessible memory, local/distributed SSDs, and remote file/object/cloud storage systems

--- a/docs/pages/components/kvbm/kvbm-guide.md
+++ b/docs/pages/components/kvbm/kvbm-guide.md
@@ -6,7 +6,7 @@ subtitle: Enable KV offloading using KV Block Manager (KVBM) for Dynamo deployme
 ---

 # KVBM Guide
-The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer for frameworks like vLLM and TensorRT-LLM.
+The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to handle memory allocation, management, and remote sharing of Key-Value (KV) blocks for inference tasks across heterogeneous and distributed environments. It acts as a unified memory layer and write-through cache for frameworks like vLLM and TensorRT-LLM.

 KVBM is modular and can be used standalone via `pip install kvbm` or as the memory management component in the full Dynamo stack. This guide covers installation, configuration, and deployment of the Dynamo KV Block Manager (KVBM) and other KV cache management systems.

@@ -237,6 +237,9 @@ You can also specify exact block counts instead of GB:
 - `DYN_KVBM_CPU_CACHE_OVERRIDE_NUM_BLOCKS`
 - `DYN_KVBM_DISK_CACHE_OVERRIDE_NUM_BLOCKS`

+> [!NOTE] KVBM is a write-through cache and it is possible to misconfigure. Each of the capacities should increase as you enable more tiers. As an example, if you configure your GPU device to have 100GB of memory dedicated for KV cache storage, then configure
+`DYN_KVBM_CPU_CACHE_GB >= 100`. The same goes for configuring the disk cache; `DYN_KVBM_DISK_CACHE_GB >= DYN_KVBM_CPU_CACHE_GB`. If the cpu cache is configured to be less than the device cache, then _there will be no benefit from KVBM_. In many cases you will see performance degradation as KVBM will churn by offloading blocks from the GPU to CPU after every forward pass. To know what your minimum value for `DYN_KVBM_CPU_CACHE_GB` should be for your setup, consult your llm engine's kv cache configuration.
+
 ### SSD Lifespan Protection

 When disk offloading is enabled, disk offload filtering is enabled by default to extend SSD lifespan. The current policy only offloads KV blocks from CPU to disk if the blocks have frequency ≥ 2. Frequency doubles on cache hit (initialized at 1) and decrements by 1 on each time decay step.
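
The capacity ordering described in the added note (GPU KV cache ≤ `DYN_KVBM_CPU_CACHE_GB` ≤ `DYN_KVBM_DISK_CACHE_GB`) can be checked before launch. The following is a minimal sketch, not part of KVBM: `check_kvbm_tiers` and its `gpu_kv_cache_gb` parameter are hypothetical names, the GPU figure must be taken from your LLM engine's KV cache configuration, and the `*_OVERRIDE_NUM_BLOCKS` variables are not handled.

```python
import os


def check_kvbm_tiers(gpu_kv_cache_gb, env=None):
    """Return a list of messages for tier sizes that break the ordering.

    Hypothetical helper for illustration only; KVBM does not ship it.
    gpu_kv_cache_gb: GB of GPU memory your engine dedicates to KV cache.
    env: mapping to read settings from (defaults to os.environ).
    """
    env = os.environ if env is None else env
    cpu_raw = env.get("DYN_KVBM_CPU_CACHE_GB")
    disk_raw = env.get("DYN_KVBM_DISK_CACHE_GB")
    # Treat an unset or empty variable as "tier not configured".
    cpu_gb = float(cpu_raw) if cpu_raw else None
    disk_gb = float(disk_raw) if disk_raw else None

    problems = []
    # The CPU tier should be at least as large as the GPU KV cache.
    if cpu_gb is not None and cpu_gb < gpu_kv_cache_gb:
        problems.append(
            f"DYN_KVBM_CPU_CACHE_GB={cpu_gb:g} is smaller than the GPU "
            f"KV cache ({gpu_kv_cache_gb:g} GB)"
        )
    # The disk tier should be at least as large as the CPU tier.
    if cpu_gb is not None and disk_gb is not None and disk_gb < cpu_gb:
        problems.append(
            f"DYN_KVBM_DISK_CACHE_GB={disk_gb:g} is smaller than "
            f"DYN_KVBM_CPU_CACHE_GB={cpu_gb:g}"
        )
    return problems


if __name__ == "__main__":
    # 100 GB matches the example in the note; replace with your own value.
    for message in check_kvbm_tiers(gpu_kv_cache_gb=100):
        print("warning:", message)
```

Passing a plain dictionary as `env` makes it possible to test a planned configuration without exporting any variables; an empty list means the ordering in the note holds for the tiers that are set.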