docs: update docs to remove KVBM cuda graph limitation (#4902)

Commit 5c2415d8, authored Dec 11, 2025 by Kyle McGill, committed by GitHub Dec 11, 2025

Showing with 0 additions and 1 deletion

docs/kvbm/trtllm-setup.md +0 -1

--- a/docs/kvbm/trtllm-setup.md
+++ b/docs/kvbm/trtllm-setup.md
@@ -23,7 +23,6 @@ To learn what KVBM is, please check [here](kvbm_architecture.md)
 > [!Note]
 > - Ensure that `etcd` and `nats` are running before starting.
-> - KVBM does not currently support CUDA graphs in TensorRT-LLM.
 > - KVBM only supports TensorRT-LLM’s PyTorch backend.
 > - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
 > - KVBM requires TensorRT-LLM v1.1.0rc5 or newer.