Commit 5c2415d8 authored by Kyle McGill, committed by GitHub

docs: update docs to remove KVBM cuda graph limitation (#4902)

parent 31f31e8e
@@ -23,7 +23,6 @@ To learn what KVBM is, please check [here](kvbm_architecture.md)
 > [!Note]
 > - Ensure that `etcd` and `nats` are running before starting.
-> - KVBM does not currently support CUDA graphs in TensorRT-LLM.
 > - KVBM only supports TensorRT-LLM’s PyTorch backend.
 > - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
 > - KVBM requires TensorRT-LLM v1.1.0rc5 or newer.