Commit 330f001d, authored by dagil-nvidia, committed by GitHub

docs: update KVBM diagram from PNG to SVG (#7277)


Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: akshatha-k <akshutk@gmail.com>
parent 3f565053
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
This source diff could not be displayed because it is too large. You can view the blob instead.
@@ -40,7 +40,7 @@ Offloading KV cache to CPU or storage is most effective when KV Cache exceeds GP
 ## Architecture
-![KVBM Architecture](../../assets/img/kvbm-architecture.png)
+![KVBM Architecture](../../assets/img/kvbm-components.svg)
 *High-level layered architecture view of Dynamo KV Block Manager and how it interfaces with different components of the LLM inference ecosystem*
 KVBM has three primary logical layers:
@@ -60,4 +60,4 @@ KVBM has three primary logical layers:
 - **[LMCache Integration](../../integrations/lmcache-integration.md)** — Use LMCache with Dynamo vLLM backend
 - **[FlexKV Integration](../../integrations/flexkv-integration.md)** — Use FlexKV for KV cache management
 - **[SGLang HiCache](../../integrations/sglang-hicache.md)** — Enable SGLang's hierarchical cache with NIXL
-- **[NIXL Documentation](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)** — NIXL communication library details
+- **[NIXL Documentation](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)** — NIXL communication library details
\ No newline at end of file
@@ -8,7 +8,7 @@ This document provides an in-depth look at the architecture, components, framewo
 ## KVBM Components
-![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.png)
+![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.svg)
 *Internal Components of Dynamo KVBM*
@@ -19,7 +19,7 @@ limitations under the License.
 The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.
-![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-architecture.png)
+![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-components.svg)
 ## Feature Highlights