Unverified Commit 330f001d authored by dagil-nvidia, committed by GitHub

docs: update KVBM diagram from PNG to SVG (#7277)


Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: akshatha-k <akshutk@gmail.com>
parent 3f565053
@@ -40,7 +40,7 @@ Offloading KV cache to CPU or storage is most effective when KV Cache exceeds GP
 ## Architecture
-![KVBM Architecture](../../assets/img/kvbm-architecture.png)
+![KVBM Architecture](../../assets/img/kvbm-components.svg)
 *High-level layered architecture view of Dynamo KV Block Manager and how it interfaces with different components of the LLM inference ecosystem*
 KVBM has three primary logical layers:
@@ -8,7 +8,7 @@ This document provides an in-depth look at the architecture, components, framewo
 ## KVBM Components
-![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.png)
+![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.svg)
 *Internal Components of Dynamo KVBM*
@@ -19,7 +19,7 @@ limitations under the License.
 The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.
-![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-architecture.png)
+![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-components.svg)
 ## Feature Highlights