docs: update KVBM diagram from PNG to SVG (#7277)

Signed-off-by: akshatha-k <akshutk@gmail.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: akshatha-k <akshutk@gmail.com>

Unverified commit 330f001d, authored Mar 12, 2026 by dagil-nvidia; committed by GitHub Mar 12, 2026
6 changed files
--- a/docs/assets/img/kvbm-architecture.png
+++ b/docs/assets/img/kvbm-architecture.png
--- a/docs/assets/img/kvbm-components.png
+++ b/docs/assets/img/kvbm-components.png
--- a/docs/assets/img/kvbm-components.svg
+++ b/docs/assets/img/kvbm-components.svg
--- a/docs/components/kvbm/README.md
+++ b/docs/components/kvbm/README.md
@@ -40,7 +40,7 @@ Offloading KV cache to CPU or storage is most effective when KV Cache exceeds GP

 ## Architecture

-![KVBM Architecture](../../assets/img/kvbm-architecture.png)
+![KVBM Architecture](../../assets/img/kvbm-components.svg)
 *High-level layered architecture view of Dynamo KV Block Manager and how it interfaces with different components of the LLM inference ecosystem*

 KVBM has three primary logical layers:
@@ -60,4 +60,4 @@ KVBM has three primary logical layers:
 - **[LMCache Integration](../../integrations/lmcache-integration.md)** — Use LMCache with Dynamo vLLM backend
 - **[FlexKV Integration](../../integrations/flexkv-integration.md)** — Use FlexKV for KV cache management
 - **[SGLang HiCache](../../integrations/sglang-hicache.md)** — Enable SGLang's hierarchical cache with NIXL
- **[NIXL Documentation](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)** — NIXL communication library details
+- **[NIXL Documentation](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)** — NIXL communication library details
\ No newline at end of file
--- a/docs/design-docs/kvbm-design.md
+++ b/docs/design-docs/kvbm-design.md
@@ -8,7 +8,7 @@ This document provides an in-depth look at the architecture, components, framewo

 ## KVBM Components

-![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.png)
+![Internal Components of Dynamo KVBM](../assets/img/kvbm-components.svg)


 *Internal Components of Dynamo KVBM*

--- a/lib/bindings/kvbm/README.md
+++ b/lib/bindings/kvbm/README.md
@@ -19,7 +19,7 @@ limitations under the License.

 The Dynamo KVBM is a distributed KV-cache block management system designed for scalable LLM inference. It cleanly separates memory management from inference runtimes (vLLM, TensorRT-LLM, and SGLang), enabling GPU↔CPU↔Disk/Remote tiering, asynchronous block offload/onboard, and efficient block reuse.

-![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-architecture.png)
+![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../../docs/assets/img/kvbm-components.svg)


 ## Feature Highlights