docs: improve KVBM guide by adding NGC and --platform (#8313)

Signed-off-by: Ziqi Fan <ziqif@nvidia.com>

Commit 595ff0fa (unverified), authored Apr 17, 2026 by Ziqi Fan, committed by GitHub Apr 17, 2026

Showing 1 changed file with 30 additions and 2 deletions

docs/components/kvbm/kvbm-guide.md +30 -2

--- a/docs/components/kvbm/kvbm-guide.md
+++ b/docs/components/kvbm/kvbm-guide.md
@@ -9,8 +9,6 @@ The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to h
 KVBM is modular and can be used standalone via `pip install kvbm` or as the memory management component in the full Dynamo stack. This guide covers installation, configuration, and deployment of the Dynamo KV Block Manager (KVBM) and other KV cache management systems.
-## Quick Start
 ## Run KVBM Standalone
 KVBM can be used independently without using the rest of the Dynamo stack:
@@ -32,8 +30,23 @@ To build KVBM from source, see the detailed instructions in the [KVBM bindings R
 ```bash
 # Start up etcd for KVBM leader/worker registration and discovery
 docker compose -f deploy/docker-compose.yml up -d
+```
+Pick one of the following to get a Dynamo vLLM container with KVBM built in. The subsequent serving commands are the same either way.
+**Option A: Pre-built NGC container (recommended for quick start)**
+```bash
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
+```
+See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
+**Option B: Build from source**
+```bash
 # Build a dynamo vLLM container (KVBM is built in by default)
+# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
 python container/render.py --framework vllm --target runtime --output-short-filename
 docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
@@ -83,8 +96,23 @@ vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv
 ```bash
 # Start up etcd for KVBM leader/worker registration and discovery
 docker compose -f deploy/docker-compose.yml up -d
+```
+Pick one of the following to get a Dynamo TensorRT-LLM container with KVBM built in. The subsequent serving commands are the same either way.
+**Option A: Pre-built NGC container (recommended for quick start)**
+```bash
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
+```
+See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
+**Option B: Build from source**
+```bash
 # Build a dynamo TRTLLM container (KVBM is built in by default)
+# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
 python container/render.py --framework trtllm --target runtime --output-short-filename
 docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile .
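
The NOTE added in both Option B blocks says that `render.py` defaults to `--platform linux/amd64` and that ARM64 hosts should pass `--platform linux/arm64`. A minimal sketch of choosing that value from the host architecture follows. The `arch_to_platform` helper is hypothetical and not part of the Dynamo repository, and the mapping from `uname -m` output to a platform string is an assumption that covers only the two platforms named in the NOTE. The script prints the render command rather than running it, so it can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Dynamo repo): map a machine
# architecture name, as reported by `uname -m`, to a --platform value.
arch_to_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the render command for this host so it can be reviewed before running.
# The vllm framework is used here; the trtllm command differs only in --framework.
if platform="$(arch_to_platform "$(uname -m)")"; then
  echo "python container/render.py --framework vllm --target runtime --platform ${platform} --output-short-filename"
fi
```

On an x86_64 host this reproduces the default; on an ARM64 host it yields the `--platform linux/arm64` form that the NOTE describes.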