Commit 595ff0fa authored by Ziqi Fan, committed by GitHub

docs: improve KVBM guide by adding NGC and --platform (#8313)


Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent 5b03a597
@@ -9,8 +9,6 @@ The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to h
KVBM is modular and can be used standalone via `pip install kvbm` or as the memory management component in the full Dynamo stack. This guide covers installation, configuration, and deployment of the Dynamo KV Block Manager (KVBM) and other KV cache management systems.
## Quick Start
## Run KVBM Standalone
KVBM can be used on its own, without the rest of the Dynamo stack:
@@ -32,8 +30,23 @@ To build KVBM from source, see the detailed instructions in the [KVBM bindings R
```bash
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
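Before starting a container, it can help to confirm that etcd is actually reachable. The following is a minimal sketch, not part of the original guide: `wait_for_http` is a hypothetical helper name, and the URL in the usage note assumes etcd's default client port 2379 is published on localhost, which should be checked against `deploy/docker-compose.yml`.

```shell
# Hypothetical helper: poll a URL with curl until it answers or retries run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 1).
wait_for_http() {
  url="$1"
  retries="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    # --fail makes curl return non-zero on HTTP error statuses
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage might look like `wait_for_http http://localhost:2379/health || echo "etcd is not reachable"`, where the port is the assumption noted above.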
Pick one of the following to get a Dynamo vLLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
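If you build from source instead (Option B), the `--platform` value passed to `render.py` has to match the host architecture. As a rough sketch, the value could be derived from `uname -m` rather than typed by hand. The function name `docker_platform` is hypothetical and not part of Dynamo; only the two platform strings come from this guide.

```shell
# Hypothetical helper: map a machine architecture name to a Docker platform string.
# With no argument it inspects the current host via `uname -m`.
docker_platform() {
  arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}
```

The result could then be appended to the render command, for example `--platform "$(docker_platform)"`.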
**Option B: Build from source**
```bash
# Build a dynamo vLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
python container/render.py --framework vllm --target runtime --output-short-filename
docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
@@ -83,8 +96,23 @@ vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv
```bash
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
Pick one of the following to get a Dynamo TensorRT-LLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
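The two NGC images used in this guide share one naming pattern, so a small helper can assemble the image reference. This is only a sketch: `ngc_image` is a hypothetical name, and the only combinations confirmed by this guide are `vllm` and `tensorrtllm` at tag `1.0.0`. Any other tag should be checked against the Release Artifacts page.

```shell
# Hypothetical helper: build the NGC image reference for a Dynamo runtime container.
# Arguments: framework name (vllm or tensorrtllm), image tag (default 1.0.0).
ngc_image() {
  framework="$1"
  version="${2:-1.0.0}"
  case "$framework" in
    vllm|tensorrtllm)
      echo "nvcr.io/nvidia/ai-dynamo/${framework}-runtime:${version}"
      ;;
    *)
      # Only the two frameworks named in this guide are accepted here
      echo "unknown framework: $framework" >&2
      return 1
      ;;
  esac
}
```

For example, `docker run --gpus all --network host --rm -it "$(ngc_image tensorrtllm)"` expands to the same command as Option A above.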
**Option B: Build from source**
```bash
# Build a dynamo TRTLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
python container/render.py --framework trtllm --target runtime --output-short-filename
docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile .
......