Unverified commit 595ff0fa authored by Ziqi Fan, committed by GitHub

docs: improve KVBM guide by adding NGC and --platform (#8313)


Signed-off-by: Ziqi Fan <ziqif@nvidia.com>
parent 5b03a597
...@@ -9,8 +9,6 @@ The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to h
KVBM is modular and can be used standalone via `pip install kvbm` or as the memory management component in the full Dynamo stack. This guide covers installation, configuration, and deployment of the Dynamo KV Block Manager (KVBM) and other KV cache management systems.
## Quick Start
## Run KVBM Standalone
KVBM can be used independently without using the rest of the Dynamo stack:
...@@ -32,8 +30,23 @@ To build KVBM from source, see the detailed instructions in the [KVBM bindings R
```bash
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
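KVBM leaders and workers register through etcd, so it can help to wait until etcd is actually serving before starting a container. The following is a minimal sketch of a generic retry helper. The function name `retry` is hypothetical and not part of Dynamo.

```bash
# Hypothetical helper (not part of Dynamo): run a command until it succeeds
# or the attempt budget is exhausted.
# Usage: retry <attempts> <sleep-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "retry: '$*' still failing after $attempts attempts" >&2
  return 1
}
```

For example, `retry 30 1 curl -fsS http://localhost:2379/health` polls etcd's health endpoint once per second for up to 30 seconds. This assumes `deploy/docker-compose.yml` publishes etcd's default client port 2379 on the host.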
Pick one of the following to get a Dynamo vLLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
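If you build from source instead (Option B), the NOTE in that block says `render.py` defaults to `--platform linux/amd64` and that ARM64 hosts must pass `--platform linux/arm64`. The following is a minimal sketch that derives that value from `uname -m`. It handles only the two platforms the note mentions, and the function name `dynamo_platform` is hypothetical.

```bash
# Hypothetical helper (not part of Dynamo): map a machine architecture, as
# printed by `uname -m`, to a value for render.py's --platform flag.
# Only the two platforms mentioned in this guide are handled.
dynamo_platform() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64 | amd64) echo "linux/amd64" ;;
    aarch64 | arm64) echo "linux/arm64" ;;
    *)
      echo "dynamo_platform: unsupported architecture '$arch'" >&2
      return 1
      ;;
  esac
}
```

Usage: `python container/render.py --framework vllm --target runtime --platform "$(dynamo_platform)" --output-short-filename`.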
**Option B: Build from source**
```bash
# Build a dynamo vLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
python container/render.py --framework vllm --target runtime --output-short-filename
docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
...@@ -83,8 +96,23 @@ vllm serve --kv-transfer-config '{"kv_connector":"DynamoConnector","kv_role":"kv
```bash
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
Pick one of the following to get a Dynamo TensorRT-LLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
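The NGC image names do not match the `--framework` values that `render.py` uses. The TensorRT-LLM image is `tensorrtllm-runtime`, but the source build passes `--framework trtllm`. The following is a minimal sketch that maps one to the other and makes the release tag a parameter, so you can substitute another version from Release Artifacts. The function name `ngc_runtime_image` and the `DYNAMO_VERSION` variable are hypothetical. The default tag `1.0.0` is the one used in this guide.

```bash
# Hypothetical helper (not part of Dynamo): build the NGC runtime image
# reference for a render.py --framework value and a release tag.
# Usage: ngc_runtime_image <vllm|trtllm> [tag]
ngc_runtime_image() {
  local framework="$1"
  # Fall back to $DYNAMO_VERSION, then to the tag used in this guide.
  local tag="${2:-${DYNAMO_VERSION:-1.0.0}}"
  local image
  case "$framework" in
    vllm) image="vllm-runtime" ;;
    trtllm) image="tensorrtllm-runtime" ;;
    *)
      echo "ngc_runtime_image: unknown framework '$framework'" >&2
      return 1
      ;;
  esac
  echo "nvcr.io/nvidia/ai-dynamo/${image}:${tag}"
}
```

Usage: `docker run --gpus all --network host --rm -it "$(ngc_runtime_image trtllm)"`.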
**Option B: Build from source**
```bash
# Build a dynamo TRTLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
python container/render.py --framework trtllm --target runtime --output-short-filename
docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile .
......