@@ -9,8 +9,6 @@ The Dynamo KV Block Manager (KVBM) is a scalable runtime component designed to h
...
KVBM is modular and can be used standalone via `pip install kvbm` or as the memory management component in the full Dynamo stack. This guide covers installation, configuration, and deployment of the Dynamo KV Block Manager (KVBM) and other KV cache management systems.
## Quick Start
## Run KVBM Standalone
KVBM can be used independently without using the rest of the Dynamo stack:
...
@@ -32,8 +30,23 @@ To build KVBM from source, see the detailed instructions in the [KVBM bindings R
...
```bash
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
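Before launching a container, it can help to confirm that etcd is actually accepting requests. The helper below is a minimal sketch and not part of Dynamo: `wait_until` is a hypothetical name, and it simply retries an arbitrary command once per second until it succeeds or the attempt budget runs out.

```bash
# Hypothetical helper (not shipped with Dynamo): retry a command until it succeeds.
# Usage: wait_until <max_attempts> <command> [args...]
wait_until() {
  wu_attempts="$1"
  shift
  wu_i=0
  while [ "$wu_i" -lt "$wu_attempts" ]; do
    # Run the probe quietly; a zero exit status means the service is ready.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wu_i=$((wu_i + 1))
    sleep 1
  done
  # Attempt budget exhausted without a successful probe.
  return 1
}
```

For example, `wait_until 30 curl -fsS http://localhost:2379/health` polls etcd's health endpoint. This assumes the compose file publishes etcd on its default client port 2379; adjust the URL if your setup differs.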
Pick one of the following to get a Dynamo vLLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
**Option B: Build from source**
```bash
# Build a dynamo vLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.
# Start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d
```
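The note about `--platform` can be automated when the same build script runs on both x86 and ARM64 hosts. The following is a sketch under stated assumptions: `detect_platform` is a hypothetical helper, not part of Dynamo, that maps the machine architecture reported by `uname -m` to the platform strings mentioned above.

```bash
# Hypothetical helper (not shipped with Dynamo): map a machine architecture
# to a container platform string. Defaults to the current host's `uname -m`.
detect_platform() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)
      # Any other architecture is treated as unsupported in this sketch.
      echo "unsupported architecture: ${1:-$(uname -m)}" >&2
      return 1
      ;;
  esac
}
```

The result can then be passed along as `--platform "$(detect_platform)"` when building from source, so ARM64 hosts do not silently fall back to the `linux/amd64` default.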
Pick one of the following to get a Dynamo TensorRT-LLM container with KVBM built in. The subsequent serving commands are the same either way.
**Option A: Pre-built NGC container (recommended for quick start)**
```bash
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
```
See the [Local Installation Guide](../../getting-started/local-installation.md) for full setup instructions and [Release Artifacts](../../reference/release-artifacts.md#container-images) for available versions.
**Option B: Build from source**
```bash
# Build a dynamo TRTLLM container (KVBM is built in by default)
# NOTE: render.py defaults to --platform linux/amd64. On ARM64 hosts, pass --platform linux/arm64.