Commit 46dad85b authored by dagil-nvidia, committed by GitHub

docs: update KVBM diagram and bump container image tags to 1.0.0 (#7365)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 6b62df65
......@@ -44,7 +44,7 @@ docker compose -f deploy/docker-compose.yml up -d
**Step 2 (host terminal):** Pull and run the prebuilt container:
```bash
-DYNAMO_VERSION=0.9.0
+DYNAMO_VERSION=1.0.0
docker pull nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:$DYNAMO_VERSION
docker run --gpus all -it --network host --ipc host \
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:$DYNAMO_VERSION
......
......@@ -80,7 +80,7 @@ following environment variables based:
```bash
# NOTE: IMAGE must be set manually for now
# Use the prebuilt container from NGC (see ../README.md#quick-start):
-# export IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"
+# export IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0"
# Or build a custom one (see ../trtllm-building-custom-container.md)
# Or you can also download the image to shared storage and point
# IMAGE to the local path.
......
......@@ -121,7 +121,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
env:
- name: POD_UID
valueFrom:
......@@ -146,7 +146,7 @@ spec:
values:
- gpu-h100-sxm # Adjust to your GPU node type
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
workingDir: /workspace
command:
- /bin/sh
......@@ -212,7 +212,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
env:
- name: POD_UID
valueFrom:
......@@ -240,7 +240,7 @@ spec:
values:
- gpu-h100-sxm # Adjust to your GPU node type
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
workingDir: /workspace
command:
- /bin/sh
......@@ -438,7 +438,7 @@ spec:
restartPolicy: Never
containers:
- name: benchmark
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
securityContext:
runAsUser: 0 # Required: apt-get and pip install need root in ephemeral benchmark pod
command:
......
......@@ -37,7 +37,7 @@ metadata:
spec:
model: "Qwen/Qwen3-0.6B"
backend: vllm
-image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
workload:
isl: 3000 # Average input sequence length
......
......@@ -200,7 +200,7 @@ Each DGDR requires a container image for profiling and deployment:
```yaml
spec:
-image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
```
#### Quick Start: Deploy with DGDR
......@@ -371,7 +371,7 @@ metadata:
spec:
model: "Qwen/Qwen3-0.6B"
backend: vllm
-image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
+image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0"
searchStrategy: rapid # or thorough
autoApply: true
......
......@@ -130,7 +130,7 @@ spec:
value: "16"
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
### Alternative: Using Command Args in K8s
......@@ -140,7 +140,7 @@ You can also pass CLI arguments directly in the container command:
```yaml
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
command:
- /bin/sh
- -c
......
......@@ -75,7 +75,7 @@ aiconfigurator cli default \
--tpot 25 \
--backend vllm \
--backend-version 0.12.0 \
---generator-dynamo-version 0.8.0 \
+--generator-dynamo-version 1.0.0 \
--generator-set K8sConfig.k8s_namespace=$YOUR_NAMESPACE \
--generator-set K8sConfig.k8s_pvc_name=$YOUR_PVC \
--save-dir ./results_vllm
......@@ -272,7 +272,7 @@ spec:
value: /opt/models
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
imagePullPolicy: IfNotPresent
VLLMWorker:
......@@ -292,7 +292,7 @@ spec:
value: /opt/models
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
workingDir: /workspace
imagePullPolicy: IfNotPresent
command:
......@@ -506,7 +506,7 @@ spec:
value: /opt/models
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
imagePullPolicy: IfNotPresent
VLLMPrefillWorker:
......@@ -533,7 +533,7 @@ spec:
value: "0"
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
workingDir: /workspace
imagePullPolicy: IfNotPresent
securityContext:
......@@ -581,7 +581,7 @@ spec:
value: "0"
extraPodSpec:
mainContainer:
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.0
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
workingDir: /workspace
imagePullPolicy: IfNotPresent
securityContext:
......
......@@ -20,13 +20,13 @@ Containers have all dependencies pre-installed. No setup required.
```bash
# SGLang
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
# TensorRT-LLM
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
# vLLM
-docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.8.1
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
<Tip>
......
......@@ -176,7 +176,7 @@ _Appears in:_
| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: \{\} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. | | Optional: \{\} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: \{\} <br /> |
-| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components.<br />This image is used for both temporary DGDs created during online profiling and the final DGD.<br />If omitted, the image from the base config file (e.g., disagg.yaml) is used.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0" | | Optional: \{\} <br /> |
+| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components.<br />This image is used for both temporary DGDs created during online profiling and the final DGD.<br />If omitted, the image from the base config file (e.g., disagg.yaml) is used.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0" | | Optional: \{\} <br /> |
#### DeploymentStatus
......@@ -945,7 +945,7 @@ _Appears in:_
| --- | --- | --- | --- |
| `config` _[JSON](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#json-v1-apiextensions-k8s-io)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. | | Optional: \{\} <br />Type: object <br /> |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. | | Optional: \{\} <br /> |
-| `profilerImage` _string_ | ProfilerImage specifies the container image to use for profiling jobs.<br />This image contains the profiler code and dependencies needed for SLA-based profiling.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0" | | Required: \{\} <br /> |
+| `profilerImage` _string_ | ProfilerImage specifies the container image to use for profiling jobs.<br />This image contains the profiler code and dependencies needed for SLA-based profiling.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0" | | Required: \{\} <br /> |
| `outputPVC` _string_ | OutputPVC is an optional PersistentVolumeClaim name for storing profiling output.<br />If specified, all profiling artifacts (logs, plots, configs, raw data) will be written<br />to this PVC instead of an ephemeral emptyDir volume. This allows users to access<br />complete profiling results after the job completes by mounting the PVC.<br />The PVC must exist in the same namespace as the DGDR.<br />If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps.<br />Note: ConfigMaps are still created regardless of this setting for planner integration. | | Optional: \{\} <br /> |
| `resources` _[ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core)_ | Resources specifies the compute resource requirements for the profiling job container.<br />If not specified, no resource requests or limits are set. | | Optional: \{\} <br /> |
| `tolerations` _[Toleration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#toleration-v1-core) array_ | Tolerations allows the profiling job to be scheduled on nodes with matching taints.<br />For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint. | | Optional: \{\} <br /> |
......
......@@ -291,7 +291,7 @@ checkpoint:
identity:
model: "meta-llama/Llama-3-8B"
backendFramework: "vllm"
-dynamoVersion: "0.9.0"
+dynamoVersion: "1.0.0"
tensorParallelSize: 1
pipelineParallelSize: 1
dtype: "bfloat16"
......
......@@ -6,7 +6,7 @@ title: Feature Matrix
This document provides a comprehensive compatibility matrix for key Dynamo features across the supported backends.
-*Updated for Dynamo v0.9.0*
+*Updated for Dynamo v1.0.0*
**Legend:**
* ✅ : Supported
......
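Most hunks in this commit make the same mechanical change: the tag on an `nvcr.io/nvidia/ai-dynamo/*-runtime` image reference moves to `1.0.0`. The sketch below shows one way such a bump could be scripted. It is an illustration only, not the tooling used for this commit. The function name `bump_image_tags` and the assumption that every tag is a plain `MAJOR.MINOR.PATCH` string are hypothetical.

```python
import re

# Assumption: image references look like
#   nvcr.io/nvidia/ai-dynamo/<backend>-runtime:<MAJOR.MINOR.PATCH>
# as they do in the hunks of this diff.
IMAGE_TAG = re.compile(
    r"(nvcr\.io/nvidia/ai-dynamo/[a-z]+-runtime:)\d+\.\d+\.\d+"
)


def bump_image_tags(text: str, new_version: str) -> str:
    """Return text with every matching image tag replaced by new_version."""
    # A callable replacement avoids ambiguity between the group
    # reference and the digits that start the new version string.
    return IMAGE_TAG.sub(lambda m: m.group(1) + new_version, text)
```

Only image references are rewritten. Other version strings, such as `--backend-version 0.12.0`, `--generator-dynamo-version`, or `dynamoVersion`, do not match the pattern and would still need to be updated separately.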