Unverified commit 7ca6a562, authored by Jonathan Tong, committed by GitHub

docs: update Fern docs for main branch (#5706)


Signed-off-by: Jont828 <jt572@cornell.edu>
parent 704c1dad
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Autoscaling"
---
# Autoscaling
This guide explains how to configure autoscaling for DynamoGraphDeployment (DGD) services using the `sglang-agg` example from `examples/backends/sglang/deploy/agg.yaml`.
## Example DGD
......@@ -50,9 +51,8 @@ Dynamo provides flexible autoscaling through the `DynamoGraphDeploymentScalingAd
| **Dynamo Planner** | LLM-aware autoscaling with SLA optimization | Production LLM workloads |
| **Custom Controllers** | Any scale-subresource-compatible controller | Custom requirements |
<Warning>
**Deprecation Notice:** The `spec.services[X].autoscaling` field in DGD is **deprecated and ignored**. Use DGDSA with HPA, KEDA, or Planner instead. If you have existing DGDs with `autoscaling` configured, you'll see a warning. Remove the field to silence the warning.
</Warning>
> [!WARNING]
> **Deprecation Notice:** The `spec.services[X].autoscaling` field in DGD is **deprecated and ignored**. Use DGDSA with HPA, KEDA, or Planner instead. If you have existing DGDs with `autoscaling` configured, you'll see a warning. Remove the field to silence the warning.
## Architecture
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Creating Kubernetes Deployments"
---
# Creating Kubernetes Deployments
The scripts in the `examples/<backend>/launch` folder like [agg.sh](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/launch/agg.sh) demonstrate how you can serve your models locally.
The corresponding YAML files like [agg.yaml](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/deploy/agg.yaml) show you how you could create a Kubernetes deployment for your inference graph.
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Managing Models with DynamoModel"
---
# Managing Models with DynamoModel
## Overview
`DynamoModel` is a Kubernetes Custom Resource that represents a machine learning model deployed on Dynamo. It enables you to:
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Minikube Setup Guide"
---
# Minikube Setup Guide
Don't have a Kubernetes cluster? No problem! You can set up a local development environment using Minikube. This guide walks through the setup of everything you need to run Dynamo Kubernetes Platform locally.
## 1. Install Minikube
......@@ -12,9 +13,8 @@ First things first! Start by installing Minikube. Follow the official [Minikube
## 2. Configure GPU Support (Optional)
Planning to use GPU-accelerated workloads? You'll need to configure GPU support in Minikube. Follow the [Minikube GPU guide](https://minikube.sigs.k8s.io/docs/tutorials/nvidia/) to set up NVIDIA GPU support before proceeding.
<Tip>
Make sure to configure GPU support before starting Minikube if you plan to use GPU workloads!
</Tip>
> [!TIP]
> Make sure to configure GPU support before starting Minikube if you plan to use GPU workloads!
## 3. Start Minikube
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Multinode Deployment Guide"
---
# Multinode Deployment Guide
This guide explains how to deploy Dynamo workloads across multiple nodes. Multinode deployments enable you to scale compute-intensive LLM workloads across multiple physical machines, maximizing GPU utilization and supporting larger models.
## Overview
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Working with Dynamo Kubernetes Operator"
---
# Working with Dynamo Kubernetes Operator
## Overview
Dynamo operator is a Kubernetes operator that simplifies the deployment, configuration, and lifecycle management of DynamoGraphs. It automates the reconciliation of custom resources to ensure your desired state is always achieved. This operator is ideal for users who want to manage complex deployments using declarative YAML definitions and Kubernetes-native tooling.
......@@ -132,7 +133,33 @@ The Dynamo Operator uses **Kubernetes admission webhooks** for real-time validat
For complete documentation on webhooks, certificate management, and troubleshooting, see:
**📖 [Webhooks Guide](webhooks.md)**
**[Webhooks Guide](webhooks.md)**
## Observability
The Dynamo Operator provides comprehensive observability through Prometheus metrics and Grafana dashboards. This allows you to monitor:
- **Controller Performance**: Reconciliation loop duration, success rates, and error rates by resource type
- **Webhook Activity**: Validation performance, admission rates, and denial patterns
- **Resource Inventory**: Current count of managed resources by state and namespace
- **Operational Health**: Success rates and health indicators for controllers and webhooks
### Metrics Collection
Metrics are automatically exposed on the operator's `/metrics` endpoint (port 8443 by default) and collected by Prometheus via a ServiceMonitor. The ServiceMonitor is automatically created when you install the operator via Helm (controlled by `metricsService.enabled`, which defaults to `true`).
### Grafana Dashboard
A pre-built Grafana dashboard is available for visualizing operator metrics. The dashboard includes:
- **Reconciliation Metrics**: Rate, duration (P95), and errors by resource type
- **Webhook Metrics**: Request rate, duration (P95), and denials by resource type and operation
- **Resource Inventory**: Count of DynamoGraphDeployments by state and namespace
- **Operational Health**: Success rate gauges for controllers and webhooks
For complete setup instructions and metrics reference, see:
**[Operator Metrics Guide](observability/operator-metrics.md)**
## Installation
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "GitOps Deployment with FluxCD"
---
# GitOps Deployment with FluxCD
This section describes how to use FluxCD for GitOps-based deployment of Dynamo inference graphs. GitOps enables you to manage your Dynamo deployments declaratively using Git as the source of truth. We'll use the [aggregated vLLM example](../backends/vllm/README.md) to demonstrate the workflow.
## Prerequisites
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Grove Deployment Guide"
---
# Grove Deployment Guide
Grove is a Kubernetes API specifically designed to address the orchestration challenges of modern AI workloads, particularly disaggregated inference systems. Grove provides seamless integration with NVIDIA Dynamo for comprehensive AI infrastructure management.
## Overview
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Installation Guide for Dynamo Kubernetes Platform"
---
# Installation Guide for Dynamo Kubernetes Platform
Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestration and scaling, using the Dynamo Kubernetes Platform.
## Before You Start
......@@ -152,37 +153,34 @@ VALIDATION ERROR: Cannot install cluster-wide Dynamo operator.
Found existing namespace-restricted Dynamo operators in namespaces: ...
```
<Tip>
For multinode deployments, you need to install multinode orchestration components:
**Option 1 (Recommended): Grove + KAI Scheduler**
- Grove and KAI Scheduler can be installed manually or through the dynamo-platform helm install command.
- When using the dynamo-platform helm install command, Grove and KAI Scheduler are NOT installed by default. You can enable their installation by setting the following flags:
```bash
--set "grove.enabled=true"
--set "kai-scheduler.enabled=true"
```
**Option 2: LeaderWorkerSet (LWS) + Volcano**
- If using LWS for multinode deployments, you must also install Volcano (required dependency):
- [LWS Installation](https://github.com/kubernetes-sigs/lws#installation)
- [Volcano Installation](https://volcano.sh/en/docs/installation/) (required for gang scheduling with LWS)
- These must be installed manually before deploying multinode workloads with LWS.
See the [Multinode Deployment Guide](deployment/multinode-deployment.md) for details on orchestrator selection.
</Tip>
<Tip>
By default, Model Express Server is not used.
If you wish to use an existing Model Express Server, you can set the modelExpressURL to the existing server's URL in the helm install command:
</Tip>
> [!TIP]
> For multinode deployments, you need to install multinode orchestration components:
> **Option 1 (Recommended): Grove + KAI Scheduler**
> - Grove and KAI Scheduler can be installed manually or through the dynamo-platform helm install command.
> - When using the dynamo-platform helm install command, Grove and KAI Scheduler are NOT installed by default. You can enable their installation by setting the following flags:
> ```bash
> --set "grove.enabled=true"
> --set "kai-scheduler.enabled=true"
> ```
> **Option 2: LeaderWorkerSet (LWS) + Volcano**
> - If using LWS for multinode deployments, you must also install Volcano (required dependency):
> - [LWS Installation](https://github.com/kubernetes-sigs/lws#installation)
> - [Volcano Installation](https://volcano.sh/en/docs/installation/) (required for gang scheduling with LWS)
> - These must be installed manually before deploying multinode workloads with LWS.
> See the [Multinode Deployment Guide](deployment/multinode-deployment.md) for details on orchestrator selection.
> [!TIP]
> By default, Model Express Server is not used.
> If you wish to use an existing Model Express Server, you can set the modelExpressURL to the existing server's URL in the helm install command:
```bash
--set "dynamo-operator.modelExpressURL=http://model-express-server.model-express.svc.cluster.local:8080"
```
<Tip>
By default, Dynamo Operator is installed cluster-wide and will monitor all namespaces.
If you wish to restrict the operator to monitor only a specific namespace (the helm release namespace by default), you can set the namespaceRestriction.enabled to true.
You can also change the restricted namespace by setting the targetNamespace property.
</Tip>
> [!TIP]
> By default, Dynamo Operator is installed cluster-wide and will monitor all namespaces.
> If you wish to restrict the operator to monitor only a specific namespace (the helm release namespace by default), you can set the namespaceRestriction.enabled to true.
> You can also change the restricted namespace by setting the targetNamespace property.
```bash
--set "dynamo-operator.namespaceRestriction.enabled=true"
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Model Caching with Fluid: Cloud-Native Data Orchestration and Acceleration"
---
# Model Caching with Fluid: Cloud-Native Data Orchestration and Acceleration
Fluid is an open-source, cloud-native data orchestration and acceleration platform for Kubernetes. It virtualizes and accelerates data access from various sources (object storage, distributed file systems, cloud storage), making it ideal for AI, machine learning, and big data workloads.
## Key Features
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Log Aggregation in Dynamo on Kubernetes"
---
# Log Aggregation in Dynamo on Kubernetes
This guide demonstrates how to set up logging for Dynamo in Kubernetes using Grafana Loki and Alloy. It provides a simple reference logging configuration that can be followed in Kubernetes clusters, including Minikube and MicroK8s.
<Note>
This setup is intended for development and testing purposes. For production environments, please refer to the official documentation for high-availability configurations.
</Note>
> [!NOTE]
> This setup is intended for development and testing purposes. For production environments, please refer to the official documentation for high-availability configurations.
## Components Overview
......@@ -131,9 +131,8 @@ envsubst < deploy/observability/k8s/logging/grafana/loki-datasource.yaml | kubec
envsubst < deploy/observability/k8s/logging/grafana/logging-dashboard.yaml | kubectl apply -n $MONITORING_NAMESPACE -f -
```
<Note>
If using Grafana installed without the Prometheus Operator, you can manually import the Loki datasource and Dynamo Logs dashboard using the Grafana UI.
</Note>
> [!NOTE]
> If using Grafana installed without the Prometheus Operator, you can manually import the Loki datasource and Dynamo Logs dashboard using the Grafana UI.
### 4. Deploy a DynamoGraphDeployment with JSONL Logging
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Dynamo Metrics Collection on Kubernetes"
---
# Dynamo Metrics Collection on Kubernetes
## Overview
This guide provides a walkthrough for collecting and visualizing metrics from Dynamo components using the kube-prometheus-stack. The kube-prometheus-stack provides a powerful and flexible way to configure monitoring for Kubernetes applications through custom resources like PodMonitors, making it easy to automatically discover and scrape metrics from Dynamo components.
......@@ -28,9 +29,8 @@ helm install prometheus -n monitoring --create-namespace prometheus-community/ku
--set prometheus.prometheusSpec.probeNamespaceSelector="{}"
```
<Note>
The commands below assume you have installed the kube-prometheus-stack with the installation method listed above. Depending on how your monitoring stack is configured, you may need to modify the `kubectl` commands that follow in this document (e.g., by changing Namespace or Service names).
</Note>
> [!NOTE]
> The commands below assume you have installed the kube-prometheus-stack with the installation method listed above. Depending on how your monitoring stack is configured, you may need to modify the `kubectl` commands that follow in this document (e.g., by changing Namespace or Service names).
### Install Dynamo Operator
Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](../installation-guide.md) for detailed instructions on deploying the Dynamo operator.
......@@ -46,9 +46,8 @@ helm install dynamo-platform ...
The Dynamo Grafana dashboard includes panels for node-level CPU utilization, system load, and container resource usage. These metrics are collected and exported to Prometheus via [node-exporter](https://github.com/prometheus/node_exporter), which exposes hardware and OS metrics from Linux systems.
<Note>
The kube-prometheus-stack installation described above includes node-exporter by default. If you're using a custom Prometheus setup, you'll need to ensure node-exporter is deployed as a DaemonSet on your cluster nodes.
</Note>
> [!NOTE]
> The kube-prometheus-stack installation described above includes node-exporter by default. If you're using a custom Prometheus setup, you'll need to ensure node-exporter is deployed as a DaemonSet on your cluster nodes.
To verify node-exporter is running:
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
---
# Dynamo Operator Metrics
## Overview
The Dynamo Operator exposes Prometheus metrics for monitoring its own health and performance. These metrics are separate from application metrics (frontend/worker) and provide visibility into:
- **Controller Reconciliation**: How efficiently controllers process DynamoGraphDeployments, DynamoComponentDeployments, and DynamoModels
- **Webhook Validation**: Performance and outcomes of admission webhook requests
- **Resource Inventory**: Current count of managed resources by state and namespace
## Prerequisites
The operator metrics feature requires the same monitoring infrastructure as application metrics. For detailed setup instructions, see the [Kubernetes Metrics Guide](./metrics.md#prerequisites).
**Quick checklist:**
- kube-prometheus-stack installed (for ServiceMonitor support)
- Prometheus and Grafana running
- Dynamo Operator installed via Helm
## Metrics Collection
### ServiceMonitor
Operator metrics are automatically collected via a ServiceMonitor, which is created by the Helm chart when `metricsService.enabled: true` (default).
**Unlike application metrics** (which use PodMonitor), the operator uses ServiceMonitor and requires no manual RBAC configuration. The operator's kube-rbac-proxy sidecar is configured with `--ignore-paths=/metrics` to allow Prometheus access.
To verify the ServiceMonitor is created:
```bash
kubectl get servicemonitor -n dynamo-system
```
### Disabling Metrics Collection
To disable operator metrics collection:
```bash
helm upgrade dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace dynamo-system \
  --set dynamo-operator.metricsService.enabled=false
```
## Available Metrics
All metrics use the `dynamo_operator` namespace prefix.
### Reconciliation Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `dynamo_operator_reconcile_duration_seconds` | Histogram | `resource_type`, `namespace`, `result` | Duration of reconciliation loops |
| `dynamo_operator_reconcile_total` | Counter | `resource_type`, `namespace`, `result` | Total number of reconciliations |
| `dynamo_operator_reconcile_errors_total` | Counter | `resource_type`, `namespace`, `error_type` | Total reconciliation errors by type |
**Labels:**
- `resource_type`: `DynamoGraphDeployment`, `DynamoComponentDeployment`, `DynamoModel`, `DynamoGraphDeploymentRequest`, `DynamoGraphDeploymentScalingAdapter`
- `namespace`: Target namespace of the resource
- `result`: `success`, `error`, `requeue`
- `error_type`: `not_found`, `already_exists`, `conflict`, `validation`, `bad_request`, `unauthorized`, `forbidden`, `timeout`, `server_timeout`, `unavailable`, `rate_limited`, `internal`
### Webhook Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `dynamo_operator_webhook_duration_seconds` | Histogram | `resource_type`, `operation` | Duration of webhook validation requests |
| `dynamo_operator_webhook_requests_total` | Counter | `resource_type`, `operation`, `result` | Total webhook admission requests |
| `dynamo_operator_webhook_denials_total` | Counter | `resource_type`, `operation`, `reason` | Total webhook denials with reasons |
**Labels:**
- `resource_type`: Same as reconciliation metrics
- `operation`: `CREATE`, `UPDATE`, `DELETE`
- `result`: `allowed`, `denied`
- `reason`: Validation failure reason (e.g., `immutable_field_changed`, `invalid_config`)
### Resource Inventory Metrics
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `dynamo_operator_resources_total` | Gauge | `resource_type`, `namespace`, `status` | Current count of resources by state |
**Labels:**
- `resource_type`: `DynamoGraphDeployment`, `DynamoComponentDeployment`, `DynamoModel`, `DynamoGraphDeploymentRequest`, `DynamoGraphDeploymentScalingAdapter`
- `namespace`: Resource namespace
- `status`: Resource state derived from each CRD's status. Common values:
  - `"ready"` - Resource is healthy and operational (DCD, DM, DGDSA)
  - `"not_ready"` - Resource exists but is not operational (DCD, DM, DGDSA)
  - `"unknown"` - State cannot be determined (default for empty status)
  - DGD uses: `"pending"`, `"successful"`, `"failed"` from `.status.state`
  - DGDR uses: `"Pending"`, `"Profiling"`, `"Deploying"`, `"Ready"`, `"DeploymentDeleted"`, `"Failed"` from `.status.state`
## Example Queries
### Reconciliation Performance
```promql
# P95 reconciliation duration by resource type
histogram_quantile(0.95,
  sum by (resource_type, le) (
    rate(dynamo_operator_reconcile_duration_seconds_bucket[5m])
  )
)

# Reconciliation rate by result
sum by (resource_type, result) (
  rate(dynamo_operator_reconcile_total[5m])
)

# Error rate by type
sum by (resource_type, error_type) (
  rate(dynamo_operator_reconcile_errors_total[5m])
)
```
### Webhook Performance
```promql
# Webhook P95 latency
histogram_quantile(0.95,
  sum by (resource_type, le) (
    rate(dynamo_operator_webhook_duration_seconds_bucket[5m])
  )
)

# Webhook denial rate
sum by (resource_type, operation, reason) (
  rate(dynamo_operator_webhook_denials_total[5m])
)
```
### Resource Inventory
```promql
# Total resources by type and state
sum by (resource_type, status) (
  dynamo_operator_resources_total
)

# DynamoGraphDeployments by state
sum by (status) (
  dynamo_operator_resources_total{resource_type="DynamoGraphDeployment"}
)

# All resources by namespace and state
sum by (resource_type, namespace, status) (
  dynamo_operator_resources_total
)
```
## Grafana Dashboard
A pre-built Grafana dashboard is available for visualizing operator metrics.
### Dashboard Sections
1. **Reconciliation Metrics** (3 panels)
   - Reconciliation rate by resource type and result
   - P95 reconciliation duration
   - Reconciliation errors by type
2. **Webhook Metrics** (3 panels)
   - Webhook request rate by operation
   - P95 webhook duration
   - Webhook denials by reason
3. **Resource Inventory** (2 panels)
   - Resource inventory timeline by state and namespace (filterable by resource type)
   - Current resource count by state (filterable by resource type)
4. **Operational Health** (2 panels)
   - Reconciliation success rate gauges
   - Webhook admission success rate gauges
### Deploying the Dashboard
```bash
kubectl apply -f deploy/observability/k8s/grafana-operator-dashboard-configmap.yaml
```
The dashboard will automatically appear in Grafana (assuming you have the Grafana dashboard sidecar configured, which is included in kube-prometheus-stack).
### Finding the Dashboard
1. Port-forward to Grafana (if needed):
```bash
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring
```
2. Log in to Grafana at http://localhost:3000
3. Navigate to **Dashboards** → Search for **"Dynamo Operator"**
### Dashboard Filters
The dashboard includes two filter variables:
- **Namespace**: View metrics across all namespaces or filter by specific ones (multi-select)
- **Resource Type**: Filter all panels by resource type or select "All" to see aggregated metrics across all CRDs (single select)
When "All" is selected for Resource Type, every panel shows data for all five managed CRDs, distinguished by the `resource_type` label.
## Accessing Metrics Directly
For instructions on accessing Prometheus and Grafana, see the [Kubernetes Metrics Guide](./metrics.md#viewing-the-metrics).
Once you have access to Prometheus, you can query operator metrics directly:
```bash
# Port-forward to Prometheus
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring
# Visit http://localhost:9090 and try queries like:
# - dynamo_operator_reconcile_total
# - dynamo_operator_webhook_requests_total
# - dynamo_operator_resources_total
```
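The same queries can also be issued from the command line through the Prometheus HTTP API. The following is a minimal sketch, assuming the port-forward above is active on `localhost:9090`. The `PROM_URL` variable and both function names are hypothetical helpers introduced here for illustration, and the success-ratio expression is one plausible way to combine the `result` label values, not necessarily the expression the dashboard uses.

```bash
# Assumption: Prometheus is reachable here (e.g. via the port-forward above).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression for the fraction of reconciliations that
# succeeded for one resource type over a time window (default: 5m).
reconcile_success_ratio_query() {
  local resource_type="$1" window="${2:-5m}"
  printf 'sum(rate(dynamo_operator_reconcile_total{resource_type="%s",result="success"}[%s])) / sum(rate(dynamo_operator_reconcile_total{resource_type="%s"}[%s]))' \
    "$resource_type" "$window" "$resource_type" "$window"
}

# Run an instant query against the Prometheus HTTP API.
# curl -G sends the URL-encoded data as query-string parameters.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage (requires a reachable Prometheus):
#   prom_query "$(reconcile_success_ratio_query DynamoGraphDeployment)"
```

Keeping the query builder separate from the HTTP call makes it easy to print and inspect the expression before sending it.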
## Troubleshooting
### Metrics Not Appearing in Prometheus
1. **Check ServiceMonitor exists:**
```bash
kubectl get servicemonitor -n dynamo-system | grep operator
```
2. **Check ServiceMonitor is discovered by Prometheus:**
- Go to Prometheus UI → Status → Targets
- Look for `serviceMonitor/dynamo-system/dynamo-platform-dynamo-operator-operator`
- Should show state: `UP`
3. **Check Prometheus selector configuration:**
```bash
kubectl get prometheus -n monitoring -o yaml | grep serviceMonitorSelector
```
Ensure `serviceMonitorSelectorNilUsesHelmValues: false` was set during kube-prometheus-stack installation.
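The target check in step 2 can also be done without the UI by reading the Prometheus targets API. The sketch below assumes Prometheus is port-forwarded to `localhost:9090` and that the API returns compact JSON (no spaces after colons). `PROM_URL` and both function names are hypothetical, and the `dynamo-operator` substring is taken from the target name shown in step 2.

```bash
# Assumption: Prometheus is reachable here (e.g. via a port-forward).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Read a /api/v1/targets JSON response on stdin and print the distinct
# scrape pools whose name mentions the operator.
operator_scrape_pools() {
  grep -o '"scrapePool":"[^"]*dynamo-operator[^"]*"' | sort -u
}

# Fetch the active targets from Prometheus and list operator scrape pools.
# No output means Prometheus has not discovered the ServiceMonitor.
list_operator_targets() {
  curl -s "${PROM_URL}/api/v1/targets?state=active" | operator_scrape_pools
}

# Usage (requires a reachable Prometheus):
#   list_operator_targets
```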
### Dashboard Not Appearing in Grafana
1. **Check ConfigMap is created:**
```bash
kubectl get configmap -n monitoring grafana-operator-dashboard
```
2. **Check ConfigMap has the label:**
```bash
kubectl get configmap -n monitoring grafana-operator-dashboard -o jsonpath='{.metadata.labels.grafana_dashboard}'
```
Should return `1`.
3. **Check Grafana dashboard sidecar configuration:**
```bash
kubectl get deployment -n monitoring prometheus-grafana -o yaml | grep -A 5 sidecar
```
The sidecar should be configured to watch for the `grafana_dashboard: "1"` label.
4. **Restart Grafana pod** to force dashboard refresh:
```bash
kubectl rollout restart deployment/prometheus-grafana -n monitoring
```
## Related Documentation
- [Kubernetes Metrics Guide](./metrics.md) - Application metrics for frontends and workers
- [Dynamo Operator Guide](../dynamo-operator.md) - Operator architecture and deployment modes
- [Operator Webhooks](../webhooks.md) - Webhook validation details
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Deploying Dynamo on Kubernetes"
---
# Deploying Dynamo on Kubernetes
High-level guide to Dynamo Kubernetes deployments. Start here, then dive into specific guides.
## Important Terminology
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Service Discovery"
---
# Service Discovery
Dynamo components (frontends, workers, planner) need to be able to discover each other and their capabilities at runtime. We refer to this as service discovery. There are 2 kinds of service discovery backends supported on Kubernetes.
## Discovery Backends
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Webhooks"
---
# Webhooks
This document describes the webhook functionality in the Dynamo Operator, including validation webhooks, certificate management, and troubleshooting.
## Table of Contents
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "KVBM Architecture"
---
# KVBM Architecture
The KVBM serves as a critical infrastructure component for scaling LLM inference workloads efficiently. By cleanly separating runtime logic from memory management, and by enabling distributed block sharing, KVBM lays the foundation for high-throughput, multi-node, and memory-disaggregated AI systems.
![A block diagram showing a layered architecture view of Dynamo KV Block manager.](../../assets/img/kvbm-architecture.png)
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "Understanding KVBM components"
---
# Understanding KVBM components
The KVBM design takes inspiration from the KV block managers used in vLLM and SGLang, with an added influence from historical memory tiering strategies common in general GPU programming. For more details, see [KVBM Reading](kvbm-reading.md). The figure below illustrates the internal components of KVBM.
![Internal Components of Dynamo KVBM. ](../../assets/img/kvbm-components.png)
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "KVBM components"
---
# KVBM components
The design of the KVBM is inspired by the vLLM and SGLang KV block managers, with a twist drawn from historical memory tiering designs common in general GPU programming. See [KVBM Reading](kvbm-reading.md). The following figure shows the internal architecture of KVBM and how it works across workers using NIXL.
![Internal architecture and key modules in the Dynamo KVBM. ](../../assets/img/kvbm-internal-arch.png)
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: "KVBM Integrations"
---
# KVBM Integrations
KVBM integrates with inference frameworks (vLLM, TRTLLM, SGLang) via Connector APIs to influence KV caching behaviour, scheduling, and forward pass execution.
The interface has two components: Scheduler and Worker. The Scheduler (leader) orchestrates KV block offload/onboard and builds metadata specifying the transfer data for the workers. It also maintains hooks for handling asynchronous transfer completion. The Worker reads the metadata built by the Scheduler (leader) and performs asynchronous onboarding/offloading at the end of the forward pass.
......