# Dynamo Metrics Collection on Kubernetes

## Overview

This guide provides a walkthrough for collecting and visualizing metrics from Dynamo components using the kube-prometheus-stack. The kube-prometheus-stack provides a powerful and flexible way to configure monitoring for Kubernetes applications through custom resources like PodMonitors, making it easy to automatically discover and scrape metrics from Dynamo components.

## Prerequisites

### Install kube-prometheus-stack
If you don't have an existing Prometheus setup, you'll likely want to install the kube-prometheus-stack. This is a collection of Kubernetes manifests that includes the Prometheus Operator, Prometheus, Grafana, and other monitoring components in a pre-configured setup. The stack introduces custom resources that make it easy to deploy and manage monitoring in Kubernetes:

- `PodMonitor`: Automatically discovers and scrapes metrics from pods based on label selectors
- `ServiceMonitor`: Similar to PodMonitor but works with Services
- `PrometheusRule`: Defines alerting and recording rules

For a basic installation:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Values allow PodMonitors to be picked up that are outside of the kube-prometheus-stack helm release
helm install prometheus -n monitoring --create-namespace prometheus-community/kube-prometheus-stack \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorNamespaceSelector.matchLabels=null \
  --set prometheus.prometheusSpec.probeNamespaceSelector.matchLabels=null
```

> [!Note]
> The commands below assume you installed the kube-prometheus-stack using the method shown above. If your monitoring stack is configured differently, adjust the `kubectl` commands in this document to match (e.g., change the namespace or Service names).

### Install Dynamo Operator
Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](../installation_guide.md) for detailed instructions on deploying the Dynamo operator.
Make sure to set the `prometheusEndpoint` to the Prometheus endpoint you installed in the previous step.

```bash
helm install dynamo-platform ... \
  --set prometheusEndpoint=http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
```
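
The endpoint follows the standard in-cluster DNS form `http://<service>.<namespace>.svc.cluster.local:<port>`. As a minimal sketch, the hypothetical helper below (`prometheus_endpoint` is an illustrative name, not part of Dynamo or the Helm chart) assembles that URL. The Service name and namespace match the installation shown above and will differ if you used another release name or namespace.

```bash
# Hypothetical helper: build the in-cluster URL of a Prometheus Service.
# Arguments: <service-name> <namespace> [port, default 9090]
prometheus_endpoint() {
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "${3:-9090}"
}

# Values below assume the kube-prometheus-stack install shown earlier
# (release "prometheus" in namespace "monitoring").
prometheus_endpoint prometheus-kube-prometheus-prometheus monitoring
```

To confirm the actual Service name in your cluster, list the Services with `kubectl get svc -n monitoring`.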


### Node Exporter for CPU/Memory Metrics

The Dynamo Grafana dashboard includes panels for node-level CPU utilization, system load, and container resource usage. These metrics are collected and exported to Prometheus via [node-exporter](https://github.com/prometheus/node_exporter), which exposes hardware and OS metrics from Linux systems.

> [!Note]
> The kube-prometheus-stack installation described above includes node-exporter by default. If you're using a custom Prometheus setup, you'll need to ensure node-exporter is deployed as a DaemonSet on your cluster nodes.

To verify node-exporter is running:

```bash
kubectl get daemonset -A | grep node-exporter
```

If node-exporter is not running, you can install it via the kube-prometheus-stack or deploy it separately. For more information, see the [node-exporter documentation](https://github.com/prometheus/node_exporter).

### DCGM Metrics Collection (Optional)

GPU utilization metrics are collected and exported to Prometheus via dcgm-exporter. The Dynamo Grafana dashboard includes a panel for GPU utilization related to your Dynamo deployment. For that panel to be populated, you need to ensure that the dcgm-exporter is running in your cluster. To check if the dcgm-exporter is running, please run the following command:

```bash
kubectl get daemonset -A | grep dcgm-exporter
```

If the output is empty, you need to install the dcgm-exporter. For more information, please consult the official [dcgm-exporter documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-telemetry/latest/dcgm-exporter.html).


## Deploy a DynamoGraphDeployment

Let's start by deploying a simple vLLM aggregated deployment:

```bash
export NAMESPACE=dynamo-system # namespace where dynamo operator is installed
pushd examples/backends/vllm/deploy
kubectl apply -f agg.yaml -n $NAMESPACE
popd
```

This will create two components:
- A Frontend component exposing metrics on its HTTP port
- A Worker component exposing metrics on its system port

Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
- Deployment configuration: See the [vLLM README](../../backends/vllm/README.md)
- Available metrics: See the [metrics guide](../../observability/metrics.md)
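
A quick way to confirm that an endpoint is serving Dynamo metrics is to count the samples it returns for a given name prefix. The sketch below is illustrative only: `count_samples` is a hypothetical helper, and it simply reads exposition text on stdin, skipping the `# HELP` and `# TYPE` metadata lines.

```bash
# Hypothetical helper: count metric samples whose name starts with a prefix.
# Reads Prometheus/OpenMetrics exposition text on stdin; lines starting
# with '#' (HELP/TYPE metadata) are skipped.
# Argument: <metric-name-prefix>
count_samples() {
  awk -v prefix="$1" '!/^#/ && index($0, prefix) == 1 { n++ } END { print n + 0 }'
}
```

For example, assuming the Frontend's HTTP port is reachable on `localhost:8000` (the address the test request in this guide uses), `curl -s localhost:8000/metrics | count_samples dynamo_frontend_` prints the number of frontend samples. A result of `0` suggests the endpoint is reachable but has not recorded any frontend metrics yet.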

### Validate the Deployment

Let's send some test requests to populate metrics:

```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
    {
        "role": "user",
        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
    }
    ],
    "stream": true,
    "max_tokens": 30
  }'
```
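
A single request produces only a few data points. To generate a small burst of traffic, something like the following sketch may help. Both `chat_payload` and `send_requests` are hypothetical helpers written for this guide. They assume the same endpoint and model as the request above, and they send non-streaming requests so that only the HTTP status is printed.

```bash
# Hypothetical helpers for generating a small burst of test traffic.

# Build a minimal, non-streaming chat-completions body.
# Arguments: <model> <prompt> [max_tokens, default 30]
# Note: the prompt is inserted verbatim, so it must not contain
# double quotes or backslashes.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false,"max_tokens":%s}\n' \
    "$1" "$2" "${3:-30}"
}

# Send <count> requests one after another and print each HTTP status code.
# Arguments: <count> [url, default http://localhost:8000/v1/chat/completions]
send_requests() {
  i=1
  while [ "$i" -le "$1" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' \
      "${2:-http://localhost:8000/v1/chat/completions}" \
      -H 'Content-Type: application/json' \
      -d "$(chat_payload 'Qwen/Qwen3-0.6B' "Test request $i: say hello.")"
    i=$((i + 1))
  done
}
```

Running `send_requests 10` sends ten requests in sequence, which is usually enough for the request-rate and latency panels to show data.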

For more information about validating the deployment, see the [vLLM README](../../backends/vllm/README.md).

## Set Up Metrics Collection

### Create PodMonitors

The Prometheus Operator uses PodMonitor resources to automatically discover and scrape metrics from pods. To enable this discovery, the Dynamo operator automatically creates the PodMonitor resources and adds these labels to all pods:
- `nvidia.com/metrics-enabled: "true"` - Enables metrics collection
- `nvidia.com/dynamo-component-type: "frontend|worker"` - Identifies the component type

> **Note**: You can opt specific deployments out of metrics collection by adding this annotation to your DynamoGraphDeployment:
```yaml
apiVersion: nvidia.com/v1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
  annotations:
    nvidia.com/enable-metrics: "false"
spec:
  # …
```
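
To check which pods carry the metrics labels listed above, you can query by label selector. The sketch below is illustrative: `metrics_selector` is a hypothetical helper that only builds the selector string from the two labels documented in this section.

```bash
# Hypothetical helper: build a label selector for Dynamo pods with metrics enabled.
# Argument: [component type, e.g. frontend or worker]; omit to match all types.
metrics_selector() {
  if [ -n "${1:-}" ]; then
    printf 'nvidia.com/metrics-enabled=true,nvidia.com/dynamo-component-type=%s\n' "$1"
  else
    printf 'nvidia.com/metrics-enabled=true\n'
  fi
}
```

For example, `kubectl get pods -n "$NAMESPACE" -l "$(metrics_selector worker)"` lists the worker pods that are labeled for scraping, and `kubectl get podmonitors -A` shows the PodMonitor resources present in the cluster.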

### Configure Grafana Dashboard

Apply the Dynamo dashboard configuration to populate Grafana with the Dynamo dashboard:
```bash
kubectl apply -n monitoring -f deploy/observability/k8s/grafana-dynamo-dashboard-configmap.yaml
```

The dashboard is embedded in the ConfigMap. Because the ConfigMap is labeled with `grafana_dashboard: "1"`, Grafana discovers it and adds it to its list of available dashboards. The dashboard includes panels for:
- Frontend request rates
- Time to first token
- Inter-token latency
- Request duration
- Input/Output sequence lengths
- GPU utilization via DCGM
- Node CPU utilization and system load
- Container CPU usage per pod
- Memory usage per pod

## Viewing the Metrics

### In Prometheus
```bash
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring
```

Visit http://localhost:9090 and try these example queries:
- `dynamo_frontend_requests_total`
- `dynamo_frontend_time_to_first_token_seconds_bucket`
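
The `_bucket` series is a histogram, so it is usually more useful wrapped in `histogram_quantile` than queried raw. The following is a minimal sketch, assuming the port-forward above is active on `localhost:9090`. `ttft_quantile_query` and `prom_query` are hypothetical helper names, and the 0.95 quantile and 5m window are illustrative defaults rather than recommended values.

```bash
# Hypothetical helper: build a PromQL expression for a time-to-first-token
# quantile computed from the histogram buckets.
# Arguments: [quantile, default 0.95] [rate window, default 5m]
ttft_quantile_query() {
  printf 'histogram_quantile(%s, sum by (le) (rate(dynamo_frontend_time_to_first_token_seconds_bucket[%s])))\n' \
    "${1:-0.95}" "${2:-5m}"
}

# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# Assumes Prometheus is reachable on localhost:9090 via the port-forward.
# Argument: <PromQL expression>
prom_query() {
  curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode "query=$1"
}
```

For example, `prom_query "$(ttft_quantile_query 0.99 1m)"` returns the JSON response for the 99th-percentile time to first token over the last minute. The same expression can be pasted directly into the Prometheus UI.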

![Prometheus UI showing Dynamo metrics](../../images/prometheus-k8s.png)

### In Grafana
```bash
# Get Grafana credentials
export GRAFANA_USER=$(kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-user}" | base64 --decode)
export GRAFANA_PASSWORD=$(kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode)
echo "Grafana user: $GRAFANA_USER"
echo "Grafana password: $GRAFANA_PASSWORD"

# Port forward Grafana service
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring
```

Visit http://localhost:3000 and log in with the credentials captured above.

Once logged in, find the Dynamo dashboard under General.

![Grafana dashboard showing Dynamo metrics](../../images/grafana-k8s.png)

## Operator Metrics

> **Note:** The metrics described above are for Dynamo **applications** (frontends, workers). The Dynamo **Operator** itself also exposes metrics for monitoring controller reconciliation, webhook validation, and resource inventory.
>
> See the **[Operator Metrics Guide](operator-metrics.md)** for details on operator-specific metrics and the operator dashboard.