@@ -130,6 +130,32 @@ For complete documentation on webhooks, certificate management, and troubleshoot
**📖 [Webhooks Guide](./webhooks.md)**
## Observability
The Dynamo Operator provides comprehensive observability through Prometheus metrics and Grafana dashboards. This allows you to monitor:
- **Controller Performance**: Reconciliation loop duration, success rates, and error rates by resource type
- **Webhook Activity**: Validation performance, admission rates, and denial patterns
- **Resource Inventory**: Current count of managed resources by state and namespace
- **Operational Health**: Success rates and health indicators for controllers and webhooks
### Metrics Collection
Metrics are automatically exposed on the operator's `/metrics` endpoint (port 8443 by default) and collected by Prometheus via a ServiceMonitor. The ServiceMonitor is automatically created when you install the operator via Helm (controlled by `metricsService.enabled`, which defaults to `true`).
### Grafana Dashboard
A pre-built Grafana dashboard is available for visualizing operator metrics. The dashboard includes:
- **Reconciliation Metrics**: Rate, duration (P95), and errors by resource type
- **Webhook Metrics**: Request rate, duration (P95), and denials by resource type and operation
- **Resource Inventory**: Count of DynamoGraphDeployments by state and namespace
- **Operational Health**: Success rate gauges for controllers and webhooks
For complete setup instructions and metrics reference, see:
@@ -172,3 +172,9 @@ Visit http://localhost:3000 and log in with the credentials captured above.
Once logged in, find the Dynamo dashboard under General.

## Operator Metrics
> **Note:** The metrics described above are for Dynamo **applications** (frontends, workers). The Dynamo **Operator** itself also exposes metrics for monitoring controller reconciliation, webhook validation, and resource inventory.
>
> See the **[Operator Metrics Guide](operator-metrics.md)** for details on operator-specific metrics and the operator dashboard.
The Dynamo Operator exposes Prometheus metrics for monitoring its own health and performance. These metrics are separate from application metrics (frontend/worker) and provide visibility into:
-**Controller Reconciliation**: How efficiently controllers process DynamoGraphDeployments, DynamoComponentDeployments, and DynamoModels
-**Webhook Validation**: Performance and outcomes of admission webhook requests
-**Resource Inventory**: Current count of managed resources by state and namespace
## Prerequisites
The operator metrics feature requires the same monitoring infrastructure as application metrics. For detailed setup instructions, see the [Kubernetes Metrics Guide](./metrics.md#prerequisites).
Operator metrics are automatically collected via a ServiceMonitor, which is created by the Helm chart when `metricsService.enabled: true` (default).
**Unlike application metrics** (which use PodMonitor), the operator uses ServiceMonitor and requires no manual RBAC configuration. The operator's kube-rbac-proxy sidecar is configured with `--ignore-paths=/metrics` to allow Prometheus access.
The dashboard will automatically appear in Grafana (assuming you have the Grafana dashboard sidecar configured, which is included in kube-prometheus-stack).
| [Metrics](metrics.md) | Available metrics reference | `DYN_SYSTEM_PORT`† |
| [Operator Metrics (Kubernetes)](../kubernetes/observability/operator-metrics.md) | Operator controller and webhook metrics for Kubernetes | N/A (configured via Helm) |
| [Health Checks](health-checks.md) | Component health monitoring and readiness probes | `DYN_SYSTEM_PORT`†, `DYN_SYSTEM_STARTING_HEALTH_STATUS`, `DYN_SYSTEM_HEALTH_PATH`, `DYN_SYSTEM_LIVE_PATH`, `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` |
| [Tracing](tracing.md) | Distributed tracing with OpenTelemetry and Tempo | `DYN_LOGGING_JSONL`†, `OTEL_EXPORT_ENABLED`†, `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`†, `OTEL_SERVICE_NAME`† |
@@ -53,6 +54,8 @@ For detailed setup instructions and configuration, see [Prometheus + Grafana Set
For Kubernetes-specific setup and configuration, see [docs/kubernetes/observability/](../kubernetes/observability/).
**Operator Metrics**: The Dynamo Operator running in Kubernetes exposes its own set of metrics for monitoring controller reconciliation, webhook validation, and resource inventory. See the [Operator Metrics Guide](../kubernetes/observability/operator-metrics.md).