Commit faafa5ff (unverified) authored by Keiven C, committed by GitHub

docs: add a docs/guides/metrics.md (#2160)


Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent efd863d6
@@ -2,12 +2,17 @@
This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.
> [!NOTE]
> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/guides/metrics.md).
## Overview
### Components
- **Prometheus Server**: Collects and stores metrics from Dynamo services and other components.
- **Grafana**: Provides dashboards by querying the Prometheus Server.
### Topology
Default Service Relationship Diagram:
```mermaid
...
```
@@ -29,17 +34,63 @@ The dcgm-exporter service in the Docker Compose network is configured to use por
As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers with `--framework VLLM` or `--framework TENSORRTLLM`.
### Available Metrics
#### Component Metrics
The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework:
- `dynamo_component_concurrent_requests`: Requests currently being processed (gauge)
- `dynamo_component_request_bytes_total`: Total bytes received in requests (counter)
- `dynamo_component_request_duration_seconds`: Request processing time (histogram)
- `dynamo_component_requests_total`: Total requests processed (counter)
- `dynamo_component_response_bytes_total`: Total bytes sent in responses (counter)
- `dynamo_component_system_uptime_seconds`: DistributedRuntime uptime (gauge)
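To make the gauge/counter distinction in the list above concrete, here is a minimal, std-only Rust sketch of how a request handler might maintain values like `dynamo_component_concurrent_requests` (a gauge that goes up and down) and `dynamo_component_requests_total` (a counter that only grows). The `ComponentMetrics` and `InFlight` types are hypothetical and are not part of Dynamo, which records these metrics for you when you use `DistributedRuntime`. Counting at request start is a choice made for this sketch only.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for per-component metrics; illustration only, not Dynamo's code.
#[derive(Default)]
pub struct ComponentMetrics {
    /// Gauge semantics: incremented on entry, decremented on exit.
    pub concurrent_requests: AtomicI64,
    /// Counter semantics: only ever increases (this sketch counts at request start).
    pub requests_total: AtomicU64,
    /// Running sum of request durations in microseconds (a real histogram also keeps buckets).
    pub duration_micros_sum: AtomicU64,
}

/// Guard returned when a request starts; dropping it marks the request as finished.
pub struct InFlight<'a> {
    metrics: &'a ComponentMetrics,
    started: Instant,
}

impl ComponentMetrics {
    pub fn start_request(&self) -> InFlight<'_> {
        self.requests_total.fetch_add(1, Ordering::Relaxed);
        self.concurrent_requests.fetch_add(1, Ordering::Relaxed);
        InFlight { metrics: self, started: Instant::now() }
    }
}

impl Drop for InFlight<'_> {
    fn drop(&mut self) {
        let elapsed: Duration = self.started.elapsed();
        self.metrics.concurrent_requests.fetch_sub(1, Ordering::Relaxed);
        self.metrics
            .duration_micros_sum
            .fetch_add(elapsed.as_micros() as u64, Ordering::Relaxed);
    }
}
```

The drop-guard pattern keeps the gauge correct even when a handler returns early (for example via `?`), because the decrement runs whenever the guard goes out of scope.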
#### Specialized Component Metrics
Some components expose additional metrics specific to their functionality:
- `dynamo_preprocessor_*`: Metrics specific to preprocessor components
#### Frontend Metrics
When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TENSORRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:
- `dynamo_frontend_inflight_requests`: Inflight requests (gauge)
- `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram)
- `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram)
- `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram)
- `dynamo_frontend_request_duration_seconds`: LLM request duration (histogram)
- `dynamo_frontend_requests_total`: Total LLM requests (counter)
- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
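When checking these metrics by hand, it can help to group a scraped `/metrics` body by its `model` label. The sketch below is a deliberately minimal, std-only parser for the Prometheus text format, written for illustration: `per_model_totals` and `label_value` are hypothetical helpers, and they assume label values without commas, braces, or escaped quotes, and samples without trailing timestamps.

```rust
use std::collections::BTreeMap;

/// Returns the value of `label` in a sample such as `name{model="m"} 3`.
/// Minimal: assumes label values contain no commas, braces, or escaped quotes.
fn label_value<'a>(line: &'a str, label: &str) -> Option<&'a str> {
    let start = line.find('{')? + 1;
    let end = start + line[start..].find('}')?;
    for pair in line[start..end].split(',') {
        if let Some((key, value)) = pair.split_once('=') {
            if key.trim() == label {
                return Some(value.trim().trim_matches('"'));
            }
        }
    }
    None
}

/// Sums the samples of `metric` per `model` label value.
/// Skips `# HELP` / `# TYPE` comment lines; assumes no trailing timestamps.
pub fn per_model_totals(exposition: &str, metric: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for line in exposition.lines().map(str::trim) {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The metric name ends at the first '{' or space.
        let name_end = line.find(|c: char| c == '{' || c == ' ').unwrap_or(line.len());
        if &line[..name_end] != metric {
            continue;
        }
        // The sample value is the last space-separated token.
        let value = match line.rsplit(' ').next().and_then(|v| v.parse::<f64>().ok()) {
            Some(v) => v,
            None => continue,
        };
        let model = label_value(line, "model").unwrap_or("").to_string();
        *totals.entry(model).or_insert(0.0) += value;
    }
    totals
}
```

For example, `per_model_totals(&body, "dynamo_frontend_requests_total")` returns one summed value per model name, adding together any series that differ only in other labels.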
### Required Files
The following configuration files should be present in this directory:
- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
- [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics.
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
## Getting Started
### Prerequisites
1. Make sure Docker and Docker Compose are installed on your system.
### Quick Start
1. Start Dynamo dependencies. From the root of the Dynamo repository, run:
```bash
# Start the basic services (etcd & natsd), along with Prometheus and Grafana
docker compose -f deploy/docker-compose.yml --profile metrics up -d
# Minimum components for Dynamo (without Prometheus and Grafana): etcd/nats/dcgm-exporter
docker compose -f deploy/docker-compose.yml up -d
```
@@ -48,7 +99,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
```bash
export CUDA_VISIBLE_DEVICES=0,2
```
2. The following web servers are started. Endpoints ending in `/metrics` serve data in Prometheus format:
- Grafana: `http://localhost:3001` (default login: dynamo/dynamo)
- Prometheus Server: `http://localhost:9090`
- NATS Server: `http://localhost:8222` (monitoring endpoints: /varz, /healthz, etc.)
@@ -56,16 +107,14 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
- etcd Server: `http://localhost:2379/metrics`
- DCGM Exporter: `http://localhost:9401/metrics`
3. Optionally, to experiment further, see [components/metrics/README.md](../../components/metrics/README.md) for details on launching a metrics server (subscribes to NATS), a mock_worker (publishes to NATS), and real workers.
- Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
- Uncomment the appropriate lines in prometheus.yml to poll port 9091.
- Start worker(s) that publish KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.
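To inspect any of the `/metrics` endpoints listed above without a browser, a small std-only Rust client is enough. This is a hedged sketch: `scrape` and `build_get_request` are illustrative helpers, not part of Dynamo. It speaks plain HTTP/1.0 so the server will not use chunked transfer encoding, and it does no TLS, redirect, or status-code handling.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Builds a minimal HTTP/1.0 GET request. HTTP/1.0 avoids chunked responses,
/// and the server closes the connection once the body has been sent.
pub fn build_get_request(host: &str, port: u16, path: &str) -> String {
    format!("GET {} HTTP/1.0\r\nHost: {}:{}\r\n\r\n", path, host, port)
}

/// Fetches `path` (for example `/metrics`) and returns the response body.
/// Debugging aid only: no TLS, redirects, or status-code checks.
pub fn scrape(host: &str, port: u16, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(build_get_request(host, port, path).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // Headers and body are separated by the first blank line.
    Ok(response
        .split_once("\r\n\r\n")
        .map(|(_, body)| body.to_string())
        .unwrap_or_default())
}
```

For example, `scrape("localhost", 9401, "/metrics")` would fetch the DCGM Exporter's metrics; `curl http://localhost:9401/metrics` does the same job from a shell.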
### Configuration
#### Prometheus
The Prometheus configuration is specified in [prometheus.yml](./prometheus.yml). This file is set up to collect metrics from the metrics aggregation service endpoint.
@@ -77,29 +126,233 @@ After making changes to prometheus.yml, it is necessary to reload the configurat
```bash
docker compose -f deploy/docker-compose.yml up prometheus -d --force-recreate
```
#### Grafana
Grafana is pre-configured with:
- Prometheus datasource
- Sample dashboard for visualizing service metrics
![grafana image](./grafana-dynamo-composite.png)
### Troubleshooting
1. Verify services are running:
```bash
docker compose ps
```
2. Check logs:
```bash
docker compose logs prometheus
docker compose logs grafana
```
3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
## Developer Guide
### Creating Metrics at Different Hierarchy Levels
#### Runtime-Level Metrics
```rust
use dynamo_runtime::DistributedRuntime;
// Assumes the `MetricsRegistry` trait (which provides the `create_*` methods) is in scope.
let runtime = DistributedRuntime::new()?;
// Create runtime-level metrics on the DistributedRuntime itself
// (these are Prometheus Counter and Gauge types)
let total_requests = runtime.create_counter(
    "total_requests",
    "Total requests across all namespaces",
    &[]
)?;
let active_connections = runtime.create_gauge(
    "active_connections",
    "Number of active client connections",
    &[]
)?;
```
#### Namespace-Level Metrics
```rust
let namespace = runtime.namespace("my_model")?;
// Namespace-scoped metrics
let model_requests = namespace.create_counter(
"model_requests",
"Requests for this specific model",
&[]
)?;
let model_latency = namespace.create_histogram(
"model_latency_seconds",
"Model inference latency",
&[],
&[0.001, 0.01, 0.1, 1.0, 10.0]
)?;
```
#### Component-Level Metrics
```rust
let component = namespace.component("backend")?;
// Component-specific metrics
let backend_requests = component.create_counter(
"backend_requests",
"Requests handled by this backend component",
&[]
)?;
let gpu_memory_usage = component.create_gauge(
"gpu_memory_bytes",
"GPU memory usage in bytes",
&[]
)?;
```
#### Endpoint-Level Metrics
```rust
let endpoint = component.endpoint("generate")?;
// Endpoint-specific metrics
let generate_requests = endpoint.create_counter(
"generate_requests",
"Generate endpoint requests",
&[]
)?;
let generate_latency = endpoint.create_histogram(
"generate_latency_seconds",
"Generate endpoint latency",
&[],
&[0.001, 0.01, 0.1, 1.0, 10.0]
)?;
```
### Creating Vector Metrics with Dynamic Labels
Use vector metrics when you need to track metrics with different label values:
```rust
// Counter with labels
let requests_by_model = endpoint.create_counter_vec(
"requests_by_model",
"Requests by model type",
&["model_type", "model_size"]
)?;
// Increment with specific labels
requests_by_model.with_label_values(&["llama", "7b"]).inc();
requests_by_model.with_label_values(&["gpt", "13b"]).inc();
// Gauge with labels
let memory_by_gpu = component.create_gauge_vec(
"gpu_memory_bytes",
"GPU memory usage by device",
&["gpu_id", "memory_type"]
)?;
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
memory_by_gpu.with_label_values(&["0", "cached"]).set(4096.0);
```
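Conceptually, a vector metric holds one independent time series per distinct combination of label values. The std-only sketch below models that idea with a `HashMap` keyed by label values. `SimpleCounterVec` is a hypothetical type for illustration, not the `prometheus` crate's `CounterVec` or a Dynamo API; it returns an error when the number of label values does not match the declared label names.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified counter vector: one counter per unique label-value combination.
pub struct SimpleCounterVec {
    label_names: Vec<String>,
    series: HashMap<Vec<String>, f64>,
}

impl SimpleCounterVec {
    pub fn new(label_names: &[&str]) -> Self {
        Self {
            label_names: label_names.iter().map(|s| s.to_string()).collect(),
            series: HashMap::new(),
        }
    }

    /// Adds `delta` to the series identified by `values` (positional, matching `label_names`).
    pub fn inc_by(&mut self, values: &[&str], delta: f64) -> Result<(), String> {
        if values.len() != self.label_names.len() {
            return Err(format!(
                "expected {} label values, got {}",
                self.label_names.len(),
                values.len()
            ));
        }
        let key: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        *self.series.entry(key).or_insert(0.0) += delta;
        Ok(())
    }

    /// Current value of one series, if it has ever been incremented.
    pub fn get(&self, values: &[&str]) -> Option<f64> {
        let key: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        self.series.get(&key).copied()
    }

    /// Number of distinct time series; every new label combination adds one.
    pub fn series_count(&self) -> usize {
        self.series.len()
    }
}
```

Because each new label combination creates a new series, prefer bounded label values (model names, GPU IDs) over unbounded ones such as request IDs; high cardinality increases memory use both in the process and in Prometheus.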
### Creating Histograms
Histograms are useful for measuring distributions of values like latency:
```rust
let latency_histogram = endpoint.create_histogram(
"request_latency_seconds",
"Request latency distribution",
&[],
&[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
)?;
// Record latency values
latency_histogram.observe(0.023); // 23ms
latency_histogram.observe(0.156); // 156ms
```
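Prometheus histograms are exposed as cumulative buckets: each `_bucket{le="..."}` series counts observations less than or equal to its upper bound, alongside `_sum` and `_count` series. The std-only sketch below (a hypothetical `SimpleHistogram`, not Dynamo's or the `prometheus` crate's implementation, which may store counts differently internally) shows what the two `observe` calls in the example above do to that bucket layout.

```rust
/// Hypothetical, simplified Prometheus-style histogram with cumulative buckets.
/// The implicit `+Inf` bucket always equals `count`.
pub struct SimpleHistogram {
    upper_bounds: Vec<f64>,
    cumulative_counts: Vec<u64>,
    sum: f64,
    count: u64,
}

impl SimpleHistogram {
    pub fn new(upper_bounds: &[f64]) -> Self {
        Self {
            upper_bounds: upper_bounds.to_vec(),
            cumulative_counts: vec![0; upper_bounds.len()],
            sum: 0.0,
            count: 0,
        }
    }

    /// Increments every bucket whose upper bound (`le`) is >= `value`.
    pub fn observe(&mut self, value: f64) {
        for (bound, c) in self.upper_bounds.iter().zip(self.cumulative_counts.iter_mut()) {
            if value <= *bound {
                *c += 1;
            }
        }
        self.sum += value;
        self.count += 1;
    }

    /// (le, cumulative count) pairs as they would appear in `_bucket` series.
    pub fn buckets(&self) -> Vec<(f64, u64)> {
        self.upper_bounds
            .iter()
            .copied()
            .zip(self.cumulative_counts.iter().copied())
            .collect()
    }

    pub fn sum(&self) -> f64 {
        self.sum
    }

    pub fn count(&self) -> u64 {
        self.count
    }
}
```

With the buckets `[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]`, observing 0.023 and 0.156 leaves cumulative counts of `[0, 0, 0, 1, 1, 2, 2, 2]`, a `_count` of 2, and a `_sum` of about 0.179.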
### Transitioning from Plain Prometheus
If you're currently using plain Prometheus metrics, transitioning to Dynamo's `MetricsRegistry` is straightforward:
#### Before (Plain Prometheus)
```rust
use prometheus::{Counter, Opts, Registry};
// Create a registry to hold metrics
let registry = Registry::new();
let counter_opts = Opts::new("my_counter", "My custom counter");
let counter = Counter::with_opts(counter_opts).unwrap();
registry.register(Box::new(counter.clone())).unwrap();
// Use the counter
counter.inc();
// To expose metrics, you'd need to set up an HTTP server manually
// and implement the /metrics endpoint yourself
```
#### After (Dynamo MetricsRegistry)
```rust
let counter = endpoint.create_counter(
"my_counter",
"My custom counter",
&[]
)?;
counter.inc();
```
**Note:** The metric is automatically registered when created via the endpoint's `create_counter` factory method.
**Benefits of Dynamo's approach:**
- **Automatic registration**: Metrics created via the `create_*` factory methods are registered with the system automatically
- **Automatic labeling**: Namespace, component, and endpoint information is attached as labels
- **Consistent naming**: Metric names use the `dynamo_` prefix
- **Built-in HTTP metrics endpoint**: Enabled with `DYN_SYSTEM_ENABLED=true`
- **Hierarchical organization**: Metrics can be created at the runtime, namespace, component, or endpoint level
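Automatic registration comes from pairing creation and registration in a single factory call. Here is a toy, std-only Rust illustration of that design choice; `ToyRegistry`, its method names, and the `prefix_name` joining rule are assumptions for this sketch and do not reflect Dynamo's internals.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical registry where "create" and "register" are the same step.
#[derive(Default)]
pub struct ToyRegistry {
    counters: Mutex<HashMap<String, Arc<AtomicU64>>>,
}

impl ToyRegistry {
    /// Creates and registers a counter in one call; rejects duplicate names.
    pub fn create_counter(&self, prefix: &str, name: &str) -> Result<Arc<AtomicU64>, String> {
        let full_name = format!("{}_{}", prefix, name);
        let mut counters = self.counters.lock().unwrap();
        if counters.contains_key(&full_name) {
            return Err(format!("metric {} already registered", full_name));
        }
        let counter = Arc::new(AtomicU64::new(0));
        counters.insert(full_name, Arc::clone(&counter));
        Ok(counter)
    }

    /// Renders every registered counter as `name value` lines, sorted by name.
    pub fn render(&self) -> String {
        let counters = self.counters.lock().unwrap();
        let mut names: Vec<&String> = counters.keys().collect();
        names.sort();
        let out: String = names
            .into_iter()
            .map(|n| format!("{} {}\n", n, counters[n].load(Ordering::Relaxed)))
            .collect();
        out
    }
}
```

Because the only way to obtain a counter here is through `create_counter`, a metric cannot be created and then left unregistered, which is a mistake the plain-Prometheus example above leaves possible.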
### Advanced Features
#### Custom Buckets for Histograms
```rust
// Define custom buckets for your use case
let custom_buckets = vec![0.001, 0.01, 0.1, 1.0, 10.0];
let latency = endpoint.create_histogram(
"api_latency_seconds",
"API latency in seconds",
&[],
&custom_buckets
)?;
```
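Bucket bounds should be finite and strictly increasing; Prometheus always exposes a final `+Inf` bucket, so it need not be listed. The buckets above grow by a factor of 10, so they can also be generated. The std-only helpers below are local functions written for illustration (the `prometheus` crate offers a similar `exponential_buckets` helper, but this sketch does not depend on it).

```rust
/// Generates `count` bucket bounds starting at `start`, each `factor` times the previous one.
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Result<Vec<f64>, String> {
    if count == 0 || start <= 0.0 || factor <= 1.0 {
        return Err("need count > 0, start > 0 and factor > 1".to_string());
    }
    let mut buckets = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        buckets.push(next);
        next *= factor;
    }
    Ok(buckets)
}

/// Checks that bounds are finite (this sketch rejects an explicit +Inf) and strictly increasing.
pub fn validate_buckets(buckets: &[f64]) -> Result<(), String> {
    if buckets.iter().any(|b| !b.is_finite()) {
        return Err("bucket bounds must be finite (+Inf is implicit)".to_string());
    }
    if buckets.windows(2).any(|w| w[0] >= w[1]) {
        return Err("bucket bounds must be strictly increasing".to_string());
    }
    Ok(())
}
```

For example, `exponential_buckets(0.001, 10.0, 5)` yields `[0.001, 0.01, 0.1, 1.0, 10.0]` up to floating-point rounding, matching `custom_buckets` above.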
#### Metric Aggregation
```rust
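// One namespace-level counter that handlers on several endpoints can share;
// incrementing it from each handler yields a single total across those endpoints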
let total_requests = namespace.create_counter(
"total_requests",
"Total requests across all endpoints",
&[]
)?;
```
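A shared counter only aggregates if every endpoint handler increments the same instance. The std-only sketch below simulates that with threads standing in for endpoint handlers; `aggregate_across_endpoints` is hypothetical and uses a plain `AtomicU64` in place of a Dynamo counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Simulates several endpoint handlers sharing one namespace-level counter.
/// `handlers[i]` is the number of requests the i-th hypothetical endpoint serves.
pub fn aggregate_across_endpoints(handlers: &[u64]) -> u64 {
    // Stands in for the namespace-level counter.
    let total = Arc::new(AtomicU64::new(0));
    let workers: Vec<_> = handlers
        .iter()
        .map(|&requests| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..requests {
                    // Each request bumps the shared total.
                    total.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    // Joining guarantees every increment is visible before reading.
    for w in workers {
        w.join().expect("endpoint thread panicked");
    }
    total.load(Ordering::Relaxed)
}
```

An alternative is to keep per-endpoint counters (which carry the `dynamo_endpoint` label) and aggregate at query time with PromQL's `sum()`, which preserves the per-endpoint breakdown.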
## Running the deprecated `components/metrics` program
⚠️ **DEPRECATION NOTICE** ⚠️
When you run the example [components/metrics](../../components/metrics/README.md) program, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
**⚠️ The following `llm_kv_*` metrics are deprecated:**
@@ -123,3 +376,5 @@ When you run the example [components/metrics](../../components/metrics/README.md
```bash
docker compose logs prometheus
docker compose logs grafana
```
3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo MetricsRegistry
## Overview
Dynamo provides built-in metrics capabilities through the `MetricsRegistry` trait, which is automatically available whenever you use the `DistributedRuntime` framework. This guide explains how to use metrics for observability and monitoring across all Dynamo components.
## Automatic Metrics
Dynamo automatically exposes metrics with the `dynamo_` name prefix. It also adds the labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to indicate which component provides the metric.
**Frontend Metrics**: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TENSORRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name. These cover request handling, token processing, and latency measurements. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of frontend metrics.
**Component Metrics**: The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework. These include request counts, processing times, byte transfers, and system uptime metrics. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of component metrics.
**Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for details on specialized component metrics.
## Coming Soon
**Kubernetes Integration**: Comprehensive Kubernetes deployment and monitoring information will be available soon, including Helm charts, Kubernetes-native metrics collection, and cluster-wide observability solutions.
## Metrics Hierarchy
The `MetricsRegistry` trait is implemented by `DistributedRuntime`, `Namespace`, `Component`, and `Endpoint`, providing a hierarchical approach to metric collection that matches Dynamo's distributed architecture:
- `DistributedRuntime`: Global metrics across the entire runtime
- `Namespace`: Metrics scoped to a specific `dynamo_namespace`
- `Component`: Metrics for a specific `dynamo_component` within a namespace
- `Endpoint`: Metrics for an individual `dynamo_endpoint` within a component
This hierarchical structure allows you to create metrics at the appropriate level of granularity for your monitoring needs.
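The std-only sketch below illustrates how narrowing the scope from runtime to endpoint adds labels to an emitted sample. `Scope` and `sample_line` are hypothetical; the exact metric names and label placement Dynamo emits may differ, but the label keys match the ones listed under Automatic Metrics.

```rust
/// Hypothetical scope used only to illustrate the hierarchy; not a Dynamo type.
#[derive(Clone, Debug, Default)]
pub struct Scope {
    namespace: Option<String>,
    component: Option<String>,
    endpoint: Option<String>,
}

impl Scope {
    /// Runtime level: no hierarchy labels yet (an assumption of this sketch).
    pub fn runtime() -> Self {
        Self::default()
    }

    pub fn in_namespace(&self, name: &str) -> Self {
        Self { namespace: Some(name.to_string()), ..self.clone() }
    }

    pub fn in_component(&self, name: &str) -> Self {
        Self { component: Some(name.to_string()), ..self.clone() }
    }

    pub fn in_endpoint(&self, name: &str) -> Self {
        Self { endpoint: Some(name.to_string()), ..self.clone() }
    }

    /// Renders one Prometheus text-format sample with the `dynamo_` prefix and
    /// whichever hierarchy labels this scope has set (assumed naming scheme).
    pub fn sample_line(&self, name: &str, value: f64) -> String {
        let mut labels: Vec<String> = Vec::new();
        for (key, val) in [
            ("dynamo_namespace", &self.namespace),
            ("dynamo_component", &self.component),
            ("dynamo_endpoint", &self.endpoint),
        ] {
            if let Some(v) = val {
                labels.push(format!("{}=\"{}\"", key, v));
            }
        }
        if labels.is_empty() {
            format!("dynamo_{} {}", name, value)
        } else {
            format!("dynamo_{}{{{}}} {}", name, labels.join(","), value)
        }
    }
}
```

In this sketch, an endpoint-scoped sample carries all three `dynamo_*` labels, while a runtime-scoped one carries none.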
## Getting Started
For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the [Getting Started section](../../deploy/metrics/README.md#getting-started) in the deploy metrics documentation.
The quick start includes:
- Docker Compose setup for Prometheus and Grafana
- Pre-configured dashboards and datasources
- Access URLs for all monitoring endpoints
- GPU targeting configuration
## Implementation Examples
See the [Developer Guide](../../deploy/metrics/README.md#developer-guide) for detailed examples of creating metrics at different hierarchy levels and using dynamic labels.
### Grafana Dashboards
Use dashboards in `deploy/metrics/grafana_dashboards/`:
- `grafana-dynamo-dashboard.json`: General Dynamo dashboard
- `grafana-dcgm-metrics.json`: DCGM GPU metrics dashboard
## Metrics Visualization Architecture
### Service Topology
The metrics system follows this architecture for collecting and visualizing metrics:
```mermaid
graph TD
BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001]
subgraph DockerComposeNetwork [Network inside Docker Compose]
NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222]
PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
PROMETHEUS -->|:8081/metrics| DYNAMOBACKEND[Dynamo backend :8081]
DYNAMOFE --> DYNAMOBACKEND
GRAFANA -->|:9090/query API| PROMETHEUS
end
```
### Grafana Dashboard
The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics:
![Grafana Dynamo Dashboard](../../deploy/metrics/grafana-dynamo-composite.png)
## Related Documentation
- [Distributed Runtime Architecture](../architecture/distributed_runtime.md)
- [Dynamo Architecture Overview](../architecture/architecture.md)
- [Backend Guide](backend.md)
- [Metrics Implementation Examples](../../deploy/metrics/README.md#developer-guide)
- [Complete Metrics Setup Guide](../../deploy/metrics/README.md)