docs: add a docs/guides/metrics.md (#2160)

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

Commit faafa5ff authored Aug 01, 2025 by Keiven C; committed by GitHub Aug 01, 2025

Showing with 379 additions and 21 deletions

deploy/metrics/README.md +276 -21

docs/guides/metrics.md +103 -0

--- a/deploy/metrics/README.md
+++ b/deploy/metrics/README.md
@@ -2,12 +2,17 @@
 This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.
-## Components
+> [!NOTE]
+> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/guides/metrics.md).
+## Overview
+### Components
 - **Prometheus Server**: Collects and stores metrics from Dynamo services and other components.
 - **Grafana**: Provides dashboards by querying the Prometheus Server.
-## Topology
+### Topology
 Default Service Relationship Diagram:
 ```mermaid
@@ -29,17 +34,63 @@ The dcgm-exporter service in the Docker Compose network is configured to use por
 As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers with `--framework VLLM` or `--framework TENSORRTLLM`.
+### Available Metrics
+#### Component Metrics
+The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework:
+- `dynamo_component_concurrent_requests`: Requests currently being processed (gauge)
+- `dynamo_component_request_bytes_total`: Total bytes received in requests (counter)
+- `dynamo_component_request_duration_seconds`: Request processing time (histogram)
+- `dynamo_component_requests_total`: Total requests processed (counter)
+- `dynamo_component_response_bytes_total`: Total bytes sent in responses (counter)
+- `dynamo_component_system_uptime_seconds`: DistributedRuntime uptime (gauge)
+#### Specialized Component Metrics
+Some components expose additional metrics specific to their functionality:
+- `dynamo_preprocessor_*`: Metrics specific to preprocessor components
+#### Frontend Metrics
+When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TENSORRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:
+- `dynamo_frontend_inflight_requests`: Inflight requests (gauge)
+- `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram)
+- `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram)
+- `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram)
+- `dynamo_frontend_request_duration_seconds`: LLM request duration (histogram)
+- `dynamo_frontend_requests_total`: Total LLM requests (counter)
+- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
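The prefixes listed above make it possible to group scraped series by origin. The following is a minimal sketch, not part of this commit, of one way that grouping could be done on lines taken from a `/metrics` response. The `Family` enum and both helper functions are hypothetical names introduced for illustration. The parsing assumes the standard Prometheus text exposition format and ignores edge cases such as escaped label values.

```rust
/// Hypothetical grouping of the metric families listed above.
#[derive(Debug, PartialEq)]
enum Family {
    Component,
    Frontend,
    Preprocessor,
    Other,
}

/// Classify a metric name by its documented prefix.
fn classify(metric_name: &str) -> Family {
    if metric_name.starts_with("dynamo_component_") {
        Family::Component
    } else if metric_name.starts_with("dynamo_frontend_") {
        Family::Frontend
    } else if metric_name.starts_with("dynamo_preprocessor_") {
        Family::Preprocessor
    } else {
        Family::Other
    }
}

/// Extract the metric name from one line of Prometheus text output.
/// Returns `None` for blank lines and `# HELP` / `# TYPE` comment lines.
fn metric_name(line: &str) -> Option<&str> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The name ends at the first `{` (start of labels) or space (start of value).
    let end = line
        .find(|c: char| c == '{' || c == ' ')
        .unwrap_or(line.len());
    Some(&line[..end])
}
```

Histogram series carry `_bucket`, `_sum`, and `_count` suffixes in the exposition output, so matching on the prefix rather than the full name keeps all three in the same family.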
+### Required Files
+The following configuration files should be present in this directory:
+- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
+- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
+- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
+- [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
+- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics.
+- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
+- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
 ## Getting Started
+### Prerequisites
 1. Make sure Docker and Docker Compose are installed on your system
-2. Start Dynamo dependencies. Assume you're at the root dynamo path:
+### Quick Start
+1. Start the Dynamo dependencies from the root of the Dynamo repository:
   ```bash
   # Start the basic services (etcd & natsd), along with Prometheus and Grafana
   docker compose -f deploy/docker-compose.yml --profile metrics up -d
-   # Minimum components for Dynamo: etcd/nats/dcgm-exporter
+   # Minimum components for Dynamo (will not have Prometheus and Grafana): etcd/nats/dcgm-exporter
   docker compose -f deploy/docker-compose.yml up -d
   ```
@@ -48,7 +99,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
   export CUDA_VISIBLE_DEVICES=0,2
   ```
-3. Web servers started. The ones that end in /metrics are in Prometheus format:
+2. The following web servers are now running. Endpoints ending in `/metrics` serve data in Prometheus format:
   - Grafana: `http://localhost:3001` (default login: dynamo/dynamo)
   - Prometheus Server: `http://localhost:9090`
   - NATS Server: `http://localhost:8222` (monitoring endpoints: /varz, /healthz, etc.)
@@ -56,16 +107,14 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
   - etcd Server: `http://localhost:2379/metrics`
   - DCGM Exporter: `http://localhost:9401/metrics`
-4. Optionally, if you want to experiment further, look through components/metrics/README.md for more details on launching a metrics server (subscribes to nats), mock_worker (publishes to nats), and real workers.
   - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
   - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
   - Start worker(s) that publish KV Cache metrics. The example in [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.
+### Configuration
-## Configuration
+#### Prometheus
-### Prometheus
 The Prometheus configuration is specified in [prometheus.yml](./prometheus.yml). This file is set up to collect metrics from the metrics aggregation service endpoint.
@@ -77,29 +126,233 @@ After making changes to prometheus.yml, it is necessary to reload the configurat
 docker compose -f deploy/docker-compose.yml up prometheus -d --force-recreate
 ```
-### Grafana
+#### Grafana
 Grafana is pre-configured with:
 - Prometheus datasource
 - Sample dashboard for visualizing service metrics
 ![grafana image](./grafana-dynamo-composite.png)
-## Required Files
+### Troubleshooting
-The following configuration files should be present in this directory:
+1. Verify services are running:
- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
+  ```bash
- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
+  docker compose ps
- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
+  ```
- [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics.
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
-## Running the deprecated `metrics` component
+2. Check logs:
+  ```bash
+  docker compose logs prometheus
+  docker compose logs grafana
+  ```
+3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
+## Developer Guide
+### Creating Metrics at Different Hierarchy Levels
+#### Runtime-Level Metrics
+```rust
+use dynamo_runtime::DistributedRuntime;
+let runtime = DistributedRuntime::new()?;
+let namespace = runtime.namespace("my_namespace")?;
+let component = namespace.component("my_component")?;
+let endpoint = component.endpoint("my_endpoint")?;
+// Create runtime-level metrics; `MetricsRegistry` is also implemented by `DistributedRuntime`
+// (these are Prometheus Counter and Gauge types)
+let total_requests = runtime.create_counter(
+    "total_requests",
+    "Total requests across all namespaces",
+    &[]
+)?;
+let active_connections = runtime.create_gauge(
+    "active_connections",
+    "Number of active client connections",
+    &[]
+)?;
+```
+#### Namespace-Level Metrics
+```rust
+let namespace = runtime.namespace("my_model")?;
+// Namespace-scoped metrics
+let model_requests = namespace.create_counter(
+    "model_requests",
+    "Requests for this specific model",
+    &[]
+)?;
+let model_latency = namespace.create_histogram(
+    "model_latency_seconds",
+    "Model inference latency",
+    &[],
+    &[0.001, 0.01, 0.1, 1.0, 10.0]
+)?;
+```
+#### Component-Level Metrics
+```rust
+let component = namespace.component("backend")?;
+// Component-specific metrics
+let backend_requests = component.create_counter(
+    "backend_requests",
+    "Requests handled by this backend component",
+    &[]
+)?;
+let gpu_memory_usage = component.create_gauge(
+    "gpu_memory_bytes",
+    "GPU memory usage in bytes",
+    &[]
+)?;
+```
+#### Endpoint-Level Metrics
+```rust
+let endpoint = component.endpoint("generate")?;
+// Endpoint-specific metrics
+let generate_requests = endpoint.create_counter(
+    "generate_requests",
+    "Generate endpoint requests",
+    &[]
+)?;
+let generate_latency = endpoint.create_histogram(
+    "generate_latency_seconds",
+    "Generate endpoint latency",
+    &[],
+    &[0.001, 0.01, 0.1, 1.0, 10.0]
+)?;
+```
+### Creating Vector Metrics with Dynamic Labels
+Use vector metrics when you need to track metrics with different label values:
+```rust
+// Counter with labels
+let requests_by_model = endpoint.create_counter_vec(
+    "requests_by_model",
+    "Requests by model type",
+    &["model_type", "model_size"]
+)?;
+// Increment with specific labels
+requests_by_model.with_label_values(&["llama", "7b"]).inc();
+requests_by_model.with_label_values(&["gpt", "13b"]).inc();
+// Gauge with labels
+let memory_by_gpu = component.create_gauge_vec(
+    "gpu_memory_bytes",
+    "GPU memory usage by device",
+    &["gpu_id", "memory_type"]
+)?;
+memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
+memory_by_gpu.with_label_values(&["0", "cached"]).set(4096.0);
+```
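To make the idea of a vector metric concrete, here is a minimal standard-library sketch, not part of this commit, of a labeled counter. It is a hypothetical stand-in and not Dynamo's or the `prometheus` crate's implementation: `LabeledCounter` and its methods are names invented for illustration. It shows only that each distinct combination of label values is tracked as its own series.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labeled counter: one independent count per
/// distinct combination of label values.
struct LabeledCounter {
    label_names: Vec<String>,
    series: HashMap<Vec<String>, u64>,
}

impl LabeledCounter {
    fn new(label_names: &[&str]) -> Self {
        Self {
            label_names: label_names.iter().map(|s| s.to_string()).collect(),
            series: HashMap::new(),
        }
    }

    /// Increment the series identified by `values`. Panics if the number of
    /// values does not match the number of declared label names.
    fn inc(&mut self, values: &[&str]) {
        assert_eq!(values.len(), self.label_names.len(), "label count mismatch");
        let key: Vec<String> = values.iter().map(|s| s.to_string()).collect();
        *self.series.entry(key).or_insert(0) += 1;
    }

    /// Current count for one label combination (0 if never incremented).
    fn get(&self, values: &[&str]) -> u64 {
        let key: Vec<String> = values.iter().map(|s| s.to_string()).collect();
        self.series.get(&key).copied().unwrap_or(0)
    }

    /// Number of distinct series created so far.
    fn series_count(&self) -> usize {
        self.series.len()
    }
}
```

Because every new label combination adds a series, labels with unbounded values (for example request IDs) are usually a poor fit for vector metrics.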
+### Creating Histograms
+Histograms are useful for measuring distributions of values like latency:
+```rust
+let latency_histogram = endpoint.create_histogram(
+    "request_latency_seconds",
+    "Request latency distribution",
+    &[],
+    &[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
+)?;
+// Record latency values
+latency_histogram.observe(0.023); // 23ms
+latency_histogram.observe(0.156); // 156ms
+```
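Prometheus histograms report cumulative buckets: each observation is counted in every bucket whose upper bound (`le`) is greater than or equal to the value, plus an implicit `+Inf` bucket. The following standard-library sketch, not part of this commit, reproduces that counting for the bucket bounds and the two observations used above. The function name `cumulative_bucket_counts` is hypothetical.

```rust
/// Cumulative bucket counts, mirroring how a Prometheus histogram reports
/// its `le` buckets. `bounds` must be sorted in ascending order. The last
/// element of the result is the implicit `+Inf` bucket.
fn cumulative_bucket_counts(bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    let mut counts = vec![0u64; bounds.len() + 1];
    for &value in observations {
        // Count the observation in every bucket whose upper bound is >= value.
        for (i, &upper) in bounds.iter().enumerate() {
            if value <= upper {
                counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        counts[bounds.len()] += 1;
    }
    counts
}
```

With the bounds `[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]`, the 23 ms observation first appears in the `0.05` bucket and the 156 ms observation first appears in the `0.5` bucket, so the resolution of any latency estimate depends on how closely the bounds bracket the values you expect.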
+### Transitioning from Plain Prometheus
+If you're currently using plain Prometheus metrics, transitioning to Dynamo's `MetricsRegistry` is straightforward:
+#### Before (Plain Prometheus)
+```rust
+use prometheus::{Counter, Opts, Registry};
+// Create a registry to hold metrics
+let registry = Registry::new();
+let counter_opts = Opts::new("my_counter", "My custom counter");
+let counter = Counter::with_opts(counter_opts).unwrap();
+registry.register(Box::new(counter.clone())).unwrap();
+// Use the counter
+counter.inc();
+// To expose metrics, you'd need to set up an HTTP server manually
+// and implement the /metrics endpoint yourself
+```
+#### After (Dynamo MetricsRegistry)
+```rust
+let counter = endpoint.create_counter(
+    "my_counter",
+    "My custom counter",
+    &[]
+)?;
+counter.inc();
+```
+**Note:** The metric is automatically registered when created via the endpoint's `create_counter` factory method.
+**Benefits of Dynamo's approach:**
+- **Automatic registration**: Metrics created via endpoint's `create_*` factory methods are automatically registered with the system
+- Automatic labeling with namespace, component, and endpoint information
+- Consistent metric naming with `dynamo_` prefix
+- Built-in HTTP metrics endpoint when enabled with `DYN_SYSTEM_ENABLED=true`
+- Hierarchical metric organization
+### Advanced Features
+#### Custom Buckets for Histograms
+```rust
+// Define custom buckets for your use case
+let custom_buckets = vec![0.001, 0.01, 0.1, 1.0, 10.0];
+let latency = endpoint.create_histogram(
+    "api_latency_seconds",
+    "API latency in seconds",
+    &[],
+    &custom_buckets
+)?;
+```
+#### Metric Aggregation
+```rust
+// Aggregate metrics across multiple endpoints
+let total_requests = namespace.create_counter(
+    "total_requests",
+    "Total requests across all endpoints",
+    &[]
+)?;
+```
+## Running the deprecated `components/metrics` program
 ⚠️ **DEPRECATION NOTICE** ⚠️
-When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
+When you run the example [components/metrics](../../components/metrics/README.md) program, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
 **⚠️ The following `llm_kv_*` metrics are deprecated:**
@@ -123,3 +376,5 @@ When you run the example [components/metrics](../../components/metrics/README.md
  docker compose logs prometheus
  docker compose logs grafana
  ```
+3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
--- a/docs/guides/metrics.md
+++ b/docs/guides/metrics.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Dynamo MetricsRegistry
+## Overview
+Dynamo provides built-in metrics capabilities through the `MetricsRegistry` trait, which is automatically available whenever you use the `DistributedRuntime` framework. This guide explains how to use metrics for observability and monitoring across all Dynamo components.
+## Automatic Metrics
+Dynamo automatically exposes metrics whose names begin with the `dynamo_` prefix. It also adds the labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to indicate which component is providing the metric.
+**Frontend Metrics**: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TENSORRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name. These cover request handling, token processing, and latency measurements. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of frontend metrics.
+**Component Metrics**: The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework. These include request counts, processing times, byte transfers, and system uptime metrics. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of component metrics.
+**Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for details on specialized component metrics.
+## Coming Soon
+**Kubernetes Integration**: Comprehensive Kubernetes deployment and monitoring information will be available soon, including Helm charts, Kubernetes-native metrics collection, and cluster-wide observability solutions.
+## Metrics Hierarchy
+The `MetricsRegistry` trait is implemented by `DistributedRuntime`, `Namespace`, `Component`, and `Endpoint`, providing a hierarchical approach to metric collection that matches Dynamo's distributed architecture:
+- `DistributedRuntime`: Global metrics across the entire runtime
+- `Namespace`: Metrics scoped to a specific `dynamo_namespace`
+- `Component`: Metrics for a specific `dynamo_component` within a namespace
+- `Endpoint`: Metrics for an individual `dynamo_endpoint` within a component
+This hierarchical structure allows you to create metrics at the appropriate level of granularity for your monitoring needs.
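As a rough illustration of how the hierarchy level could map to the automatic labels, the sketch below (not part of this commit) renders one sample in the Prometheus text exposition format. It rests on two assumptions that this guide does not confirm: that a metric carries only the labels for its own level and its ancestors, and that the exposed name is simply `dynamo_` followed by the name you pass in. `Level` and `render_sample` are hypothetical names, and label-value escaping is not handled.

```rust
/// Hypothetical description of the hierarchy level a metric was created at.
enum Level<'a> {
    Runtime,
    Namespace(&'a str),
    Component(&'a str, &'a str),
    Endpoint(&'a str, &'a str, &'a str),
}

/// Render one sample line, assuming a `dynamo_` name prefix and one label
/// per hierarchy level at or above `level`. Label values are not escaped.
fn render_sample(level: &Level, name: &str, value: f64) -> String {
    let labels: Vec<(&str, &str)> = match *level {
        Level::Runtime => vec![],
        Level::Namespace(ns) => vec![("dynamo_namespace", ns)],
        Level::Component(ns, comp) => vec![
            ("dynamo_namespace", ns),
            ("dynamo_component", comp),
        ],
        Level::Endpoint(ns, comp, ep) => vec![
            ("dynamo_namespace", ns),
            ("dynamo_component", comp),
            ("dynamo_endpoint", ep),
        ],
    };
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{key}=\"{val}\""))
        .collect();
    if rendered.is_empty() {
        format!("dynamo_{name} {value}")
    } else {
        format!("dynamo_{name}{{{}}} {value}", rendered.join(","))
    }
}
```

Under these assumptions, the same metric name created at the endpoint level and at the namespace level would differ only in its label set, which is what lets a dashboard query aggregate or drill down along the hierarchy.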
+## Getting Started
+For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the [Getting Started section](../../deploy/metrics/README.md#getting-started) in the deploy metrics documentation.
+The quick start includes:
+- Docker Compose setup for Prometheus and Grafana
+- Pre-configured dashboards and datasources
+- Access URLs for all monitoring endpoints
+- GPU targeting configuration
+## Implementation Examples
+See the [Developer Guide](../../deploy/metrics/README.md#developer-guide) in the deploy metrics documentation for detailed examples of creating metrics at different hierarchy levels and using dynamic labels.
+### Grafana Dashboards
+Use dashboards in `deploy/metrics/grafana_dashboards/`:
+- `grafana-dynamo-dashboard.json`: General Dynamo dashboard
+- `grafana-dcgm-metrics.json`: DCGM GPU metrics dashboard
+## Metrics Visualization Architecture
+### Service Topology
+The metrics system follows this architecture for collecting and visualizing metrics:
+```mermaid
+graph TD
+    BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001]
+    subgraph DockerComposeNetwork [Network inside Docker Compose]
+        NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222]
+        PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
+        PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
+        PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
+        PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
+        PROMETHEUS -->|:8081/metrics| DYNAMOBACKEND[Dynamo backend :8081]
+        DYNAMOFE --> DYNAMOBACKEND
+        GRAFANA -->|:9090/query API| PROMETHEUS
+    end
+```
+### Grafana Dashboard
+The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics:
+![Grafana Dynamo Dashboard](../../deploy/metrics/grafana-dynamo-composite.png)
+## Related Documentation
+- [Distributed Runtime Architecture](../architecture/distributed_runtime.md)
+- [Dynamo Architecture Overview](../architecture/architecture.md)
+- [Backend Guide](backend.md)
+- [Metrics Implementation Examples](../../deploy/metrics/README.md#developer-guide)
+- [Complete Metrics Setup Guide](../../deploy/metrics/README.md)
\ No newline at end of file