# Generic Metrics for Component Endpoints

This example demonstrates the automatic metrics provided to component endpoints by default.

## Overview

Component endpoints are measured automatically when they run on the `DistributedRuntime`. The `DistributedRuntime` uses the `MetricsRegistry` trait, which applies measurement to every component endpoint. It tracks:

- **Request Count**: Total number of requests processed
- **Request Duration**: Time spent processing each request
- **Request/Response Bytes**: Total bytes received and sent
- **Error Count**: Total number of errors encountered

The example also shows how to add a custom metric that tracks the number of data bytes processed.

## How It Works

**Automatic Metrics**: All component endpoints automatically get measurement metrics without any code changes.

**Custom Metrics**: To add custom metrics in addition to the automatic ones, use the `add_metrics` method:

```rust
use dynamo_runtime::pipeline::network::Ingress;

// Automatic measurements - no code changes needed!
let ingress = Ingress::for_engine(my_handler)?;

// Optional: Add custom metrics IN ADDITION to automatic ones
ingress.add_metrics(&endpoint)?;
```

The endpoint automatically provides proper labeling (`dynamo_namespace`, `dynamo_component`, `dynamo_endpoint`) for all metrics. These labels are prefixed with "dynamo_" to avoid collisions with Kubernetes and other monitoring system labels.

## Available Methods

The `Ingress` struct provides methods for metrics:

- **Automatic**: All component endpoints get measurement metrics automatically
- `Ingress::add_metrics(&endpoint)` - Add custom metrics IN ADDITION to automatic ones (optional)

## Metrics Generated

### Automatic Metrics (No Code Changes Required)
The following Prometheus metrics are automatically created for all component endpoints:

### Counters
- `dynamo_component_requests_total` - Total requests processed
- `dynamo_component_request_bytes_total` - Total bytes received in requests
- `dynamo_component_response_bytes_total` - Total bytes sent in responses
- `dynamo_component_errors_total` - Total errors encountered (with error_type labels)

### Error Types
The `dynamo_component_errors_total` metric includes the following error types:
- `deserialization` - Errors parsing request messages
- `invalid_message` - Unexpected message format
- `response_stream` - Errors creating response streams
- `generate` - Errors in request processing
- `publish_response` - Errors publishing response data
- `publish_final` - Errors publishing final response
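
One way to keep the `error_type` label values consistent is to model them as an enum. The sketch below is illustrative only: `ErrorType`, `ALL`, and `as_label` are hypothetical names and are not taken from the Dynamo API. Only the label strings come from the list above.

```rust
/// Hypothetical enum mirroring the documented `error_type` label values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorType {
    Deserialization,
    InvalidMessage,
    ResponseStream,
    Generate,
    PublishResponse,
    PublishFinal,
}

impl ErrorType {
    /// Every variant, in the order this README lists them.
    pub const ALL: [ErrorType; 6] = [
        ErrorType::Deserialization,
        ErrorType::InvalidMessage,
        ErrorType::ResponseStream,
        ErrorType::Generate,
        ErrorType::PublishResponse,
        ErrorType::PublishFinal,
    ];

    /// Returns the string used as the `error_type` label value.
    pub fn as_label(self) -> &'static str {
        match self {
            ErrorType::Deserialization => "deserialization",
            ErrorType::InvalidMessage => "invalid_message",
            ErrorType::ResponseStream => "response_stream",
            ErrorType::Generate => "generate",
            ErrorType::PublishResponse => "publish_response",
            ErrorType::PublishFinal => "publish_final",
        }
    }
}
```

Because the `match` is exhaustive, adding a variant without a label string fails to compile, which helps keep dashboards and code in step.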

### Histograms
- `dynamo_component_request_duration_seconds` - Request processing time
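
Prometheus histogram buckets are cumulative: each `_bucket` series counts every observation less than or equal to its `le` bound. The std-only sketch below illustrates that behavior. `DurationHistogram` is a hypothetical type, not the Dynamo implementation, and the bounds are copied from the `le` values in this README's example output.

```rust
/// Upper bounds (`le`) matching the buckets shown in this README's example output.
const BOUNDS: [f64; 11] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0];

/// Hypothetical std-only histogram, for illustration only.
#[derive(Debug, Default)]
pub struct DurationHistogram {
    /// Per-bucket (non-cumulative) counts; the extra last slot is the `+Inf` bucket.
    counts: [u64; 12],
    sum: f64,
    count: u64,
}

impl DurationHistogram {
    /// Records one request duration in seconds.
    pub fn observe(&mut self, seconds: f64) {
        // First bucket whose bound is >= the observation, else the `+Inf` slot.
        let idx = BOUNDS
            .iter()
            .position(|&bound| seconds <= bound)
            .unwrap_or(BOUNDS.len());
        self.counts[idx] += 1;
        self.sum += seconds;
        self.count += 1;
    }

    /// Cumulative counts as exposed by the `_bucket` series (last entry is `+Inf`).
    pub fn cumulative(&self) -> Vec<u64> {
        let mut running = 0u64;
        self.counts
            .iter()
            .map(|c| {
                running += *c;
                running
            })
            .collect()
    }
}
```

This is why every bucket in the example output shows the same value: all three requests finished in under 0.005 seconds, so each larger bucket includes them as well.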

### Gauges
- `dynamo_component_concurrent_requests` - Number of requests currently being processed
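
A gauge of in-flight requests has to be decremented on every exit path, including early returns and errors. A common Rust pattern for this is an RAII guard. The sketch below uses only the standard library; `InflightGuard` and `handle` are hypothetical names, and this is not a description of how Dynamo implements the gauge.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;

/// Hypothetical guard: increments the gauge on creation, decrements on drop.
pub struct InflightGuard {
    gauge: Arc<AtomicI64>,
}

impl InflightGuard {
    pub fn new(gauge: Arc<AtomicI64>) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InflightGuard {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Example handler: the guard is dropped, and the gauge decremented,
/// on both the error path and the success path.
pub fn handle(gauge: &Arc<AtomicI64>, payload: &str) -> Result<usize, String> {
    let _guard = InflightGuard::new(Arc::clone(gauge));
    if payload.is_empty() {
        return Err("empty payload".to_string());
    }
    Ok(payload.len())
}
```

With this shape the gauge returns to its previous value once the handler finishes, which matches the `0` shown for an idle endpoint in the example output.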

### Custom Metrics (Optional)
- `dynamo_component_bytes_processed_total` - Total data bytes processed by system handler (example)

### Labels
All metrics automatically include these labels from the endpoint:
- `dynamo_namespace` - The namespace name
- `dynamo_component` - The component name
- `dynamo_endpoint` - The endpoint name

These labels are prefixed with "dynamo_" to avoid collisions with Kubernetes and other monitoring system labels.
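
To make the label layout concrete, the sketch below renders a single sample line in the Prometheus text format with the three `dynamo_`-prefixed labels. `render_sample` is a hypothetical helper written for this README, not a Dynamo or Prometheus client function, and it assumes label values need no escaping.

```rust
/// Hypothetical helper: renders one sample line in the Prometheus text format,
/// attaching the three `dynamo_`-prefixed labels in alphabetical order.
/// Simplification: label values are assumed to contain no quotes or backslashes.
pub fn render_sample(
    metric: &str,
    namespace: &str,
    component: &str,
    endpoint: &str,
    value: u64,
) -> String {
    format!(
        "{}{{dynamo_component=\"{}\",dynamo_endpoint=\"{}\",dynamo_namespace=\"{}\"}} {}",
        metric, component, endpoint, namespace, value
    )
}
```

Calling `render_sample("dynamo_component_requests_total", "example_namespace", "example_component", "example_endpoint9881", 3)` yields a line with the same shape as the `dynamo_component_requests_total` sample in this README's example output.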

## Example Metrics Output

When the system is running, the `/metrics` HTTP path returns output like this:

```prometheus
# HELP dynamo_component_concurrent_requests Number of requests currently being processed by component endpoint
# TYPE dynamo_component_concurrent_requests gauge
dynamo_component_concurrent_requests{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0

# HELP dynamo_component_bytes_processed_total Example of a custom metric. Total number of data bytes processed by system handler
# TYPE dynamo_component_bytes_processed_total counter
dynamo_component_bytes_processed_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 42

# HELP dynamo_component_request_bytes_total Total number of bytes received in requests by component endpoint
# TYPE dynamo_component_request_bytes_total counter
dynamo_component_request_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1098

# HELP dynamo_component_request_duration_seconds Time spent processing requests by component endpoint
# TYPE dynamo_component_request_duration_seconds histogram
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.005"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.01"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.025"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.05"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.1"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.25"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="1"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="2.5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="10"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="+Inf"} 3
dynamo_component_request_duration_seconds_sum{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0.00048793700000000003
dynamo_component_request_duration_seconds_count{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3

# HELP dynamo_component_requests_total Total number of requests processed by component endpoint
# TYPE dynamo_component_requests_total counter
dynamo_component_requests_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3

# HELP dynamo_component_response_bytes_total Total number of bytes sent in responses by component endpoint
# TYPE dynamo_component_response_bytes_total counter
dynamo_component_response_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1917

# HELP uptime_seconds Total uptime of the DistributedRuntime in seconds
# TYPE uptime_seconds gauge
uptime_seconds{dynamo_namespace="metrics_server"} 1.8226759879999999
```
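
The histogram's `_sum` and `_count` series can be combined to get the mean request duration. The sketch below does this over scraped text using only the standard library. `sample_value` and `mean_duration_seconds` are hypothetical helpers, and the parsing is deliberately simplified: it assumes no spaces inside label values and no trailing timestamps.

```rust
/// Returns the value of the first sample whose series name is exactly `name`.
/// Simplified parser: skips comment lines and takes the last space-separated token.
pub fn sample_value(text: &str, name: &str) -> Option<f64> {
    text.lines()
        .filter(|line| !line.starts_with('#'))
        .find(|line| {
            // The name must be followed by a label set or the value separator,
            // so that a longer series name sharing the prefix does not match.
            line.strip_prefix(name)
                .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
        })
        .and_then(|line| line.rsplit(' ').next())
        .and_then(|value| value.parse::<f64>().ok())
}

/// Mean request duration in seconds, or `None` if no requests were recorded.
pub fn mean_duration_seconds(text: &str) -> Option<f64> {
    let sum = sample_value(text, "dynamo_component_request_duration_seconds_sum")?;
    let count = sample_value(text, "dynamo_component_request_duration_seconds_count")?;
    if count > 0.0 {
        Some(sum / count)
    } else {
        None
    }
}
```

For the example output above, the sum is about 0.000488 seconds over 3 requests, so the mean is roughly 0.00016 seconds per request.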

## Example

### Component Endpoint with Automatic Measurements and Optional Custom Metrics

```rust
struct RequestHandler {
    metrics: Option<Arc<CustomMetrics>>,
}

#[async_trait]
impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for RequestHandler {
    async fn generate(&self, input: SingleIn<String>) -> Result<ManyOut<Annotated<String>>> {
        let (data, ctx) = input.into_parts();

        // Optional: Track custom metrics
        if let Some(metrics) = &self.metrics {
            metrics.data_bytes_processed.inc_by(data.len() as u64);
        }

        // Your business logic here...
        // No need to add any automatic measurement code!

        Ok(ResponseStream::new(Box::pin(stream), ctx.context()))
    }
}

// Create handler (with or without custom metrics)
let handler = if enable_custom_metrics {
    let custom_metrics = CustomMetrics::from_endpoint(&endpoint)?;
    RequestHandler::with_metrics(custom_metrics)
} else {
    RequestHandler::new()
};

// Automatic measurements - no additional code needed!
let ingress = Ingress::for_engine(handler)?;

// Optional: Add custom metrics IN ADDITION to automatic ones
if enable_custom_metrics {
    ingress.add_metrics(&endpoint)?;
}

// Endpoint code to add ingress to the handler below...
```

## Benefits

1. **Little or No Code Change**: Existing handlers get measurement metrics automatically, and custom metrics for your application are easy to add.
2. **Simple API**: Swap Prometheus constructors for one of the endpoint's factory methods.
3. **Automatic Measurements**: Request count, duration, and error tracking work out of the box for component endpoints.
4. **Automatic Labeling**: The endpoint provides proper namespace, component, and endpoint labels.

## Running the Example

**Important**: You must set the `DYN_SYSTEM_PORT` environment variable to specify which port the HTTP system metrics server will run on.

```bash
# Run the system metrics example
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 cargo run --bin system_server
```
The server will start an HTTP system metrics server on the specified port (8081 in this example) that exposes the Prometheus metrics endpoint at `/metrics`.


To run an actual LLM frontend and server (aggregated example), launch both. By default, the frontend listens on port 8080.
```
python -m dynamo.frontend &

DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --no-enable-prefix-caching &
```
Then make curl requests to the frontend (see the [main README](../../../../README.md)).

## Querying Metrics

Once running, you can query the metrics:

```bash
# Get all component endpoint metrics
curl http://localhost:8081/metrics | grep -E "dynamo_component"

# Get all frontend metrics
curl http://localhost:8080/metrics | grep -E "dynamo_frontend"
```