# Generic Metrics for Component Endpoints

This example demonstrates the automatic metrics provided to component endpoints by default.

## Overview

Component endpoints are measured automatically when you use the `DistributedRuntime`. The runtime relies on the `MetricsRegistry` trait, which applies measurement to every component endpoint and tracks:

- **Request Count**: Total number of requests processed
- **Request Duration**: Time spent processing each request
- **Request/Response Bytes**: Total bytes received and sent
- **Error Count**: Total number of errors encountered

The example also shows how to add a custom metric that tracks the number of data bytes processed.

## How It Works

**Automatic Metrics**: All component endpoints automatically get measurement metrics without any code changes.

**Custom Metrics**: If you want to add custom metrics IN ADDITION to the automatic ones, you can use the `add_metrics` method:

```rust
use dynamo_runtime::pipeline::network::Ingress;

// Automatic measurements - no code changes needed!
let ingress = Ingress::for_engine(my_handler)?;

// Optional: Add custom metrics IN ADDITION to automatic ones
ingress.add_metrics(&endpoint)?;
```

The endpoint automatically labels all metrics with `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint`. The `dynamo_` prefix avoids collisions with labels from Kubernetes and other monitoring systems.

## Available Methods

The `Ingress` struct provides methods for metrics:

- **Automatic**: All component endpoints get measurement metrics automatically
- `Ingress::add_metrics(&endpoint)` - Add custom metrics IN ADDITION to automatic ones (optional)

## Metrics Generated

### Automatic Metrics (No Code Changes Required)
The following Prometheus metrics are automatically created for all component endpoints:

### Counters
- `dynamo_component_requests_total` - Total requests processed
- `dynamo_component_request_bytes_total` - Total bytes received in requests
- `dynamo_component_response_bytes_total` - Total bytes sent in responses
- `dynamo_component_errors_total` - Total errors encountered (with an `error_type` label)

### Error Types
The `error_type` label on `dynamo_component_errors_total` takes one of the following values:
- `deserialization` - Errors parsing request messages
- `invalid_message` - Unexpected message format
- `response_stream` - Errors creating response streams
- `generate` - Errors in request processing
- `publish_response` - Errors publishing response data
- `publish_final` - Errors publishing final response

### Histograms
- `dynamo_component_request_duration_seconds` - Request processing time

### Gauges
- `dynamo_component_inflight_requests` - Number of requests currently being processed
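
An in-flight gauge has to go back down on every exit path, including early returns and errors. One common way to guarantee that is a drop guard. The following is a generic sketch of that technique using only the standard library; `InflightGuard` is a hypothetical name and this does not describe how Dynamo implements the gauge.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical guard: increments the gauge on creation and
/// decrements it when dropped, whatever the exit path.
struct InflightGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> InflightGuard<'a> {
    fn new(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for InflightGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}

fn main() {
    // Plain atomic used as a stand-in for a Prometheus gauge.
    let inflight = AtomicI64::new(0);
    {
        let _guard = InflightGuard::new(&inflight);
        println!("during request: {}", inflight.load(Ordering::Relaxed));
    }
    println!("after request: {}", inflight.load(Ordering::Relaxed));
}
```

Because the decrement lives in `Drop`, the handler body needs no explicit cleanup code.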

### Custom Metrics (Optional)
- `dynamo_component_bytes_processed_total` - Total data bytes processed by system handler (example)

### Labels
All metrics automatically include these labels from the endpoint:
- `dynamo_namespace` - The namespace name
- `dynamo_component` - The component name
- `dynamo_endpoint` - The endpoint name

These labels are prefixed with "dynamo_" to avoid collisions with Kubernetes and other monitoring system labels.
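
To illustrate how the three labels identify a single series, the sketch below renders a series selector in the same form as the Prometheus exposition output. `series_selector` is a hypothetical helper written for this README, not a Dynamo API, and it assumes label values contain no quotes or backslashes.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders `metric{label="value",...}` with labels
/// sorted by name. Label values are not escaped in this sketch.
fn series_selector(metric: &str, namespace: &str, component: &str, endpoint: &str) -> String {
    // A BTreeMap iterates in key order, which yields alphabetical labels.
    let mut labels = BTreeMap::new();
    labels.insert("dynamo_namespace", namespace);
    labels.insert("dynamo_component", component);
    labels.insert("dynamo_endpoint", endpoint);

    let rendered: Vec<String> = labels
        .iter()
        .map(|(name, value)| format!("{}=\"{}\"", name, value))
        .collect();
    format!("{}{{{}}}", metric, rendered.join(","))
}

fn main() {
    let selector = series_selector(
        "dynamo_component_requests_total",
        "example_namespace",
        "example_component",
        "example_endpoint9881",
    );
    println!("{}", selector);
}
```

A string of this shape can be pasted into a Prometheus query to select exactly one endpoint's series.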

## Example Metrics Output

When the system is running, `http://<ip>:<port>/metrics` returns output like this:

```prometheus
# HELP dynamo_component_inflight_requests Number of requests currently being processed by component endpoint
# TYPE dynamo_component_inflight_requests gauge
dynamo_component_inflight_requests{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0

# HELP dynamo_component_bytes_processed_total Example of a custom metric. Total number of data bytes processed by system handler
# TYPE dynamo_component_bytes_processed_total counter
dynamo_component_bytes_processed_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 42

# HELP dynamo_component_request_bytes_total Total number of bytes received in requests by component endpoint
# TYPE dynamo_component_request_bytes_total counter
dynamo_component_request_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1098

# HELP dynamo_component_request_duration_seconds Time spent processing requests by component endpoint
# TYPE dynamo_component_request_duration_seconds histogram
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.005"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.01"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.025"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.05"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.1"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.25"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="1"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="2.5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="5"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="10"} 3
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="+Inf"} 3
dynamo_component_request_duration_seconds_sum{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0.00048793700000000003
dynamo_component_request_duration_seconds_count{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3

# HELP dynamo_component_requests_total Total number of requests processed by component endpoint
# TYPE dynamo_component_requests_total counter
dynamo_component_requests_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3

# HELP dynamo_component_response_bytes_total Total number of bytes sent in responses by component endpoint
# TYPE dynamo_component_response_bytes_total counter
dynamo_component_response_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1917

# HELP uptime_seconds Total uptime of the DistributedRuntime in seconds
# TYPE uptime_seconds gauge
uptime_seconds{dynamo_namespace="system_status_server"} 1.8226759879999999
```
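
The `_sum` and `_count` series of a histogram are enough to derive a mean latency. As a small worked example, the sketch below divides the two values from the sample output above. The function name `mean_duration_seconds` is illustrative and not part of Dynamo.

```rust
/// Mean request duration in seconds, or `None` when no requests were observed.
fn mean_duration_seconds(sum_seconds: f64, count: u64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(sum_seconds / count as f64)
    }
}

fn main() {
    // Values copied from the sample output: the _sum and _count series of
    // dynamo_component_request_duration_seconds.
    let sum = 0.00048793700000000003_f64;
    let count = 3_u64;

    match mean_duration_seconds(sum, count) {
        Some(mean) => println!("mean request duration: {:.1} microseconds", mean * 1e6),
        None => println!("no requests observed yet"),
    }
}
```

For the sample data this works out to roughly 163 microseconds per request, which is consistent with all three requests falling into the smallest bucket (`le="0.005"`).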

## Example

### Component Endpoint with Automatic Measurements and Optional Custom Metrics

```rust
struct RequestHandler {
    metrics: Option<Arc<CustomMetrics>>,
}

#[async_trait]
impl AsyncEngine<SingleIn<String>, ManyOut<Annotated<String>>, Error> for RequestHandler {
    async fn generate(&self, input: SingleIn<String>) -> Result<ManyOut<Annotated<String>>> {
        let (data, ctx) = input.into_parts();

        // Optional: Track custom metrics
        if let Some(metrics) = &self.metrics {
            metrics.data_bytes_processed.inc_by(data.len() as u64);
        }

        // Your business logic here...
        // No need to add any automatic measurement code!

        Ok(ResponseStream::new(Box::pin(stream), ctx.context()))
    }
}

// Create handler (with or without custom metrics)
let handler = if enable_custom_metrics {
    let custom_metrics = CustomMetrics::from_endpoint(&endpoint)?;
    RequestHandler::with_metrics(custom_metrics)
} else {
    RequestHandler::new()
};

// Automatic measurements - no additional code needed!
let ingress = Ingress::for_engine(handler)?;

// Optional: Add custom metrics IN ADDITION to automatic ones
if enable_custom_metrics {
    ingress.add_metrics(&endpoint)?;
}

// Endpoint code to add ingress to the handler below...
```
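
The optional-metrics pattern above can be tried without the Dynamo runtime. The sketch below is a stand-in built on the standard library only: `CustomMetrics` here wraps a plain `AtomicU64` rather than a Prometheus counter, `with_metrics` takes an `Arc` so the caller can read the counter back, and `handle` is a simplified, synchronous placeholder for `generate`. None of these mirror the real Dynamo types or signatures.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-in for the example's custom metrics (an assumption, not the Dynamo type).
#[derive(Default)]
struct CustomMetrics {
    data_bytes_processed: AtomicU64,
}

struct RequestHandler {
    metrics: Option<Arc<CustomMetrics>>,
}

impl RequestHandler {
    fn new() -> Self {
        Self { metrics: None }
    }

    fn with_metrics(metrics: Arc<CustomMetrics>) -> Self {
        Self { metrics: Some(metrics) }
    }

    /// Simplified, synchronous placeholder for `generate`.
    fn handle(&self, data: &str) -> String {
        // Record the payload size only when custom metrics are enabled.
        if let Some(metrics) = &self.metrics {
            metrics
                .data_bytes_processed
                .fetch_add(data.len() as u64, Ordering::Relaxed);
        }
        // Placeholder business logic.
        data.to_uppercase()
    }
}

fn main() {
    let metrics = Arc::new(CustomMetrics::default());
    let with_metrics = RequestHandler::with_metrics(Arc::clone(&metrics));
    let without_metrics = RequestHandler::new();

    with_metrics.handle("hello");
    without_metrics.handle("not counted");

    println!(
        "bytes processed: {}",
        metrics.data_bytes_processed.load(Ordering::Relaxed)
    );
}
```

Keeping the metrics behind an `Option` means the same handler type works whether or not custom metrics are enabled, and the hot path pays only for a single branch when they are off.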

## Benefits

1. **Little or No Code Change**: Existing handlers get measurement metrics automatically, and custom metrics for your application are easy to add.
2. **Simple API**: Swap Prometheus constructors for one of the endpoint's factory methods.
3. **Automatic Measurements**: Request count, duration, and error tracking work out of the box for component endpoints.
4. **Automatic Labeling**: The endpoint provides the proper namespace, component, and endpoint labels.

## Running the Example

**Important**: You must set the `DYN_SYSTEM_PORT` environment variable to specify which port the system status server will listen on.

```bash
# Run the system metrics example
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 cargo run --bin system_server
```
The server starts a system status server on the specified port (8081 in this example), which exposes the Prometheus metrics endpoint at `/metrics`.


To run an actual LLM frontend and server (aggregated example), launch both. By default, the frontend listens on port 8000.
```bash
python -m dynamo.frontend &

DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --no-enable-prefix-caching &
```
Then send curl requests to the frontend (see the [main README](../../../../README.md)).

## Querying Metrics

Once running, you can query the metrics:

```bash
# Get all component endpoint metrics
curl http://localhost:8081/metrics | grep -E "dynamo_component"

# Get all frontend metrics
curl http://localhost:8000/metrics | grep -E "dynamo_frontend"
```
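
If you prefer to post-process the scraped text in code rather than with `grep`, the same filtering can be sketched in plain Rust. The helper below is hypothetical and makes two assumptions: the response body has already been fetched into a string, and sample lines carry no trailing timestamp. Unlike the `grep` commands, it keeps only sample lines and skips the `# HELP` and `# TYPE` comments.

```rust
/// Hypothetical helper: returns (series, value) for every sample line
/// whose metric name starts with `prefix`.
fn samples_with_prefix<'a>(exposition: &'a str, prefix: &str) -> Vec<(&'a str, f64)> {
    exposition
        .lines()
        .map(str::trim)
        // Skip blank lines and `# HELP` / `# TYPE` comments.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter(|line| line.starts_with(prefix))
        .filter_map(|line| {
            // The value is the last space-separated field (no timestamp assumed).
            let (series, value) = line.rsplit_once(' ')?;
            Some((series, value.parse::<f64>().ok()?))
        })
        .collect()
}

fn main() {
    // Shortened sample with an abbreviated label set, for illustration only.
    let text = "\
# HELP dynamo_component_requests_total Total number of requests processed by component endpoint
# TYPE dynamo_component_requests_total counter
dynamo_component_requests_total{dynamo_endpoint=\"example_endpoint9881\"} 3
uptime_seconds{dynamo_namespace=\"system_status_server\"} 1.82
";
    for (series, value) in samples_with_prefix(text, "dynamo_component") {
        println!("{} = {}", series, value);
    }
}
```

Lines that fail to parse are dropped silently here; a stricter tool might report them instead.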