# Metrics

## Quickstart

To start the `metrics` component, point it at the `namespace/component/endpoint` trio whose
metrics you want to observe.

This will:
1. Scrape statistics from the services associated with that `endpoint`, do some postprocessing, and aggregate them.
2. Listen for `KvHitRateEvent`s on `namespace/kv-hit-rate`, and aggregate them.
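
As a rough illustration of step 2, the sketch below keeps per-worker running totals of hit-rate events and derives a hit rate from them. The event fields (`worker_id`, `isl_blocks`, `overlap_blocks`) and the "total overlapping blocks over total input blocks" formula are assumptions made for illustration; check the actual `KvHitRateEvent` definition before relying on them.

```rust
use std::collections::HashMap;

/// Hypothetical event shape: these field names are assumptions, not the real type.
#[derive(Debug, Clone, Copy)]
struct KvHitRateEvent {
    worker_id: i64,
    isl_blocks: usize,     // blocks in the request's input sequence
    overlap_blocks: usize, // of those, blocks already cached on the worker
}

/// Running totals for a single worker.
#[derive(Debug, Default, Clone, Copy)]
struct HitRateStats {
    isl_blocks: usize,
    overlap_blocks: usize,
}

impl HitRateStats {
    /// Fraction of input blocks that were already cached (0.0 if no events yet).
    fn hit_rate(&self) -> f64 {
        if self.isl_blocks == 0 {
            0.0
        } else {
            self.overlap_blocks as f64 / self.isl_blocks as f64
        }
    }
}

/// Folds a batch of events into per-worker totals.
fn aggregate(events: &[KvHitRateEvent]) -> HashMap<i64, HitRateStats> {
    let mut per_worker: HashMap<i64, HitRateStats> = HashMap::new();
    for ev in events {
        let s = per_worker.entry(ev.worker_id).or_default();
        s.isl_blocks += ev.isl_blocks;
        s.overlap_blocks += ev.overlap_blocks;
    }
    per_worker
}

fn main() {
    let events = [
        KvHitRateEvent { worker_id: 1, isl_blocks: 10, overlap_blocks: 4 },
        KvHitRateEvent { worker_id: 1, isl_blocks: 10, overlap_blocks: 6 },
        KvHitRateEvent { worker_id: 2, isl_blocks: 8, overlap_blocks: 0 },
    ];
    for (id, s) in &aggregate(&events) {
        println!("worker {id}: hit rate {:.2}", s.hit_rate());
    }
}
```

Summing blocks before dividing weights long requests more heavily than averaging per-event ratios would; which weighting the component actually uses is not stated here.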

For example:
```bash
# For more details, try DYN_LOG=debug
DYN_LOG=info cargo run --bin metrics -- --namespace dynamo --component backend --endpoint generate

# 2025-02-26T18:45:05.467026Z  INFO metrics: Creating unique instance of Metrics at dynamo/components/metrics/instance
# 2025-02-26T18:45:05.472146Z  INFO metrics: Scraping service dynamo_backend_720278f8 and filtering on subject dynamo_backend_720278f8.generate
# ...
```

With no matching endpoints running to collect stats from, you should see warnings in the logs:
```
2025-02-26T18:45:06.474161Z  WARN metrics: No endpoints found matching subject dynamo_backend_720278f8.generate
```

Once a matching endpoint is started, it is discovered automatically and
the warnings should stop.
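
Judging only from the example logs, a discovered endpoint's subject looks like the filter subject plus a `-<instance id>` suffix (for example `dynamo_backend_720278f8.generate-694d951a80e06bb6`). The hypothetical helper below (`DiscoveredEndpoint` and `matching` are illustrative names, not part of the component) sketches that kind of prefix match and the warning emitted when nothing matches.

```rust
/// Hypothetical record for an endpoint found during discovery (illustrative, not the real type).
#[derive(Debug)]
struct DiscoveredEndpoint {
    name: String,
    subject: String,
}

/// Keeps endpoints whose subject is exactly `filter` or `filter-<instance id>`.
/// ASSUMPTION: this prefix rule is inferred from the example logs only.
fn matching<'a>(filter: &str, found: &'a [DiscoveredEndpoint]) -> Vec<&'a DiscoveredEndpoint> {
    found
        .iter()
        .filter(|e| match e.subject.strip_prefix(filter) {
            Some(rest) => rest.is_empty() || rest.starts_with('-'),
            None => false,
        })
        .collect()
}

fn main() {
    let filter = "dynamo_backend_720278f8.generate";
    let found = vec![
        DiscoveredEndpoint {
            name: "worker-7587884888253033398".into(),
            subject: "dynamo_backend_720278f8.generate-694d951a80e06bb6".into(),
        },
        // Hypothetical endpoint on a different path; it should be filtered out.
        DiscoveredEndpoint {
            name: "worker-1".into(),
            subject: "dynamo_backend_720278f8.embed-1".into(),
        },
    ];
    let hits = matching(filter, &found);
    if hits.is_empty() {
        // Mirrors the WARN line shown above; a polling loop would simply try again later.
        eprintln!("WARN: No endpoints found matching subject {filter}");
    }
    for e in &hits {
        println!("{} -> {}", e.name, e.subject);
    }
}
```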

When it finds stats from the target endpoints, the metrics component aggregates them, logs the
result, and publishes them to a Prometheus server running on `localhost:9091/metrics` by default:
```
2025-02-28T04:05:58.077901Z  INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
```
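
The `load_avg` and `load_std` values in the log above are consistent with the mean and population standard deviation of `kv_active_blocks` across workers: `(77 + 29) / 2 = 53` and `sqrt(((77 - 53)^2 + (29 - 53)^2) / 2) = 24`. The sketch below reproduces that arithmetic. Treating `kv_active_blocks` as "load" is inferred from these two numbers, not confirmed by the source.

```rust
/// Field names copied from the `ForwardPassMetrics` debug output in the log above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct ForwardPassMetrics {
    request_active_slots: u64,
    request_total_slots: u64,
    kv_active_blocks: u64,
    kv_total_blocks: u64,
}

/// Mean and population standard deviation of `kv_active_blocks` across workers.
/// ASSUMPTION: load = kv_active_blocks, inferred from the example log only.
fn load_stats(workers: &[ForwardPassMetrics]) -> (f64, f64) {
    if workers.is_empty() {
        return (0.0, 0.0);
    }
    let n = workers.len() as f64;
    let loads: Vec<f64> = workers.iter().map(|m| m.kv_active_blocks as f64).collect();
    let avg = loads.iter().sum::<f64>() / n;
    let var = loads.iter().map(|x| (x - avg).powi(2)).sum::<f64>() / n;
    (avg, var.sqrt())
}

fn main() {
    let workers = [
        ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 },
        ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 },
    ];
    let (avg, std_dev) = load_stats(&workers);
    println!("load_avg: {avg:.1}, load_std: {std_dev:.1}"); // prints "load_avg: 53.0, load_std: 24.0"
}
```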

To see the metrics being published in Prometheus format, you can run:
```bash
curl localhost:9091/metrics

# # HELP llm_kv_blocks_active Active KV cache blocks
# # TYPE llm_kv_blocks_active gauge
# llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033398"} 40
# llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033401"} 2
# # HELP llm_kv_blocks_total Total KV cache blocks
# # TYPE llm_kv_blocks_total gauge
# llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033398"} 100
# llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033401"} 100
```
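
If you want to read these gauges from your own code rather than through Prometheus, a deliberately minimal parser like the one below can pull out per-worker values and compute KV cache utilization. It is a sketch that assumes simple label values (no escaped quotes, commas, or spaces); a real client should use a proper Prometheus text-format parser.

```rust
use std::collections::HashMap;

/// Returns `worker_id -> value` for every sample of `metric` in Prometheus text format.
/// Minimal by design: assumes label values contain no commas or escaped quotes.
fn samples(body: &str, metric: &str) -> HashMap<String, f64> {
    let mut out = HashMap::new();
    for line in body.lines().map(str::trim) {
        if line.starts_with('#') {
            continue; // skip HELP / TYPE comments
        }
        let Some(rest) = line.strip_prefix(metric) else { continue };
        // Requiring '{' right after the name avoids matching longer metric names.
        let Some(rest) = rest.strip_prefix('{') else { continue };
        let Some((labels, value)) = rest.split_once('}') else { continue };
        let worker = labels
            .split(',')
            .find_map(|kv| kv.strip_prefix("worker_id=\""))
            .and_then(|v| v.strip_suffix('"'));
        if let (Some(w), Ok(v)) = (worker, value.trim().parse::<f64>()) {
            out.insert(w.to_string(), v);
        }
    }
    out
}

fn main() {
    // Sample body copied from the curl output above (without the leading "# " comment markers).
    let body = r#"# HELP llm_kv_blocks_active Active KV cache blocks
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033398"} 40
llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033401"} 2
# HELP llm_kv_blocks_total Total KV cache blocks
# TYPE llm_kv_blocks_total gauge
llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033398"} 100
llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033401"} 100
"#;
    let active = samples(body, "llm_kv_blocks_active");
    let total = samples(body, "llm_kv_blocks_total");
    for (worker, a) in &active {
        if let Some(t) = total.get(worker).filter(|t| **t > 0.0) {
            println!("worker {worker}: {:.0}% of KV blocks in use", 100.0 * a / t);
        }
    }
}
```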

## Mock Worker

For convenience and debugging, there is a mock worker that registers a mock `StatsHandler`
with the `endpoint` and publishes mock `KvHitRateEvent`s on `namespace/kv-hit-rate`.

```bash
# Can run multiple workers in separate shells to see aggregation as well.
DYN_LOG=info cargo run --bin mock_worker
```

**NOTE**: When using the mock worker, the data from the stats handler and the
events will be random and shouldn't be expected to correlate with each other.
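
The mock worker's source isn't reproduced here, but the hypothetical sketch below illustrates why the two streams don't line up: the stats and the hit-rate event are drawn independently from a small PRNG. The struct names, the `isl_blocks`/`overlap_blocks` event fields, and the value ranges are assumptions for illustration; the stats field names follow the `ForwardPassMetrics` log output shown earlier.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Minimal xorshift64 PRNG so the sketch needs no external crates (not cryptographically secure).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        Self(seed.max(1)) // xorshift state must never be zero
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..=max; the small modulo bias is fine for mock data.
    fn up_to(&mut self, max: u64) -> u64 {
        self.next_u64() % max.saturating_add(1)
    }
}

/// Hypothetical mock stats; field names follow the `ForwardPassMetrics` log output.
#[allow(dead_code)]
#[derive(Debug)]
struct MockStats {
    request_active_slots: u64,
    request_total_slots: u64,
    kv_active_blocks: u64,
    kv_total_blocks: u64,
}

/// Hypothetical mock hit-rate event; these field names are assumptions.
#[allow(dead_code)]
#[derive(Debug)]
struct MockHitRateEvent {
    isl_blocks: u64,
    overlap_blocks: u64,
}

fn main() {
    let seed = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(42);
    let mut rng = XorShift64::new(seed);
    // Stats and event come from separate draws, so they carry no relationship to each other.
    let stats = MockStats {
        request_active_slots: rng.up_to(100),
        request_total_slots: 100,
        kv_active_blocks: rng.up_to(100),
        kv_total_blocks: 100,
    };
    let isl_blocks = 1 + rng.up_to(63);
    let event = MockHitRateEvent { isl_blocks, overlap_blocks: rng.up_to(isl_blocks) };
    println!("{stats:?}\n{event:?}");
}
```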

## Real Worker

See the KV Routing example in `examples/python_rs/llm/vllm`.