# Metrics

## Quickstart

To start `metrics`, point it at the namespace/component/endpoint trio whose
metrics you want to observe. It scrapes statistics from the services
associated with that endpoint, postprocesses them, and then publishes an
event with the postprocessed data.

```bash
# For more details, try DYN_LOG=debug
DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend --endpoint generate

# 2025-02-26T18:45:05.467026Z  INFO metrics: Creating unique instance of Metrics at dynemo/components/metrics/instance
# 2025-02-26T18:45:05.472146Z  INFO metrics: Scraping service dynemo_backend_720278f8 and filtering on subject dynemo_backend_720278f8.generate
# ...
```

With no matching endpoints running, you should see warnings in the logs:
```bash
2025-02-26T18:45:06.474161Z  WARN metrics: No endpoints found matching subject dynemo_backend_720278f8.generate
```

To see metrics published to a matching endpoint, you can use the
[mock_worker example](src/bin/mock_worker.rs) in this directory to launch
one or more workers that publish LLM metrics:
```bash
# Can run multiple workers in separate shells
cargo run --bin mock_worker
```
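
If separate shells are inconvenient, the same effect can be sketched from a single shell by running each worker in the background. `launch_workers` below is a hypothetical helper, not part of this repository; it starts the given number of copies of whatever command follows the count and waits for all of them to exit.

```bash
# Hypothetical helper (not part of this repository):
# start COUNT background copies of a command, then wait for all of them.
launch_workers() {
  count=$1
  shift
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@" &            # run the remaining arguments as a background job
    i=$((i + 1))
  done
  wait                # block until every background job has exited
}
```

For example, `launch_workers 2 cargo run --bin mock_worker` would start two workers whose log output is interleaved in the same terminal.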

Once a matching endpoint starts, it is discovered automatically and the
warnings should stop.

When stats are found from the target endpoints, the metrics component
aggregates them and publishes the results both as events and as updates to a
Prometheus server:
```
2025-02-28T04:05:58.077901Z  INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
```
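
In the sample log line above, the two workers report `kv_active_blocks` of 77 and 29, and the aggregate shows `load_avg: 53.0` and `load_std: 24.0`. Those numbers are consistent with the mean and population standard deviation of `kv_active_blocks`. That is an inference from this one log line, not something the source confirms. A minimal sketch of that arithmetic, using a hypothetical `load_stats` helper:

```bash
# Hypothetical helper: mean and population standard deviation of its arguments.
# Assumption: load_avg/load_std are computed over kv_active_blocks; this is
# inferred from the sample log line, not taken from the metrics source code.
load_stats() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      var = sumsq / n - mean * mean   # population variance
      if (var < 0) var = 0            # guard against rounding error
      printf "load_avg=%.1f load_std=%.1f\n", mean, sqrt(var)
    }
  '
}
```

Running `load_stats 77 29` prints `load_avg=53.0 load_std=24.0`, matching the aggregate in the log line.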

To see the metrics being published in Prometheus format, you can run:
```bash
curl localhost:9091/metrics

# # HELP llm_kv_blocks_active Active KV cache blocks
# # TYPE llm_kv_blocks_active gauge
# llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033398"} 40
# llm_kv_blocks_active{component="backend",endpoint="generate",worker_id="7587884888253033401"} 2
# # HELP llm_kv_blocks_total Total KV cache blocks
# # TYPE llm_kv_blocks_total gauge
# llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033398"} 100
# llm_kv_blocks_total{component="backend",endpoint="generate",worker_id="7587884888253033401"} 100
```
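
The two gauges shown above can be combined into a per-worker KV cache utilization figure. The sketch below assumes the scraped text has the same metric names and `worker_id` label as the sample output; `kv_utilization` is a hypothetical helper, not part of this repository.

```bash
# Hypothetical helper: read Prometheus text format on stdin and print
# "<worker_id> <active/total as a percentage>" for each worker.
kv_utilization() {
  awk '
    /^llm_kv_blocks_(active|total)[{]/ {
      # Extract the worker_id label value as a string (IDs exceed 2^53).
      if (match($0, /worker_id="[^"]*"/) == 0) next
      id = substr($0, RSTART + 11, RLENGTH - 12)
      if ($0 ~ /^llm_kv_blocks_active/) active[id] = $NF
      else total[id] = $NF
    }
    END {
      for (id in total)
        if ((id in active) && total[id] + 0 > 0)
          printf "%s %.1f%%\n", id, 100 * active[id] / total[id]
    }
  ' | sort
}
```

With the metrics component running, `curl -s localhost:9091/metrics | kv_utilization` would print one line per worker. For the sample values above, that is 40.0% and 2.0%.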