Commit 8c75ed79 (unverified) authored by Keiven C, committed by GitHub

fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176)


Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent 66231cf0
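The rename in this commit swaps the `nv_llm_http_service_` prefix for `dynamo_frontend_` and leaves every metric suffix (including the histogram `_sum`, `_count` and `_bucket` series) unchanged. Below is a minimal Rust sketch for updating saved queries or dashboards outside this repository. It assumes that one-to-one prefix mapping holds for every frontend metric. `migrate_frontend_metric_name` and `migrate_promql_expr` are hypothetical helpers, not part of Dynamo.

```rust
/// Prefix the frontend metrics used before this commit ("nv_llm" + "_http_service_").
const OLD_PREFIX: &str = "nv_llm_http_service_";
/// Prefix used after this commit ("dynamo_frontend" + "_").
const NEW_PREFIX: &str = "dynamo_frontend_";

/// Hypothetical helper: map one old frontend metric name to its new name.
/// Returns `None` when the name does not carry the old prefix
/// (for example the `llm_*` metrics of the deprecated `metrics` component).
pub fn migrate_frontend_metric_name(old: &str) -> Option<String> {
    old.strip_prefix(OLD_PREFIX)
        .map(|suffix| format!("{}{}", NEW_PREFIX, suffix))
}

/// Hypothetical helper: rewrite every old-prefixed name inside a PromQL
/// expression, such as a Grafana panel's `expr` field. This is a plain
/// substring replacement, so it assumes the old prefix never appears
/// in the expression for any other reason.
pub fn migrate_promql_expr(expr: &str) -> String {
    expr.replace(OLD_PREFIX, NEW_PREFIX)
}
```

With these helpers, the old dashboard expression `rate(nv_llm_http_service_requests_total[30s])` maps to `rate(dynamo_frontend_requests_total[30s])`, which matches the updated Grafana panel in this commit.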
@@ -38,4 +38,4 @@ tracing = { workspace = true }
 # TODO: Update axum to 0.8
 axum = { version = "0.6" }
 clap = { version = "4.5", features = ["derive", "env"] }
-reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
+reqwest = { version = "0.12.22", default-features = false, features = ["json", "rustls-tls"] }
 # Metrics
-The `metrics` component is a utility that can collect, aggregate, and publish
-metrics from a Dynamo deployment. After collecting and aggregating metrics from
-workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
-that other applications or visualization tools like Prometheus server and Grafana can
-pull from.
-**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
-- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
+⚠️ **DEPRECATION NOTICE** ⚠️
+**This `metrics` component is unmaintained and being deprecated.**
+The deprecated `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:
+**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**
+This component may be migrated to the MetricsRegistry in the future.
+**📖 See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.**
+---
+The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is unmaintained and being deprecated in favor of `MetricsRegistry`.
+**Note**: This is a demo implementation. The deprecated `metrics` component is no longer under active development.
+- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
 - This demo will only work when using examples/llm/configs/agg.yml-- other configurations will not work
 <div align="center">
@@ -16,7 +26,7 @@ pull from.
 ## Quickstart
-To start the `metrics` component, simply point it at the `namespace/component/endpoint`
+To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
 trio for the Dynamo workers that you're interested in monitoring metrics on.
 This will:
@@ -45,14 +55,14 @@ will get automatically discovered and the warnings will stop.
 ## Workers
-The `metrics` component needs running workers to gather metrics from,
+The deprecated `metrics` component needs running workers to gather metrics from,
 so below are some examples of workers and how they can be monitored.
 ### Mock Worker
-To try out how `metrics` works, there is a demo Rust-based
+To try out how the deprecated `metrics` component works, there is a demo Rust-based
 [mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
-1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
+1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
 2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics
 Step 1: Launch a mock workers via the following command (if already built):
@@ -99,11 +109,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d
 ## Metrics Collection Modes
-The metrics component supports two modes for exposing metrics in a Prometheus format:
+The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:
 ### Pull Mode (Default)
-When running in pull mode (the default), the metrics component will expose a
+When running in pull mode (the default), the deprecated `metrics` component will expose a
 Prometheus metrics endpoint on the specified host and port that a
 Prometheus server or curl client can pull from:
@@ -136,7 +146,7 @@ curl localhost:9091/metrics
 ### Push Mode
 For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
-you can use Push mode. In this mode, the metrics component will periodically push
+you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
 metrics to an externally hosted
 [Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):
@@ -145,7 +155,7 @@ Start a prometheus push gateway service via docker:
 docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
 ```
-Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
+Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
 ```bash
 # Push metrics to a Prometheus PushGateway every --push-interval seconds
 metrics \
@@ -173,7 +183,7 @@ curl 127.0.0.1:9091/metrics
 ```
 ## Building/Running from Source
-For easy iteration while making edits to the metrics component, you can use `cargo run`
+For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
 to build and run with your local changes:
 ```bash
......
@@ -35,7 +35,7 @@ class PrometheusAPIClient:
 increase(metric_sum[interval])/increase(metric_count[interval])
 Args:
-metric_name: Base metric name (e.g., 'nv_llm_http_service_inter_token_latency_seconds')
+metric_name: Base metric name (e.g., 'inter_token_latency_seconds')
 interval: Time interval for the query (e.g., '60s')
 operation_name: Human-readable name for error logging
@@ -43,7 +43,8 @@ class PrometheusAPIClient:
 Average metric value or 0 if no data/error
 """
 try:
-query = f"increase({metric_name}_sum[{interval}])/increase({metric_name}_count[{interval}])"
+full_metric_name = f"dynamo_frontend_{metric_name}"
+query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
 result = self.prom.custom_query(query=query)
 if not result:
 # No data available yet (no requests made) - return 0 silently
@@ -55,21 +56,21 @@ class PrometheusAPIClient:
 def get_avg_inter_token_latency(self, interval: str):
 return self._get_average_metric(
-"nv_llm_http_service_inter_token_latency_seconds",
+"inter_token_latency_seconds",
 interval,
 "avg inter token latency",
 )
 def get_avg_time_to_first_token(self, interval: str):
 return self._get_average_metric(
-"nv_llm_http_service_time_to_first_token_seconds",
+"time_to_first_token_seconds",
 interval,
 "avg time to first token",
 )
 def get_avg_request_duration(self, interval: str):
 return self._get_average_metric(
-"nv_llm_http_service_request_duration_seconds",
+"request_duration_seconds",
 interval,
 "avg request duration",
 )
@@ -78,7 +79,7 @@ class PrometheusAPIClient:
 # This function follows a different query pattern than the other metrics
 try:
 raw_res = self.prom.custom_query(
-query=f"increase(nv_llm_http_service_requests_total[{interval}])"
+query=f"increase(dynamo_frontend_requests_total[{interval}])"
 )
 total_count = 0.0
 for res in raw_res:
@@ -91,14 +92,14 @@ class PrometheusAPIClient:
 def get_avg_input_sequence_tokens(self, interval: str):
 return self._get_average_metric(
-"nv_llm_http_service_input_sequence_tokens",
+"input_sequence_tokens",
 interval,
 "avg input sequence tokens",
 )
 def get_avg_output_sequence_tokens(self, interval: str):
 return self._get_average_metric(
-"nv_llm_http_service_output_sequence_tokens",
+"output_sequence_tokens",
 interval,
 "avg output sequence tokens",
 )
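The Python client now builds the full name inside `_get_average_metric` by prepending `dynamo_frontend_`, then divides the windowed increase of the histogram's `_sum` series by that of its `_count` series. The sketch below shows the same string construction in Rust. It assumes callers pass only the metric suffix, and `average_metric_query` is a hypothetical name.

```rust
/// Prefix shared by all frontend metrics after this commit.
const FRONTEND_METRIC_PREFIX: &str = "dynamo_frontend";

/// Hypothetical helper: build the PromQL expression for the average value of
/// a frontend histogram over `interval`, i.e. increase(_sum) / increase(_count).
pub fn average_metric_query(metric_suffix: &str, interval: &str) -> String {
    // Full series name, e.g. "dynamo_frontend_inter_token_latency_seconds".
    let full = format!("{}_{}", FRONTEND_METRIC_PREFIX, metric_suffix);
    // Positional args: {0} is the full name, {1} is the range selector.
    format!(
        "increase({0}_sum[{1}])/increase({0}_count[{1}])",
        full, interval
    )
}
```

Because both terms use `increase()` over the same window, the result is the mean per-request value within that window. By contrast, the Grafana panels' plain `_sum/_count` ratio averages over everything recorded since the counters were last reset.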
@@ -60,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
 - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
 - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
-- Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
+- Start worker(s) that publishes KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.
 ## Configuration
@@ -95,16 +95,19 @@ The following configuration files should be present in this directory:
 - [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
 - [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
-## Running the example `metrics` component
+## Running the deprecated `metrics` component
-IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development.
+⚠️ **DEPRECATION NOTICE** ⚠️
-When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
-- `llm_requests_active_slots`: Number of currently active request slots per worker
+When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
+**⚠️ The following `llm_kv_*` metrics are deprecated:**
+- `llm_requests_active_slots`: Active request slots per worker
 - `llm_requests_total_slots`: Total available request slots per worker
-- `llm_kv_blocks_active`: Number of active KV blocks per worker
+- `llm_kv_blocks_active`: Active KV blocks per worker
 - `llm_kv_blocks_total`: Total KV blocks available per worker
-- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker
+- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker
 - `llm_load_avg`: Average load across workers
 - `llm_load_std`: Load standard deviation across workers
......
@@ -27,7 +27,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_requests_total (1m)",
+"description": "dynamo_frontend_requests_total (1m)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -106,7 +106,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "rate(nv_llm_http_service_requests_total[30s])",
+"expr": "rate(dynamo_frontend_requests_total[30s])",
 "legendFormat": "{{request_type}}, {{status}},",
 "range": true,
 "refId": "A"
@@ -120,7 +120,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
+"description": "dynamo_frontend_time_to_first_token_seconds (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -199,7 +199,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
+"expr": "1000*(dynamo_frontend_time_to_first_token_seconds_sum/dynamo_frontend_time_to_first_token_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -213,7 +213,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
+"description": "dynamo_frontend_inter_token_latency_seconds (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -292,7 +292,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
+"expr": "1000*(dynamo_frontend_inter_token_latency_seconds_sum/dynamo_frontend_inter_token_latency_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -306,7 +306,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_request_duration (sum/count)",
+"description": "dynamo_frontend_request_duration (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -385,7 +385,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
+"expr": "1000*(dynamo_frontend_request_duration_seconds_sum / dynamo_frontend_request_duration_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -399,7 +399,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
+"description": "The length is the number of tokens. dynamo_frontend_input_sequence_tokens",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -478,7 +478,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
+"expr": "dynamo_frontend_input_sequence_tokens_sum / dynamo_frontend_input_sequence_tokens_count",
 "legendFormat": "ISL",
 "range": true,
 "refId": "A"
@@ -489,7 +489,7 @@
 "uid": "P1809F7CD0C75ACF3"
 },
 "editorMode": "code",
-"expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
+"expr": "dynamo_frontend_output_sequence_tokens_sum / dynamo_frontend_output_sequence_tokens_count",
 "hide": false,
 "instant": false,
 "legendFormat": "OSL",
......
@@ -26,7 +26,13 @@
 "distributed under the License is distributed on an \"AS IS\" BASIS,",
 "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
 "See the License for the specific language governing permissions and",
-"limitations under the License."
+"limitations under the License.",
+"",
+"DEPRECATION NOTICE:",
+"This dashboard uses deprecated llm_kv_* metrics (llm_kv_blocks_active, llm_kv_blocks_total, llm_kv_hit_rate_percent)",
+"that are part of the deprecated metrics aggregation service. These metrics will be removed in a future release.",
+"Please migrate to the new MetricsRegistry system which provides dynamo_* metrics instead.",
+"See docs/guides/metrics.md for migration guidance."
 ],
 "editable": true,
 "fiscalYearStartMonth": 0,
......
@@ -47,6 +47,8 @@ scrape_configs:
 static_configs:
 - targets: ['host.docker.internal:8081']
+# DEPRECATED: This metrics aggregation service is being deprecated in favor of MetricsRegistry
+# The new system uses the 'dynamo-backend' job above instead of this separate service
 # This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
 # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
 - job_name: 'metrics-aggregation-service'
......
@@ -12,6 +12,9 @@ pub use prometheus::Registry;
 use super::RouteDoc;
+/// Metric prefix for all HTTP service metrics
+pub const FRONTEND_METRIC_PREFIX: &str = "dynamo_frontend";
 /// Value for the `status` label in the request counter for successful requests
 pub const REQUEST_STATUS_SUCCESS: &str = "success";
@@ -24,6 +27,11 @@ pub const REQUEST_TYPE_STREAM: &str = "stream";
 /// Partial value for the `type` label in the request counter for unary requests
 pub const REQUEST_TYPE_UNARY: &str = "unary";
+/// Helper function to construct metric names with the standard prefix
+fn frontend_metric_name(suffix: &str) -> String {
+format!("{}_{}", FRONTEND_METRIC_PREFIX, suffix)
+}
 pub struct Metrics {
 request_counter: IntCounterVec,
 inflight_gauge: IntGaugeVec,
@@ -94,24 +102,24 @@ pub struct ResponseMetricCollector {
 impl Default for Metrics {
 fn default() -> Self {
-Self::new("nv_llm")
+Self::new()
 }
 }
 impl Metrics {
-/// Create Metrics with the given prefix
+/// Create Metrics with the standard prefix defined by [`FRONTEND_METRIC_PREFIX`]
 /// The following metrics will be created:
-/// - `{prefix}_http_service_requests_total` - IntCounterVec for the total number of requests processed
-/// - `{prefix}_http_service_inflight_requests` - IntGaugeVec for the number of inflight requests
-/// - `{prefix}_http_service_request_duration_seconds` - HistogramVec for the duration of requests
-/// - `{prefix}_http_service_input_sequence_tokens` - HistogramVec for input sequence length in tokens
-/// - `{prefix}_http_service_output_sequence_tokens` - HistogramVec for output sequence length in tokens
-/// - `{prefix}_http_service_time_to_first_token_seconds` - HistogramVec for time to first token in seconds
-/// - `{prefix}_http_service_inter_token_latency_seconds` - HistogramVec for inter-token latency in seconds
-pub fn new(prefix: &str) -> Self {
+/// - `dynamo_frontend_requests_total` - IntCounterVec for the total number of requests processed
+/// - `dynamo_frontend_inflight_requests` - IntGaugeVec for the number of inflight requests
+/// - `dynamo_frontend_request_duration_seconds` - HistogramVec for the duration of requests
+/// - `dynamo_frontend_input_sequence_tokens` - HistogramVec for input sequence length in tokens
+/// - `dynamo_frontend_output_sequence_tokens` - HistogramVec for output sequence length in tokens
+/// - `dynamo_frontend_time_to_first_token_seconds` - HistogramVec for time to first token in seconds
+/// - `dynamo_frontend_inter_token_latency_seconds` - HistogramVec for inter-token latency in seconds
+pub fn new() -> Self {
 let request_counter = IntCounterVec::new(
 Opts::new(
-format!("{}_http_service_requests_total", prefix),
+frontend_metric_name("requests_total"),
 "Total number of LLM requests processed",
 ),
 &["model", "endpoint", "request_type", "status"],
@@ -120,7 +128,7 @@ impl Metrics {
 let inflight_gauge = IntGaugeVec::new(
 Opts::new(
-format!("{}_http_service_inflight_requests", prefix),
+frontend_metric_name("inflight_requests"),
 "Number of inflight requests",
 ),
 &["model"],
@@ -131,7 +139,7 @@ impl Metrics {
 let request_duration = HistogramVec::new(
 HistogramOpts::new(
-format!("{}_http_service_request_duration_seconds", prefix),
+frontend_metric_name("request_duration_seconds"),
 "Duration of LLM requests",
 )
 .buckets(buckets),
@@ -141,7 +149,7 @@ impl Metrics {
 let input_sequence_length = HistogramVec::new(
 HistogramOpts::new(
-format!("{}_http_service_input_sequence_tokens", prefix),
+frontend_metric_name("input_sequence_tokens"),
 "Input sequence length in tokens",
 )
 .buckets(vec![
@@ -154,7 +162,7 @@ impl Metrics {
 let output_sequence_length = HistogramVec::new(
 HistogramOpts::new(
-format!("{}_http_service_output_sequence_tokens", prefix),
+frontend_metric_name("output_sequence_tokens"),
 "Output sequence length in tokens",
 )
 .buckets(vec![
@@ -166,7 +174,7 @@ impl Metrics {
 let time_to_first_token = HistogramVec::new(
 HistogramOpts::new(
-format!("{}_http_service_time_to_first_token_seconds", prefix),
+frontend_metric_name("time_to_first_token_seconds"),
 "Time to first token in seconds",
 )
 .buckets(vec![
@@ -179,7 +187,7 @@ impl Metrics {
 let inter_token_latency = HistogramVec::new(
 HistogramOpts::new(
-format!("{}_http_service_inter_token_latency_seconds", prefix),
+frontend_metric_name("inter_token_latency_seconds"),
 "Inter-token latency in seconds",
 )
 .buckets(vec![
......
@@ -22,7 +22,7 @@ use dynamo_llm::http::{
 },
 service::{
 error::HttpError,
-metrics::{Endpoint, RequestType, Status},
+metrics::{Endpoint, RequestType, Status, FRONTEND_METRIC_PREFIX},
 service_v2::HttpService,
 Metrics,
 },
@@ -357,7 +357,7 @@ async fn test_http_service() {
 let families = registry.gather();
 let histogram_metric_family = families
 .into_iter()
-.find(|m| m.get_name() == "nv_llm_http_service_request_duration_seconds")
+.find(|m| m.get_name() == format!("{}_request_duration_seconds", FRONTEND_METRIC_PREFIX))
 .expect("Histogram metric not found");
 assert_eq!(
......
@@ -65,26 +65,6 @@ impl Clone for HttpServerInfo {
 }
 }
-pub struct HttpMetricsRegistry {
-pub drt: Arc<crate::DistributedRuntime>,
-}
-impl crate::traits::DistributedRuntimeProvider for HttpMetricsRegistry {
-fn drt(&self) -> &crate::DistributedRuntime {
-&self.drt
-}
-}
-impl MetricsRegistry for HttpMetricsRegistry {
-fn basename(&self) -> String {
-"dynamo".to_string()
-}
-fn parent_hierarchy(&self) -> Vec<String> {
-[self.drt().parent_hierarchy(), vec![self.drt().basename()]].concat()
-}
-}
 /// HTTP server state containing metrics and uptime tracking
 pub struct HttpServerState {
 // global drt registry is for printing out the entire Prometheus format output
@@ -96,11 +76,10 @@ pub struct HttpServerState {
 impl HttpServerState {
 /// Create new HTTP server state with the provided metrics registry
 pub fn new(drt: Arc<crate::DistributedRuntime>) -> anyhow::Result<Self> {
-let http_metrics_registry = Arc::new(HttpMetricsRegistry { drt: drt.clone() });
 // Note: This metric is created at the DRT level (no namespace), so we manually add "dynamo_" prefix
 // to maintain consistency with the project's metric naming convention
-let uptime_gauge = http_metrics_registry.as_ref().create_gauge(
-"system_uptime_seconds",
+let uptime_gauge = drt.as_ref().create_gauge(
+"dynamo_uptime_seconds",
 "Total uptime of the DistributedRuntime in seconds",
 &[],
 )?;
@@ -368,9 +347,9 @@ mod tests {
 println!("Full metrics response:\n{}", response);
 let expected = "\
-# HELP dynamo_system_uptime_seconds Total uptime of the DistributedRuntime in seconds
-# TYPE dynamo_system_uptime_seconds gauge
-dynamo_system_uptime_seconds{namespace=\"dynamo\"} 42
+# HELP dynamo_uptime_seconds Total uptime of the DistributedRuntime in seconds
+# TYPE dynamo_uptime_seconds gauge
+dynamo_uptime_seconds 42
 ";
 assert_eq!(response, expected);
 }
@@ -445,8 +424,8 @@ dynamo_system_uptime_seconds{namespace=\"dynamo\"} 42
 let tracestate_value = "vendor1=opaqueValue1,vendor2=opaqueValue2";
 let mut headers = reqwest::header::HeaderMap::new();
 headers.insert(
-reqwest::header::HeaderName.from_static("traceparent"),
-reqwest::header::HeaderValue.from_str(traceparent_value)?,
+reqwest::header::HeaderName::from_static("traceparent"),
+reqwest::header::HeaderValue::from_str(traceparent_value).unwrap(),
 );
 let url = format!("http://{}{}", addr, path);
 let response = client.get(&url).send().await.unwrap();
......
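The updated test expects the uptime gauge to be exported as `dynamo_uptime_seconds` with no `namespace` label. To make that expected text easier to read, here is a std-only sketch of the Prometheus text exposition format for a single unlabeled gauge. `render_gauge` is a hypothetical helper for illustration, not how the runtime renders its `/metrics` output, and it does not escape backslashes or newlines in the help text.

```rust
/// Hypothetical helper: render one unlabeled gauge in the Prometheus text
/// exposition format: a HELP line, a TYPE line, then `name value`.
/// Help-text escaping is deliberately omitted in this sketch.
pub fn render_gauge(name: &str, help: &str, value: f64) -> String {
    // {0} = metric name, {1} = help text, {2} = sample value.
    // f64's Display prints 42.0 as "42", matching the test's expected output.
    format!(
        "# HELP {0} {1}\n# TYPE {0} gauge\n{0} {2}\n",
        name, help, value
    )
}
```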
@@ -645,6 +645,7 @@ mod tests {
 mod test_prefixes {
 use super::create_test_drt;
 use super::*;
+use prometheus::core::Collector;
 #[test]
 fn test_hierarchical_prefixes_and_parent_hierarchies() {
@@ -810,17 +811,27 @@ mod test_prefixes {
 );
 println!("Invalid namespace prefix: '{}'", invalid_namespace.prefix());
-// Try to create a metric - this should fail because "@@123" gets stripped to "" which is invalid
+// Try to create a metric - this should succeed because the namespace name will be sanitized
 let result = invalid_namespace.create_counter("test_counter", "A test counter", &[]);
 println!("Result with invalid namespace '@@123':");
 println!("{:?}", result);
-// The result should be an error because empty metric names are invalid
+// The result should fail because even after sanitization, the name "123" doesn't follow Prometheus naming pattern
 assert!(
 result.is_err(),
-"Creating metric with namespace '@@123' should fail because it gets stripped to empty string"
+"Creating metric with invalid namespace should fail even after sanitization"
 );
+// Verify the error message indicates the sanitized name is still invalid
+if let Err(e) = &result {
+let error_msg = e.to_string();
+assert!(
+error_msg.contains("123"),
+"Error message should mention the sanitized name '123', got: {}",
+error_msg
+);
+}
 // For comparison, show a valid namespace works
 let valid_namespace = drt.namespace("test_namespace").unwrap();
 let valid_result = valid_namespace.create_counter("test_counter", "A test counter", &[]);
......
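The updated test asserts that sanitizing the namespace `@@123` yields `123`, which still fails. The reason is that a Prometheus metric name must match `[a-zA-Z_:][a-zA-Z0-9_:]*` and therefore cannot start with a digit. A minimal sketch of that reasoning follows. It assumes a hypothetical `sanitize` that simply drops disallowed characters; the runtime's actual sanitization rules are not shown in this diff.

```rust
/// Hypothetical sanitizer: keep only characters allowed anywhere in a
/// Prometheus metric name ([a-zA-Z0-9_:]) and drop everything else.
/// The real sanitization in the runtime may differ.
pub fn sanitize(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_' || *c == ':')
        .collect()
}

/// Check a name against the Prometheus metric-name pattern
/// `[a-zA-Z_:][a-zA-Z0-9_:]*`.
pub fn is_valid_metric_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // First character: letter, underscore, or colon.
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {}
        // Empty string, or a leading digit or other character.
        _ => return false,
    }
    // Remaining characters may also be digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
}
```

Under this assumption, `sanitize("@@123")` returns `"123"`, which fails the pattern because of its leading digit, while a name such as `test_namespace` passes unchanged.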