Commit cbe0b177 (unverified), authored by Keiven C, committed by GitHub

refactor: redesign the metrics API from Trait to composition to make the code cleaner and easier to understand (#3687)
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent eb8d07cb
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
<!-- <!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0 SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--> -->
# Dynamo MetricsRegistry # Dynamo MetricsRegistry
...@@ -25,11 +13,11 @@ Dynamo provides built-in metrics capabilities through the `MetricsRegistry` trai ...@@ -25,11 +13,11 @@ Dynamo provides built-in metrics capabilities through the `MetricsRegistry` trai
Dynamo automatically exposes metrics with the `dynamo_` name prefix. It also adds the labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to indicate which component is providing the metric. Dynamo automatically exposes metrics with the `dynamo_` name prefix. It also adds the labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to indicate which component is providing the metric.
**Frontend Metrics**: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name. These cover request handling, token processing, and latency measurements. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of frontend metrics. **Frontend Metrics**: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name. These cover request handling, token processing, and latency measurements. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for the complete list of frontend metrics.
**Component Metrics**: The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework. These include request counts, processing times, byte transfers, and system uptime metrics. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for the complete list of component metrics. **Component Metrics**: The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework. These include request counts, processing times, byte transfers, and system uptime metrics. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for the complete list of component metrics.
**Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for details on specialized component metrics. **Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for details on specialized component metrics.
**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](../kubernetes/observability/metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana. **Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](../kubernetes/observability/metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana.
...@@ -47,7 +35,7 @@ This hierarchical structure allows you to create metrics at the appropriate leve ...@@ -47,7 +35,7 @@ This hierarchical structure allows you to create metrics at the appropriate leve
## Getting Started ## Getting Started
For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the [Getting Started section](../../deploy/metrics/README.md#getting-started) in the deploy metrics documentation. For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the [Getting Started section](prometheus-grafana.md#getting-started) in the Prometheus and Grafana guide.
The quick start includes: The quick start includes:
- Docker Compose setup for Prometheus and Grafana - Docker Compose setup for Prometheus and Grafana
...@@ -57,7 +45,7 @@ The quick start includes: ...@@ -57,7 +45,7 @@ The quick start includes:
## Implementation Examples ## Implementation Examples
See [Implementation Examples](../../deploy/metrics/README.md#implementation-examples) for detailed examples of creating metrics at different hierarchy levels and using dynamic labels. Examples of creating metrics at different hierarchy levels and using dynamic labels are included in this document below.
### Grafana Dashboards ### Grafana Dashboards
...@@ -90,12 +78,22 @@ graph TD ...@@ -90,12 +78,22 @@ graph TD
The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics: The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics:
![Grafana Dynamo Dashboard](../../deploy/metrics/grafana-dynamo-composite.png) ![Grafana Dynamo Dashboard](./grafana-dynamo-composite.png)
## Detailed Setup Guide
For complete setup instructions including Docker Compose, Prometheus configuration, and Grafana dashboards, see:
```{toctree}
:hidden:
prometheus-grafana
```
- [Prometheus and Grafana Setup Guide](prometheus-grafana.md)
## Related Documentation ## Related Documentation
- [Distributed Runtime Architecture](../design_docs/distributed_runtime.md) - [Distributed Runtime Architecture](../design_docs/distributed_runtime.md)
- [Dynamo Architecture Overview](../design_docs/architecture.md) - [Dynamo Architecture Overview](../design_docs/architecture.md)
- [Backend Guide](../development/backend-guide.md) - [Backend Guide](../development/backend-guide.md)
- [Metrics Implementation Examples](../../deploy/metrics/README.md#implementation-examples)
- [Complete Metrics Setup Guide](../../deploy/metrics/README.md)
\ No newline at end of file
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana. This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.
> [!NOTE] > [!NOTE]
> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/observability/metrics.md). > For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](./metrics.md).
## Overview ## Overview
...@@ -165,14 +165,14 @@ $ python -m dynamo.vllm --model Qwen/Qwen3-0.6B \ ...@@ -165,14 +165,14 @@ $ python -m dynamo.vllm --model Qwen/Qwen3-0.6B \
### Required Files ### Required Files
The following configuration files should be present in this directory: The following configuration files are located in the `deploy/metrics/` directory:
- [docker-compose.yml](../docker-compose.yml): Defines the Prometheus and Grafana services - [docker-compose.yml](../../deploy/docker-compose.yml): Defines the Prometheus and Grafana services
- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration - [prometheus.yml](../../deploy/metrics/prometheus.yml): Contains Prometheus scraping configuration
- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration - [grafana-datasources.yml](../../deploy/metrics/grafana-datasources.yml): Contains Grafana datasource configuration
- [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration - [grafana_dashboards/grafana-dashboard-providers.yml](../../deploy/metrics/grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics. - [grafana_dashboards/grafana-dynamo-dashboard.json](../../deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo Dashboard for both SW and HW metrics.
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics - [grafana_dashboards/grafana-dcgm-metrics.json](../../deploy/metrics/grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-kvbm-dashboard.json](./grafana_dashboards/grafana-kvbm-dashboard.json): Contains Grafana dashboard configuration for KVBM metrics - [grafana_dashboards/grafana-kvbm-dashboard.json](../../deploy/metrics/grafana_dashboards/grafana-kvbm-dashboard.json): Contains Grafana dashboard configuration for KVBM metrics
### Metric Name Constants ### Metric Name Constants
...@@ -241,7 +241,7 @@ This centralized approach ensures all Dynamo components use consistent, valid Pr ...@@ -241,7 +241,7 @@ This centralized approach ensures all Dynamo components use consistent, valid Pr
#### Prometheus #### Prometheus
The Prometheus configuration is specified in [prometheus.yml](./prometheus.yml). This file is set up to collect metrics from the metrics aggregation service endpoint. The Prometheus configuration is specified in [prometheus.yml](../../deploy/metrics/prometheus.yml). This file is set up to collect metrics from the metrics aggregation service endpoint.
Please be aware that you might need to modify the target settings to align with your specific host configuration and network environment. Please be aware that you might need to modify the target settings to align with your specific host configuration and network environment.
...@@ -288,13 +288,13 @@ let component = namespace.component("my_component")?; ...@@ -288,13 +288,13 @@ let component = namespace.component("my_component")?;
let endpoint = component.endpoint("my_endpoint")?; let endpoint = component.endpoint("my_endpoint")?;
// Create endpoint-level counters (this is a Prometheus Counter type) // Create endpoint-level counters (this is a Prometheus Counter type)
let requests_total = endpoint.create_counter( let requests_total = endpoint.metrics().create_counter(
"requests_total", "requests_total",
"Total requests across all namespaces", "Total requests across all namespaces",
&[] &[]
)?; )?;
let active_connections = endpoint.create_gauge( let active_connections = endpoint.metrics().create_gauge(
"active_connections", "active_connections",
"Number of active client connections", "Number of active client connections",
&[] &[]
...@@ -307,17 +307,17 @@ let active_connections = endpoint.create_gauge( ...@@ -307,17 +307,17 @@ let active_connections = endpoint.create_gauge(
let namespace = runtime.namespace("my_model")?; let namespace = runtime.namespace("my_model")?;
// Namespace-scoped metrics // Namespace-scoped metrics
let model_requests = namespace.create_counter( let model_requests = namespace.metrics().create_counter(
"model_requests", "model_requests",
"Requests for this specific model", "Requests for this specific model",
&[] &[]
)?; )?;
let model_latency = namespace.create_histogram( let model_latency = namespace.metrics().create_histogram(
"model_latency_seconds", "model_latency_seconds",
"Model inference latency", "Model inference latency",
&[], &[],
&[0.001, 0.01, 0.1, 1.0, 10.0] Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?; )?;
``` ```
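The histogram calls in this change pass buckets as `Some(vec![...])` instead of a slice, which suggests the parameter is an `Option`. The following is a minimal, standard-library-only sketch of how such an optional argument might be resolved. The helper `resolve_buckets` is hypothetical and is not part of Dynamo; the fallback to the `prometheus` crate's `DEFAULT_BUCKETS` values and the strictly-increasing check are assumptions made for illustration.

```rust
/// Default bounds, matching the values the `prometheus` crate publishes as
/// `DEFAULT_BUCKETS`. Using them as the fallback here is an assumption.
const DEFAULT_BUCKETS: [f64; 11] = [
    0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
];

/// Hypothetical helper (not a Dynamo API): use the caller's bounds when
/// given, otherwise the defaults, and reject bounds that are empty or not
/// strictly increasing.
fn resolve_buckets(buckets: Option<Vec<f64>>) -> Result<Vec<f64>, String> {
    let bounds = buckets.unwrap_or_else(|| DEFAULT_BUCKETS.to_vec());
    if bounds.is_empty() {
        return Err("at least one bucket bound is required".to_string());
    }
    if bounds.windows(2).any(|pair| pair[0] >= pair[1]) {
        return Err("bucket bounds must be strictly increasing".to_string());
    }
    Ok(bounds)
}
```

In this sketch, `resolve_buckets(None)` yields the eleven default bounds, while `resolve_buckets(Some(vec![1.0, 0.5]))` returns an error because the bounds are out of order.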
...@@ -327,13 +327,13 @@ let model_latency = namespace.create_histogram( ...@@ -327,13 +327,13 @@ let model_latency = namespace.create_histogram(
let component = namespace.component("backend")?; let component = namespace.component("backend")?;
// Component-specific metrics // Component-specific metrics
let backend_requests = component.create_counter( let backend_requests = component.metrics().create_counter(
"backend_requests", "backend_requests",
"Requests handled by this backend component", "Requests handled by this backend component",
&[] &[]
)?; )?;
let gpu_memory_usage = component.create_gauge( let gpu_memory_usage = component.metrics().create_gauge(
"gpu_memory_bytes", "gpu_memory_bytes",
"GPU memory usage in bytes", "GPU memory usage in bytes",
&[] &[]
...@@ -346,17 +346,17 @@ let gpu_memory_usage = component.create_gauge( ...@@ -346,17 +346,17 @@ let gpu_memory_usage = component.create_gauge(
let endpoint = component.endpoint("generate")?; let endpoint = component.endpoint("generate")?;
// Endpoint-specific metrics // Endpoint-specific metrics
let generate_requests = endpoint.create_counter( let generate_requests = endpoint.metrics().create_counter(
"generate_requests", "generate_requests",
"Generate endpoint requests", "Generate endpoint requests",
&[] &[]
)?; )?;
let generate_latency = endpoint.create_histogram( let generate_latency = endpoint.metrics().create_histogram(
"generate_latency_seconds", "generate_latency_seconds",
"Generate endpoint latency", "Generate endpoint latency",
&[], &[],
&[0.001, 0.01, 0.1, 1.0, 10.0] Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?; )?;
``` ```
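The namespace, component, and endpoint examples above differ only in the hierarchy level at which a metric is created. The sketch below, written against the standard library only, illustrates how the automatic labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` could be derived from that level. The `Level` enum and `auto_labels` function are hypothetical, and the rule that each level contributes only the labels at or above it is an assumption, not a statement about Dynamo's implementation.

```rust
/// Hypothetical model of the three hierarchy levels; not Dynamo's types.
enum Level<'a> {
    Namespace { namespace: &'a str },
    Component { namespace: &'a str, component: &'a str },
    Endpoint { namespace: &'a str, component: &'a str, endpoint: &'a str },
}

/// Returns the automatic labels for a metric created at `level`.
/// The label names come from the documentation; the per-level rule is assumed.
fn auto_labels(level: &Level) -> Vec<(String, String)> {
    // Flatten the level into (namespace, optional component, optional endpoint).
    let (namespace, component, endpoint) = match level {
        Level::Namespace { namespace } => (*namespace, None, None),
        Level::Component { namespace, component } => (*namespace, Some(*component), None),
        Level::Endpoint { namespace, component, endpoint } => {
            (*namespace, Some(*component), Some(*endpoint))
        }
    };

    let mut labels = vec![("dynamo_namespace".to_string(), namespace.to_string())];
    if let Some(component) = component {
        labels.push(("dynamo_component".to_string(), component.to_string()));
    }
    if let Some(endpoint) = endpoint {
        labels.push(("dynamo_endpoint".to_string(), endpoint.to_string()));
    }
    labels
}
```

Under these assumptions, a metric created on the `generate` endpoint of the `backend` component carries all three labels, while a namespace-level metric carries only `dynamo_namespace`.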
...@@ -366,10 +366,11 @@ Use vector metrics when you need to track metrics with different label values: ...@@ -366,10 +366,11 @@ Use vector metrics when you need to track metrics with different label values:
```rust ```rust
// Counter with labels // Counter with labels
let requests_by_model = endpoint.create_counter_vec( let requests_by_model = endpoint.metrics().create_countervec(
"requests_by_model", "requests_by_model",
"Requests by model type", "Requests by model type",
&["model_type", "model_size"] &["model_type", "model_size"],
&[] // no constant labels
)?; )?;
// Increment with specific labels // Increment with specific labels
...@@ -377,10 +378,11 @@ requests_by_model.with_label_values(&["llama", "7b"]).inc(); ...@@ -377,10 +378,11 @@ requests_by_model.with_label_values(&["llama", "7b"]).inc();
requests_by_model.with_label_values(&["gpt", "13b"]).inc(); requests_by_model.with_label_values(&["gpt", "13b"]).inc();
// Gauge with labels // Gauge with labels
let memory_by_gpu = component.create_gauge_vec( let memory_by_gpu = component.metrics().create_gaugevec(
"gpu_memory_bytes", "gpu_memory_bytes",
"GPU memory usage by device", "GPU memory usage by device",
&["gpu_id", "memory_type"] &["gpu_id", "memory_type"],
&[] // no constant labels
)?; )?;
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0); memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
...@@ -392,11 +394,11 @@ memory_by_gpu.with_label_values(&["0", "cached"]).set(4096.0); ...@@ -392,11 +394,11 @@ memory_by_gpu.with_label_values(&["0", "cached"]).set(4096.0);
Histograms are useful for measuring distributions of values like latency: Histograms are useful for measuring distributions of values like latency:
```rust ```rust
let latency_histogram = endpoint.create_histogram( let latency_histogram = endpoint.metrics().create_histogram(
"request_latency_seconds", "request_latency_seconds",
"Request latency distribution", "Request latency distribution",
&[], &[],
&[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0] Some(vec![0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0])
)?; )?;
// Record latency values // Record latency values
...@@ -429,7 +431,7 @@ counter.inc(); ...@@ -429,7 +431,7 @@ counter.inc();
#### After (Dynamo MetricsRegistry) #### After (Dynamo MetricsRegistry)
```rust ```rust
let counter = endpoint.create_counter( let counter = endpoint.metrics().create_counter(
"my_counter", "my_counter",
"My custom counter", "My custom counter",
&[] &[]
...@@ -438,10 +440,10 @@ let counter = endpoint.create_counter( ...@@ -438,10 +440,10 @@ let counter = endpoint.create_counter(
counter.inc(); counter.inc();
``` ```
**Note:** The metric is automatically registered when created via the endpoint's `create_counter` factory method. **Note:** The metric is automatically registered when created via the endpoint's `metrics().create_counter()` factory method.
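The `endpoint.metrics().create_counter(...)` call shape reflects this commit's move from a trait to composition: the endpoint holds a registry and hands out a reference to it, rather than implementing the factory methods itself. Below is a minimal sketch of that pattern using only the standard library. `Registry`, `Endpoint`, and `demo` are hypothetical stand-ins, the counter is modeled as an `AtomicU64`, and rejecting duplicate names is an assumption; none of this is Dynamo's actual code.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the registry owned by each hierarchy level.
#[derive(Default)]
struct Registry {
    counters: Mutex<HashMap<String, Arc<AtomicU64>>>,
}

impl Registry {
    /// Creates a counter and registers it in one step.
    /// Rejecting duplicate names is an assumption of this sketch.
    fn create_counter(&self, name: &str) -> Result<Arc<AtomicU64>, String> {
        let mut counters = self.counters.lock().expect("registry lock poisoned");
        if counters.contains_key(name) {
            return Err(format!("metric {} is already registered", name));
        }
        let counter = Arc::new(AtomicU64::new(0));
        counters.insert(name.to_string(), Arc::clone(&counter));
        Ok(counter)
    }

    /// Lists registered metric names in sorted order.
    fn registered_names(&self) -> Vec<String> {
        let counters = self.counters.lock().expect("registry lock poisoned");
        let mut names: Vec<String> = counters.keys().cloned().collect();
        names.sort();
        names
    }
}

/// Hypothetical endpoint that *has* a registry (composition) instead of
/// implementing a registry trait itself.
#[derive(Default)]
struct Endpoint {
    registry: Registry,
}

impl Endpoint {
    /// Accessor mirroring the `metrics()` call shape used in the docs.
    fn metrics(&self) -> &Registry {
        &self.registry
    }
}

/// Usage: the caller never registers the counter explicitly.
fn demo() -> u64 {
    let endpoint = Endpoint::default();
    let counter = endpoint
        .metrics()
        .create_counter("my_counter")
        .expect("first registration succeeds");
    counter.fetch_add(1, Ordering::Relaxed);
    counter.load(Ordering::Relaxed)
}
```

One consequence of this design is that any type can gain metrics support by owning a registry field, without having to implement a trait with one method per metric type.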
**Benefits of Dynamo's approach:** **Benefits of Dynamo's approach:**
- **Automatic registration**: Metrics created via endpoint's `create_*` factory methods are automatically registered with the system - **Automatic registration**: Metrics created via endpoint's `metrics().create_*()` factory methods are automatically registered with the system
- Automatic labeling with namespace, component, and endpoint information - Automatic labeling with namespace, component, and endpoint information
- Consistent metric naming with `dynamo_` prefix - Consistent metric naming with `dynamo_` prefix
- Built-in HTTP metrics endpoint when enabled with `DYN_SYSTEM_ENABLED=true` - Built-in HTTP metrics endpoint when enabled with `DYN_SYSTEM_ENABLED=true`
...@@ -454,11 +456,11 @@ counter.inc(); ...@@ -454,11 +456,11 @@ counter.inc();
```rust ```rust
// Define custom buckets for your use case // Define custom buckets for your use case
let custom_buckets = vec![0.001, 0.01, 0.1, 1.0, 10.0]; let custom_buckets = vec![0.001, 0.01, 0.1, 1.0, 10.0];
let latency = endpoint.create_histogram( let latency = endpoint.metrics().create_histogram(
"api_latency_seconds", "api_latency_seconds",
"API latency in seconds", "API latency in seconds",
&[], &[],
&custom_buckets Some(custom_buckets)
)?; )?;
``` ```
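When choosing custom buckets, it helps to remember that Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and an implicit `+Inf` bucket counts all observations. The sketch below reproduces that counting rule with the standard library only. The function `bucket_counts` is a hypothetical illustration and does not use or mirror the `prometheus` crate's internals.

```rust
/// Returns the cumulative count for each upper bound, followed by the
/// count for the implicit +Inf bucket (every observation).
fn bucket_counts(upper_bounds: &[f64], observations: &[f64]) -> Vec<u64> {
    let mut counts: Vec<u64> = upper_bounds
        .iter()
        .map(|&le| observations.iter().filter(|&&value| value <= le).count() as u64)
        .collect();
    // The +Inf bucket always equals the total number of observations.
    counts.push(observations.len() as u64);
    counts
}
```

With the bounds `[0.001, 0.01, 0.1, 1.0, 10.0]` from the example above, a 12-second observation lands only in the `+Inf` bucket, which is a hint that the largest bound may be too small for that workload.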
...@@ -466,7 +468,7 @@ let latency = endpoint.create_histogram( ...@@ -466,7 +468,7 @@ let latency = endpoint.create_histogram(
```rust ```rust
// Aggregate metrics across multiple endpoints // Aggregate metrics across multiple endpoints
let requests_total = namespace.create_counter( let requests_total = namespace.metrics().create_counter(
"requests_total", "requests_total",
"Total requests across all endpoints", "Total requests across all endpoints",
&[] &[]
......
...@@ -92,7 +92,7 @@ When you need to add or modify metrics in Method 1 (ForwardPassMetrics Pub/Sub v ...@@ -92,7 +92,7 @@ When you need to add or modify metrics in Method 1 (ForwardPassMetrics Pub/Sub v
// ... existing gauges ... // ... existing gauges ...
// Manually create and register new Prometheus gauge // Manually create and register new Prometheus gauge
let new_metric_gauge = component.create_gauge( let new_metric_gauge = component.metrics().create_gauge(
"new_metric_name", "new_metric_name",
"Description of new metric", "Description of new metric",
&[], // labels &[], // labels
...@@ -345,7 +345,7 @@ graph TD ...@@ -345,7 +345,7 @@ graph TD
end end
PY -->|endpoint.metrics.create_intgauge| PM PY -->|endpoint.metrics.create_intgauge| PM
PM -->|endpoint.create_intgauge| EP PM -->|endpoint.metrics.create_intgauge| EP
EP -->|create & register| PROM EP -->|create & register| PROM
PM -->|wrap & return| MT PM -->|wrap & return| MT
MT -->|return to Python| PY MT -->|return to Python| PY
......
...@@ -652,28 +652,28 @@ impl Histogram { ...@@ -652,28 +652,28 @@ impl Histogram {
#[pyclass] #[pyclass]
#[derive(Clone)] #[derive(Clone)]
pub struct RuntimeMetrics { pub struct RuntimeMetrics {
metricsregistry: Arc<dyn rs::metrics::MetricsRegistry>, hierarchy: Arc<dyn rs::metrics::MetricsHierarchy>,
} }
impl RuntimeMetrics { impl RuntimeMetrics {
/// Create from Endpoint /// Create from Endpoint
pub fn from_endpoint(endpoint: dynamo_runtime::component::Endpoint) -> Self { pub fn from_endpoint(endpoint: dynamo_runtime::component::Endpoint) -> Self {
Self { Self {
metricsregistry: Arc::new(endpoint), hierarchy: Arc::new(endpoint),
} }
} }
/// Create from Component /// Create from Component
pub fn from_component(component: dynamo_runtime::component::Component) -> Self { pub fn from_component(component: dynamo_runtime::component::Component) -> Self {
Self { Self {
metricsregistry: Arc::new(component), hierarchy: Arc::new(component),
} }
} }
/// Create from Namespace /// Create from Namespace
pub fn from_namespace(namespace: dynamo_runtime::component::Namespace) -> Self { pub fn from_namespace(namespace: dynamo_runtime::component::Namespace) -> Self {
Self { Self {
metricsregistry: Arc::new(namespace), hierarchy: Arc::new(namespace),
} }
} }
...@@ -690,29 +690,23 @@ impl RuntimeMetrics { ...@@ -690,29 +690,23 @@ impl RuntimeMetrics {
names.iter().map(|s| s.as_str()).collect() names.iter().map(|s| s.as_str()).collect()
} }
/// Generic helper to register metrics callbacks for any type implementing MetricsRegistry /// Generic helper to register metrics callbacks for any type implementing MetricsHierarchy
/// This allows Endpoint, Component, and Namespace to share the same callback registration logic /// This allows Endpoint, Component, and Namespace to share the same callback registration logic
pub fn register_callback_for<T>(registry_item: &T, callback: PyObject) -> PyResult<()> pub fn register_callback_for<T>(registry_item: &T, callback: PyObject) -> PyResult<()>
where where
T: rs::metrics::MetricsRegistry + rs::traits::DistributedRuntimeProvider + ?Sized, T: rs::metrics::MetricsHierarchy + ?Sized,
{ {
let hierarchy = registry_item.hierarchy(); // Get the metrics registry from the hierarchy and register the callback directly
let metrics_registry = registry_item.get_metrics_registry();
// Store the callback in the DRT's metrics callback registry using the registry_item's hierarchy metrics_registry.add_update_callback(Arc::new(move || {
// TODO: rename this to register_callback, once we move the the MetricsRegistry trait // Execute the Python callback in the Python event loop
// out of the runtime, and make it into a composed module. Python::with_gil(|py| {
registry_item.drt().register_prometheus_update_callback( if let Err(e) = callback.call0(py) {
vec![hierarchy.clone()], tracing::error!("Metrics callback failed: {}", e);
Arc::new(move || { }
// Execute the Python callback in the Python event loop });
Python::with_gil(|py| { Ok(())
if let Err(e) = callback.call0(py) { }));
tracing::error!("Metrics callback failed: {}", e);
}
});
Ok(())
}),
);
Ok(()) Ok(())
} }
...@@ -723,40 +717,52 @@ impl RuntimeMetrics { ...@@ -723,40 +717,52 @@ impl RuntimeMetrics {
/// Register a Python callback to be invoked before metrics are scraped /// Register a Python callback to be invoked before metrics are scraped
/// This callback will be called for this endpoint's metrics hierarchy /// This callback will be called for this endpoint's metrics hierarchy
fn register_callback(&self, callback: PyObject, _py: Python) -> PyResult<()> { fn register_callback(&self, callback: PyObject, _py: Python) -> PyResult<()> {
Self::register_callback_for(self.metricsregistry.as_ref(), callback) Self::register_callback_for(self.hierarchy.as_ref(), callback)
} }
/// Register a Python callback that returns Prometheus exposition text /// Register a Python callback that returns Prometheus exposition text
/// The returned text will be appended to the /metrics endpoint output /// The returned text will be appended to the /metrics endpoint output
/// The callback should return a string in Prometheus text exposition format /// The callback should return a string in Prometheus text exposition format
fn register_prometheus_expfmt_callback(&self, callback: PyObject, _py: Python) -> PyResult<()> { fn register_prometheus_expfmt_callback(&self, callback: PyObject, _py: Python) -> PyResult<()> {
let hierarchy = self.metricsregistry.hierarchy(); // Create the callback once (Arc allows sharing across registries)
let callback_arc = Arc::new(move || {
// Store the callback in the DRT's metrics exposition text callback registry // Execute the Python callback in the Python event loop
self.metricsregistry.drt().register_prometheus_expfmt_callback( Python::with_gil(|py| {
vec![hierarchy.clone()], match callback.call0(py) {
Arc::new(move || { Ok(result) => {
// Execute the Python callback in the Python event loop // Try to extract a string from the result
Python::with_gil(|py| { match result.extract::<String>(py) {
match callback.call0(py) { Ok(text) => Ok(text),
Ok(result) => { Err(e) => {
// Try to extract a string from the result tracing::error!(
match result.extract::<String>(py) { "Metrics exposition text callback must return a string: {}",
Ok(text) => Ok(text), e
Err(e) => { );
tracing::error!("Metrics exposition text callback must return a string: {}", e); Ok(String::new())
Ok(String::new())
}
} }
} }
Err(e) => {
tracing::error!("Metrics exposition text callback failed: {}", e);
Ok(String::new())
}
} }
}) Err(e) => {
}), tracing::error!("Metrics exposition text callback failed: {}", e);
); Ok(String::new())
}
}
})
});
// Register the callback at this hierarchy level
self.hierarchy
.get_metrics_registry()
.add_expfmt_callback(callback_arc.clone());
// Also register at all parent hierarchy levels so the callback is accessible
// when prometheus_expfmt() is called on any parent (e.g., DRT)
let parents = self.hierarchy.parent_hierarchies();
for parent in parents.iter() {
parent
.get_metrics_registry()
.add_expfmt_callback(callback_arc.clone());
}
Ok(()) Ok(())
} }
...@@ -774,10 +780,15 @@ impl RuntimeMetrics { ...@@ -774,10 +780,15 @@ impl RuntimeMetrics {
py: Python, py: Python,
) -> PyResult<Py<Counter>> { ) -> PyResult<Py<Counter>> {
let labels_vec = Self::convert_py_to_rust_labels(&labels); let labels_vec = Self::convert_py_to_rust_labels(&labels);
let counter = self let counter: prometheus::Counter = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_counter(&name, &description, &labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&labels_vec,
None,
None,
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = Counter::from_prometheus(counter); let metric = Counter::from_prometheus(counter);
Py::new(py, metric) Py::new(py, metric)
...@@ -795,10 +806,15 @@ impl RuntimeMetrics { ...@@ -795,10 +806,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<CounterVec>> { ) -> PyResult<Py<CounterVec>> {
let label_names_str = Self::convert_py_to_rust_label_names(&label_names); let label_names_str = Self::convert_py_to_rust_label_names(&label_names);
let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels); let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels);
let counter_vec = self let counter_vec: prometheus::CounterVec = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_countervec(&name, &description, &label_names_str, &const_labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&const_labels_vec,
None,
Some(&label_names_str),
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = CounterVec::from_prometheus(counter_vec); let metric = CounterVec::from_prometheus(counter_vec);
Py::new(py, metric) Py::new(py, metric)
...@@ -815,10 +831,15 @@ impl RuntimeMetrics { ...@@ -815,10 +831,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<Gauge>> { ) -> PyResult<Py<Gauge>> {
let labels_vec = Self::convert_py_to_rust_labels(&labels); let labels_vec = Self::convert_py_to_rust_labels(&labels);
let gauge = self let gauge: prometheus::Gauge = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_gauge(&name, &description, &labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&labels_vec,
None,
None,
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = Gauge::from_prometheus(gauge); let metric = Gauge::from_prometheus(gauge);
Py::new(py, metric) Py::new(py, metric)
...@@ -836,10 +857,15 @@ impl RuntimeMetrics { ...@@ -836,10 +857,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<GaugeVec>> { ) -> PyResult<Py<GaugeVec>> {
let label_names_str = Self::convert_py_to_rust_label_names(&label_names); let label_names_str = Self::convert_py_to_rust_label_names(&label_names);
let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels); let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels);
let gauge_vec = self let gauge_vec: prometheus::GaugeVec = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_gaugevec(&name, &description, &label_names_str, &const_labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&const_labels_vec,
None,
Some(&label_names_str),
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = GaugeVec::from_prometheus(gauge_vec); let metric = GaugeVec::from_prometheus(gauge_vec);
Py::new(py, metric) Py::new(py, metric)
...@@ -856,10 +882,15 @@ impl RuntimeMetrics { ...@@ -856,10 +882,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<Histogram>> { ) -> PyResult<Py<Histogram>> {
let labels_vec = Self::convert_py_to_rust_labels(&labels); let labels_vec = Self::convert_py_to_rust_labels(&labels);
let histogram = self let histogram: prometheus::Histogram = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_histogram(&name, &description, &labels_vec, None) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&labels_vec,
None,
None,
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = Histogram::from_prometheus(histogram); let metric = Histogram::from_prometheus(histogram);
Py::new(py, metric) Py::new(py, metric)
...@@ -876,10 +907,15 @@ impl RuntimeMetrics { ...@@ -876,10 +907,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<IntCounter>> { ) -> PyResult<Py<IntCounter>> {
let labels_vec = Self::convert_py_to_rust_labels(&labels); let labels_vec = Self::convert_py_to_rust_labels(&labels);
let counter = self let counter: prometheus::IntCounter = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_intcounter(&name, &description, &labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&labels_vec,
None,
None,
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = IntCounter::from_prometheus(counter); let metric = IntCounter::from_prometheus(counter);
Py::new(py, metric) Py::new(py, metric)
...@@ -897,10 +933,15 @@ impl RuntimeMetrics { ...@@ -897,10 +933,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<IntCounterVec>> { ) -> PyResult<Py<IntCounterVec>> {
let label_names_str = Self::convert_py_to_rust_label_names(&label_names); let label_names_str = Self::convert_py_to_rust_label_names(&label_names);
let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels); let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels);
let counter_vec = self let counter_vec: prometheus::IntCounterVec = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_intcountervec(&name, &description, &label_names_str, &const_labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&const_labels_vec,
None,
Some(&label_names_str),
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = IntCounterVec::from_prometheus(counter_vec); let metric = IntCounterVec::from_prometheus(counter_vec);
Py::new(py, metric) Py::new(py, metric)
...@@ -917,10 +958,15 @@ impl RuntimeMetrics { ...@@ -917,10 +958,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<IntGauge>> { ) -> PyResult<Py<IntGauge>> {
let labels_vec = Self::convert_py_to_rust_labels(&labels); let labels_vec = Self::convert_py_to_rust_labels(&labels);
let gauge = self let gauge: prometheus::IntGauge = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_intgauge(&name, &description, &labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&labels_vec,
None,
None,
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = IntGauge::from_prometheus(gauge); let metric = IntGauge::from_prometheus(gauge);
Py::new(py, metric) Py::new(py, metric)
...@@ -938,10 +984,15 @@ impl RuntimeMetrics { ...@@ -938,10 +984,15 @@ impl RuntimeMetrics {
) -> PyResult<Py<IntGaugeVec>> { ) -> PyResult<Py<IntGaugeVec>> {
let label_names_str = Self::convert_py_to_rust_label_names(&label_names); let label_names_str = Self::convert_py_to_rust_label_names(&label_names);
let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels); let const_labels_vec = Self::convert_py_to_rust_labels(&const_labels);
let gauge_vec = self let gauge_vec: prometheus::IntGaugeVec = rs::metrics::create_metric(
.metricsregistry self.hierarchy.as_ref(),
.create_intgaugevec(&name, &description, &label_names_str, &const_labels_vec) &name,
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?; &description,
&const_labels_vec,
None,
Some(&label_names_str),
)
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))?;
let metric = IntGaugeVec::from_prometheus(gauge_vec); let metric = IntGaugeVec::from_prometheus(gauge_vec);
Py::new(py, metric) Py::new(py, metric)
......
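Note on the hunks above: each binding now calls one generic `rs::metrics::create_metric` and lets the type annotation on the `let` binding pick the metric type. The body of `create_metric` is not visible in this diff, so the following is only a minimal sketch of that return-type-driven pattern. `BuildMetric`, the simplified two-argument signature, and the local `IntCounter` / `IntGauge` structs are hypothetical stand-ins, not the real Dynamo or `prometheus` API.

```rust
// Hypothetical stand-ins for prometheus::IntCounter / prometheus::IntGauge.
#[derive(Debug)]
struct IntCounter {
    name: String,
    value: u64,
}

#[derive(Debug)]
struct IntGauge {
    name: String,
    value: i64,
}

// Hypothetical trait: every metric type knows how to construct itself.
trait BuildMetric: Sized {
    fn build(name: &str) -> Result<Self, String>;
}

impl BuildMetric for IntCounter {
    fn build(name: &str) -> Result<Self, String> {
        Ok(IntCounter { name: name.to_string(), value: 0 })
    }
}

impl BuildMetric for IntGauge {
    fn build(name: &str) -> Result<Self, String> {
        Ok(IntGauge { name: name.to_string(), value: 0 })
    }
}

// One generic constructor replaces the per-type create_* methods.
// The caller's type annotation selects which BuildMetric impl runs.
fn create_metric<T: BuildMetric>(prefix: &str, name: &str) -> Result<T, String> {
    if name.is_empty() {
        return Err("metric name must not be empty".to_string());
    }
    T::build(&format!("{prefix}_{name}"))
}

fn main() {
    let counter: IntCounter = create_metric("dynamo", "requests_total").unwrap();
    let gauge: IntGauge = create_metric("dynamo", "inflight").unwrap();
    println!("{} = {}", counter.name, counter.value);
    println!("{} = {}", gauge.name, gauge.value);
}
```

With this shape, adding a metric type means adding one trait impl, and call sites differ only in the annotated type.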
...@@ -35,7 +35,7 @@ use async_trait::async_trait; ...@@ -35,7 +35,7 @@ use async_trait::async_trait;
use bytes::Bytes; use bytes::Bytes;
use dynamo_runtime::{ use dynamo_runtime::{
component::Component, component::Component,
metrics::{MetricsRegistry, prometheus_names::kvrouter}, metrics::{MetricsHierarchy, prometheus_names::kvrouter},
}; };
use prometheus::{IntCounterVec, Opts}; use prometheus::{IntCounterVec, Opts};
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
...@@ -589,7 +589,7 @@ impl KvIndexerMetrics { ...@@ -589,7 +589,7 @@ impl KvIndexerMetrics {
/// KV_INDEXER_METRICS to avoid duplicate registration issues. /// KV_INDEXER_METRICS to avoid duplicate registration issues.
pub fn from_component(component: &Component) -> Arc<Self> { pub fn from_component(component: &Component) -> Arc<Self> {
KV_INDEXER_METRICS.get_or_init(|| { KV_INDEXER_METRICS.get_or_init(|| {
match component.create_intcountervec( match component.metrics().create_intcountervec(
kvrouter::KV_CACHE_EVENTS_APPLIED, kvrouter::KV_CACHE_EVENTS_APPLIED,
"Total number of KV cache events applied to index", "Total number of KV cache events applied to index",
&["event_type", "status"], &["event_type", "status"],
......
...@@ -7,7 +7,7 @@ use crate::kv_router::{ ...@@ -7,7 +7,7 @@ use crate::kv_router::{
protocols::*, protocols::*,
scoring::LoadEvent, scoring::LoadEvent,
}; };
use dynamo_runtime::metrics::{MetricsRegistry, prometheus_names::kvstats}; use dynamo_runtime::metrics::{MetricsHierarchy, prometheus_names::kvstats};
use dynamo_runtime::traits::{DistributedRuntimeProvider, events::EventPublisher}; use dynamo_runtime::traits::{DistributedRuntimeProvider, events::EventPublisher};
use dynamo_runtime::{ use dynamo_runtime::{
Result, Result,
...@@ -700,25 +700,25 @@ struct KvStatsPrometheusGauges { ...@@ -700,25 +700,25 @@ struct KvStatsPrometheusGauges {
impl KvStatsPrometheusGauges { impl KvStatsPrometheusGauges {
/// Create a new KvStatsPrometheusGauges instance with all metrics registered /// Create a new KvStatsPrometheusGauges instance with all metrics registered
fn new(component: &Component) -> Result<Self> { fn new(component: &Component) -> Result<Self> {
let kv_active_blocks_gauge = component.create_gauge( let kv_active_blocks_gauge = component.metrics().create_gauge(
kvstats::ACTIVE_BLOCKS, kvstats::ACTIVE_BLOCKS,
"Number of active KV cache blocks currently in use", "Number of active KV cache blocks currently in use",
&[], &[],
)?; )?;
let kv_total_blocks_gauge = component.create_gauge( let kv_total_blocks_gauge = component.metrics().create_gauge(
kvstats::TOTAL_BLOCKS, kvstats::TOTAL_BLOCKS,
"Total number of KV cache blocks available", "Total number of KV cache blocks available",
&[], &[],
)?; )?;
let gpu_cache_usage_gauge = component.create_gauge( let gpu_cache_usage_gauge = component.metrics().create_gauge(
kvstats::GPU_CACHE_USAGE_PERCENT, kvstats::GPU_CACHE_USAGE_PERCENT,
"GPU cache usage as a percentage (0.0-1.0)", "GPU cache usage as a percentage (0.0-1.0)",
&[], &[],
)?; )?;
let gpu_prefix_cache_hit_rate_gauge = component.create_gauge( let gpu_prefix_cache_hit_rate_gauge = component.metrics().create_gauge(
kvstats::GPU_PREFIX_CACHE_HIT_RATE, kvstats::GPU_PREFIX_CACHE_HIT_RATE,
"GPU prefix cache hit rate as a percentage (0.0-1.0)", "GPU prefix cache hit rate as a percentage (0.0-1.0)",
&[], &[],
...@@ -1333,7 +1333,6 @@ mod test_integration_publisher { ...@@ -1333,7 +1333,6 @@ mod test_integration_publisher {
#[ignore] // Mark as ignored as requested, because CI's integrations still don't have NATS #[ignore] // Mark as ignored as requested, because CI's integrations still don't have NATS
async fn test_kvstats_prometheus_gauge_updates() { async fn test_kvstats_prometheus_gauge_updates() {
use crate::kv_router::publisher::kvstats; use crate::kv_router::publisher::kvstats;
use dynamo_runtime::metrics::MetricsRegistry;
// Test that publish() updates Prometheus gauges correctly using real Component // Test that publish() updates Prometheus gauges correctly using real Component
let publisher = WorkerMetricsPublisher::new().unwrap(); let publisher = WorkerMetricsPublisher::new().unwrap();
...@@ -1388,7 +1387,7 @@ mod test_integration_publisher { ...@@ -1388,7 +1387,7 @@ mod test_integration_publisher {
// Test 4: Verify metrics are properly registered in the component's registry // Test 4: Verify metrics are properly registered in the component's registry
// Component implements MetricsRegistry trait which provides prometheus_expfmt() // Component implements MetricsRegistry trait which provides prometheus_expfmt()
let prometheus_output = component.prometheus_expfmt().unwrap(); let prometheus_output = component.metrics().prometheus_expfmt().unwrap();
// Verify metric names are present // Verify metric names are present
assert!(prometheus_output.contains(kvstats::ACTIVE_BLOCKS)); assert!(prometheus_output.contains(kvstats::ACTIVE_BLOCKS));
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
use dynamo_runtime::{ use dynamo_runtime::{
DistributedRuntime, Result, DistributedRuntime, Result,
metrics::MetricsRegistry, metrics::MetricsHierarchy,
pipeline::{ pipeline::{
AsyncEngine, AsyncEngineContextProvider, Error, ManyOut, ResponseStream, SingleIn, AsyncEngine, AsyncEngineContextProvider, Error, ManyOut, ResponseStream, SingleIn,
async_trait, network::Ingress, async_trait, network::Ingress,
...@@ -33,7 +33,7 @@ pub struct MySystemStatsMetrics { ...@@ -33,7 +33,7 @@ pub struct MySystemStatsMetrics {
impl MySystemStatsMetrics { impl MySystemStatsMetrics {
pub fn from_endpoint(endpoint: &dynamo_runtime::component::Endpoint) -> anyhow::Result<Self> { pub fn from_endpoint(endpoint: &dynamo_runtime::component::Endpoint) -> anyhow::Result<Self> {
let data_bytes_processed = endpoint.create_intcounter( let data_bytes_processed = endpoint.metrics().create_intcounter(
"my_custom_bytes_processed_total", "my_custom_bytes_processed_total",
"Example of a custom metric. Total number of data bytes processed by system handler", "Example of a custom metric. Total number of data bytes processed by system handler",
&[], &[],
......
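Note on the call-site changes above: `component.create_gauge(...)` becomes `component.metrics().create_gauge(...)`, so the creation methods move from the trait onto a helper returned by `metrics()`. The helper itself lives in the collapsed `metrics.rs` diff, so the sketch below only illustrates one way such composition could be wired. The `Metrics` wrapper, the `names` field, and the `basename_name` naming scheme are assumptions made for illustration.

```rust
use std::sync::Mutex;

// Simplified stand-in for the composed registry that each hierarchy owns.
#[derive(Default)]
struct MetricsRegistry {
    names: Mutex<Vec<String>>,
}

trait MetricsHierarchy {
    fn basename(&self) -> String;
    fn get_metrics_registry(&self) -> &MetricsRegistry;

    // Provided accessor (assumed shape): hands out a helper that borrows the hierarchy.
    fn metrics(&self) -> Metrics<'_>
    where
        Self: Sized,
    {
        Metrics { owner: self }
    }
}

// Hypothetical helper that carries the create_* methods.
struct Metrics<'a> {
    owner: &'a dyn MetricsHierarchy,
}

impl Metrics<'_> {
    // Registers a metric name in the owner's own registry; rejects duplicates.
    fn create_intcounter(&self, name: &str) -> Result<String, String> {
        let full = format!("{}_{}", self.owner.basename(), name);
        let mut names = self.owner.get_metrics_registry().names.lock().unwrap();
        if names.contains(&full) {
            return Err(format!("duplicate metric: {full}"));
        }
        names.push(full.clone());
        Ok(full)
    }
}

struct Endpoint {
    name: String,
    metrics_registry: MetricsRegistry,
}

impl MetricsHierarchy for Endpoint {
    fn basename(&self) -> String {
        self.name.clone()
    }

    fn get_metrics_registry(&self) -> &MetricsRegistry {
        &self.metrics_registry
    }
}

fn main() {
    let endpoint = Endpoint {
        name: "generate".to_string(),
        metrics_registry: MetricsRegistry::default(),
    };
    let registered = endpoint.metrics().create_intcounter("requests_total");
    println!("{registered:?}");
}
```

An implementor then supplies only `basename`, its parents, and a registry reference, while the metric-creation logic stays in one place.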
...@@ -34,7 +34,7 @@ use std::fmt; ...@@ -34,7 +34,7 @@ use std::fmt;
use crate::{ use crate::{
config::HealthStatus, config::HealthStatus,
discovery::Lease, discovery::Lease,
metrics::{MetricsRegistry, prometheus_names}, metrics::{MetricsHierarchy, MetricsRegistry, prometheus_names},
service::ServiceSet, service::ServiceSet,
transports::etcd::{ETCD_ROOT_PATH, EtcdPath}, transports::etcd::{ETCD_ROOT_PATH, EtcdPath},
}; };
...@@ -170,6 +170,10 @@ pub struct Component { ...@@ -170,6 +170,10 @@ pub struct Component {
// A static component's endpoints cannot be discovered via etcd, they are // A static component's endpoints cannot be discovered via etcd, they are
// fixed at startup time. // fixed at startup time.
is_static: bool, is_static: bool,
/// This hierarchy's own metrics registry
#[builder(default = "crate::MetricsRegistry::new()")]
metrics_registry: crate::MetricsRegistry,
} }
impl Hash for Component { impl Hash for Component {
...@@ -208,17 +212,25 @@ impl RuntimeProvider for Component { ...@@ -208,17 +212,25 @@ impl RuntimeProvider for Component {
} }
} }
impl MetricsRegistry for Component { impl MetricsHierarchy for Component {
fn basename(&self) -> String { fn basename(&self) -> String {
self.name.clone() self.name.clone()
} }
fn parent_hierarchy(&self) -> Vec<String> { fn parent_hierarchies(&self) -> Vec<&dyn MetricsHierarchy> {
[ let mut parents = vec![];
self.namespace.parent_hierarchy(),
vec![self.namespace.basename()], // Get all ancestors of namespace (DRT, parent namespaces, etc.)
] parents.extend(self.namespace.parent_hierarchies());
.concat()
// Add namespace itself
parents.push(&self.namespace as &dyn MetricsHierarchy);
parents
}
fn get_metrics_registry(&self) -> &MetricsRegistry {
&self.metrics_registry
} }
} }
...@@ -262,6 +274,7 @@ impl Component { ...@@ -262,6 +274,7 @@ impl Component {
name: endpoint.into(), name: endpoint.into(),
is_static: self.is_static, is_static: self.is_static,
labels: Vec::new(), labels: Vec::new(),
metrics_registry: crate::MetricsRegistry::new(),
} }
} }
...@@ -312,15 +325,6 @@ impl Component { ...@@ -312,15 +325,6 @@ impl Component {
let component_metrics = ComponentNatsServerPrometheusMetrics::new(self)?; let component_metrics = ComponentNatsServerPrometheusMetrics::new(self)?;
let component_clone = self.clone(); let component_clone = self.clone();
let mut hierarchies = self.parent_hierarchy();
hierarchies.push(self.hierarchy());
debug_assert!(
hierarchies
.last()
.map(|x| x.as_str())
.unwrap_or_default()
.eq_ignore_ascii_case(&self.service_name())
); // it happens that in component, hierarchy and service name are the same
// Start a background task that scrapes stats every 5 seconds // Start a background task that scrapes stats every 5 seconds
let m = component_metrics.clone(); let m = component_metrics.clone();
...@@ -434,6 +438,9 @@ pub struct Endpoint { ...@@ -434,6 +438,9 @@ pub struct Endpoint {
/// Additional labels for metrics /// Additional labels for metrics
labels: Vec<(String, String)>, labels: Vec<(String, String)>,
/// This hierarchy's own metrics registry
metrics_registry: crate::MetricsRegistry,
} }
impl Hash for Endpoint { impl Hash for Endpoint {
...@@ -466,17 +473,25 @@ impl RuntimeProvider for Endpoint { ...@@ -466,17 +473,25 @@ impl RuntimeProvider for Endpoint {
} }
} }
impl MetricsRegistry for Endpoint { impl MetricsHierarchy for Endpoint {
fn basename(&self) -> String { fn basename(&self) -> String {
self.name.clone() self.name.clone()
} }
fn parent_hierarchy(&self) -> Vec<String> { fn parent_hierarchies(&self) -> Vec<&dyn MetricsHierarchy> {
[ let mut parents = vec![];
self.component.parent_hierarchy(),
vec![self.component.basename()], // Get all ancestors of component (DRT, Namespace, etc.)
] parents.extend(self.component.parent_hierarchies());
.concat()
// Add component itself
parents.push(&self.component as &dyn MetricsHierarchy);
parents
}
fn get_metrics_registry(&self) -> &MetricsRegistry {
&self.metrics_registry
} }
} }
...@@ -603,6 +618,10 @@ pub struct Namespace { ...@@ -603,6 +618,10 @@ pub struct Namespace {
/// Additional labels for metrics /// Additional labels for metrics
#[builder(default = "Vec::new()")] #[builder(default = "Vec::new()")]
labels: Vec<(String, String)>, labels: Vec<(String, String)>,
/// This hierarchy's own metrics registry
#[builder(default = "crate::MetricsRegistry::new()")]
metrics_registry: crate::MetricsRegistry,
} }
impl DistributedRuntimeProvider for Namespace { impl DistributedRuntimeProvider for Namespace {
......
...@@ -7,7 +7,7 @@ use futures::stream::StreamExt; ...@@ -7,7 +7,7 @@ use futures::stream::StreamExt;
use futures::{Stream, TryStreamExt}; use futures::{Stream, TryStreamExt};
use super::*; use super::*;
use crate::metrics::MetricsRegistry; use crate::metrics::{MetricsHierarchy, MetricsRegistry};
use crate::traits::events::{EventPublisher, EventSubscriber}; use crate::traits::events::{EventPublisher, EventSubscriber};
#[async_trait] #[async_trait]
...@@ -68,25 +68,31 @@ impl EventSubscriber for Namespace { ...@@ -68,25 +68,31 @@ impl EventSubscriber for Namespace {
} }
} }
impl MetricsRegistry for Namespace { impl MetricsHierarchy for Namespace {
fn basename(&self) -> String { fn basename(&self) -> String {
self.name.clone() self.name.clone()
} }
fn parent_hierarchy(&self) -> Vec<String> { fn parent_hierarchies(&self) -> Vec<&dyn MetricsHierarchy> {
// Build as: [ "" (DRT), non-empty parent basenames from root -> leaf ] let mut parents = vec![];
let mut names = vec![String::new()]; // Start with empty string for DRT
// Collect parent basenames from root to leaf // Walk up the namespace parent chain (grandparents to immediate parent)
let parent_names: Vec<String> = let parent_chain: Vec<&Namespace> =
std::iter::successors(self.parent.as_deref(), |ns| ns.parent.as_deref()) std::iter::successors(self.parent.as_deref(), |ns| ns.parent.as_deref()).collect();
.map(|ns| ns.basename())
.filter(|name| !name.is_empty())
.collect();
// Append parent names in reverse order (root to leaf) // Add DRT first (root)
names.extend(parent_names.into_iter().rev()); parents.push(&*self.runtime as &dyn MetricsHierarchy);
names
// Then add parent namespaces in reverse order (root -> leaf)
for parent_ns in parent_chain.iter().rev() {
parents.push(*parent_ns as &dyn MetricsHierarchy);
}
parents
}
fn get_metrics_registry(&self) -> &MetricsRegistry {
&self.metrics_registry
} }
} }
......
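Note on `parent_hierarchies` above: the new implementations return ancestors ordered from the root down to the immediate parent, using `std::iter::successors` to walk up the chain and then reversing it. The following standalone sketch shows only that ordering logic. The `Namespace` struct here is a simplified stand-in (a boxed parent, no runtime handle), so the DRT root entry that the real code pushes first is omitted.

```rust
// Simplified stand-in: the real Namespace also holds a runtime handle and labels.
struct Namespace {
    name: String,
    parent: Option<Box<Namespace>>,
}

// Collect ancestors ordered root -> immediate parent, excluding `ns` itself.
fn ancestors_root_first(ns: &Namespace) -> Vec<&Namespace> {
    // successors walks upward: immediate parent first, root last.
    let mut chain: Vec<&Namespace> =
        std::iter::successors(ns.parent.as_deref(), |p| p.parent.as_deref()).collect();
    // Reverse so callers see the root first.
    chain.reverse();
    chain
}

fn main() {
    let root = Namespace { name: "root".to_string(), parent: None };
    let mid = Namespace { name: "mid".to_string(), parent: Some(Box::new(root)) };
    let leaf = Namespace { name: "leaf".to_string(), parent: Some(Box::new(mid)) };

    let names: Vec<&str> = ancestors_root_first(&leaf)
        .into_iter()
        .map(|n| n.name.as_str())
        .collect();
    println!("ancestors of {}: {:?}", leaf.name, names);
}
```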
...@@ -7,10 +7,11 @@ use crate::storage::key_value_store::{ ...@@ -7,10 +7,11 @@ use crate::storage::key_value_store::{
}; };
use crate::transports::nats::DRTNatsClientPrometheusMetrics; use crate::transports::nats::DRTNatsClientPrometheusMetrics;
use crate::{ use crate::{
ErrorContext, PrometheusUpdateCallback, ErrorContext,
component::{self, ComponentBuilder, Endpoint, InstanceSource, Namespace}, component::{self, ComponentBuilder, Endpoint, InstanceSource, Namespace},
discovery::DiscoveryClient, discovery::DiscoveryClient,
metrics::MetricsRegistry, metrics::PrometheusUpdateCallback,
metrics::{MetricsHierarchy, MetricsRegistry},
service::ServiceClient, service::ServiceClient,
transports::{etcd, nats, tcp}, transports::{etcd, nats, tcp},
}; };
...@@ -25,13 +26,17 @@ use std::collections::HashMap; ...@@ -25,13 +26,17 @@ use std::collections::HashMap;
use tokio::sync::Mutex; use tokio::sync::Mutex;
use tokio_util::sync::CancellationToken; use tokio_util::sync::CancellationToken;
impl MetricsRegistry for DistributedRuntime { impl MetricsHierarchy for DistributedRuntime {
fn basename(&self) -> String { fn basename(&self) -> String {
"".to_string() // drt has no basename. Basename only begins with the Namespace. "".to_string() // drt has no basename. Basename only begins with the Namespace.
} }
fn parent_hierarchy(&self) -> Vec<String> { fn parent_hierarchies(&self) -> Vec<&dyn MetricsHierarchy> {
vec![] // drt is the root, so no parent hierarchy vec![] // drt is the root, so no parent hierarchies
}
fn get_metrics_registry(&self) -> &MetricsRegistry {
&self.metrics_registry
} }
} }
...@@ -89,10 +94,7 @@ impl DistributedRuntime { ...@@ -89,10 +94,7 @@ impl DistributedRuntime {
component_registry: component::Registry::new(), component_registry: component::Registry::new(),
is_static, is_static,
instance_sources: Arc::new(Mutex::new(HashMap::new())), instance_sources: Arc::new(Mutex::new(HashMap::new())),
hierarchy_to_metricsregistry: Arc::new(std::sync::RwLock::new(HashMap::< metrics_registry: crate::MetricsRegistry::new(),
String,
crate::MetricsRegistryEntry,
>::new())),
system_health, system_health,
}; };
...@@ -101,9 +103,7 @@ impl DistributedRuntime { ...@@ -101,9 +103,7 @@ impl DistributedRuntime {
&distributed_runtime, &distributed_runtime,
nats_client_for_metrics.client().clone(), nats_client_for_metrics.client().clone(),
)?; )?;
let mut drt_hierarchies = distributed_runtime.parent_hierarchy(); // Register a callback to update NATS client metrics on the DRT's metrics registry
drt_hierarchies.push(distributed_runtime.hierarchy());
// Register a callback to update NATS client metrics
let nats_client_callback = Arc::new({ let nats_client_callback = Arc::new({
let nats_client_clone = nats_client_metrics.clone(); let nats_client_clone = nats_client_metrics.clone();
move || { move || {
...@@ -112,7 +112,8 @@ impl DistributedRuntime { ...@@ -112,7 +112,8 @@ impl DistributedRuntime {
} }
}); });
distributed_runtime distributed_runtime
.register_prometheus_update_callback(drt_hierarchies, nats_client_callback); .metrics_registry
.add_update_callback(nats_client_callback);
} }
// Initialize the uptime gauge in SystemHealth // Initialize the uptime gauge in SystemHealth
...@@ -301,78 +302,6 @@ impl DistributedRuntime { ...@@ -301,78 +302,6 @@ impl DistributedRuntime {
pub fn instance_sources(&self) -> Arc<Mutex<HashMap<Endpoint, Weak<InstanceSource>>>> { pub fn instance_sources(&self) -> Arc<Mutex<HashMap<Endpoint, Weak<InstanceSource>>>> {
self.instance_sources.clone() self.instance_sources.clone()
} }
/// Add a Prometheus metric to a specific hierarchy's registry. Note that it is possible
/// to register the same metric name multiple times, as long as the labels are different.
pub fn add_prometheus_metric(
&self,
hierarchy: &str,
prometheus_metric: Box<dyn prometheus::core::Collector>,
) -> anyhow::Result<()> {
let mut registries = self.hierarchy_to_metricsregistry.write().unwrap();
let entry = registries.entry(hierarchy.to_string()).or_default();
// Try to register the metric
entry
.prometheus_registry
.register(prometheus_metric)
.map_err(|e| e.into())
}
/// Add a Prometheus update callback to the given hierarchies
/// TODO: rename this to register_callback, once we move the MetricsRegistry trait
/// out of the runtime, and make it into a composed module.
pub fn register_prometheus_update_callback(
&self,
hierarchies: Vec<String>,
callback: PrometheusUpdateCallback,
) {
let mut registries = self.hierarchy_to_metricsregistry.write().unwrap();
for hierarchy in &hierarchies {
registries
.entry(hierarchy.clone())
.or_default()
.add_prometheus_update_callback(callback.clone());
}
}
/// Execute all Prometheus update callbacks for a given hierarchy and return their results
pub fn execute_prometheus_update_callbacks(&self, hierarchy: &str) -> Vec<anyhow::Result<()>> {
// Clone callbacks while holding read lock (fast operation)
let callbacks = {
let registries = self.hierarchy_to_metricsregistry.read().unwrap();
registries
.get(hierarchy)
.map(|entry| entry.prometheus_update_callbacks.clone())
}; // Read lock released here
// Execute callbacks without holding the lock
match callbacks {
Some(callbacks) => callbacks.iter().map(|callback| callback()).collect(),
None => Vec::new(),
}
}
/// Add a Prometheus exposition text callback that returns Prometheus text for the given hierarchies
pub fn register_prometheus_expfmt_callback(
&self,
hierarchies: Vec<String>,
callback: crate::PrometheusExpositionFormatCallback,
) {
let mut registries = self.hierarchy_to_metricsregistry.write().unwrap();
for hierarchy in &hierarchies {
registries
.entry(hierarchy.clone())
.or_default()
.add_prometheus_expfmt_callback(callback.clone());
}
}
/// Get all registered hierarchy keys. Private because it is only used for testing.
fn get_registered_hierarchies(&self) -> Vec<String> {
let registries = self.hierarchy_to_metricsregistry.read().unwrap();
registries.keys().cloned().collect()
}
} }
#[derive(Dissolve)] #[derive(Dissolve)]
......
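Note on the removal above: the central `hierarchy_to_metricsregistry` map keyed by hierarchy string is replaced by a `metrics_registry` field on each hierarchy, and callbacks are attached with `add_update_callback`. The internals of the new `MetricsRegistry` are in the collapsed diff, so this is a minimal sketch of a registry that owns its callbacks. The `Mutex<Vec<..>>` storage, the `String` error type, and the `execute_update_callbacks` name are assumptions. It keeps the idea from the removed code of cloning the callback list so the lock is not held while callbacks run.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

// Arc-wrapped so callbacks can be shared and cloned cheaply.
type UpdateCallback = Arc<dyn Fn() -> Result<(), String> + Send + Sync + 'static>;

// Simplified stand-in for a per-hierarchy registry (assumed internals).
#[derive(Default)]
struct MetricsRegistry {
    update_callbacks: Mutex<Vec<UpdateCallback>>,
}

impl MetricsRegistry {
    // Takes &self: interior mutability lets owners register without &mut access.
    fn add_update_callback(&self, callback: UpdateCallback) {
        self.update_callbacks.lock().unwrap().push(callback);
    }

    // Clone the Arc list first so the lock is released before callbacks run.
    fn execute_update_callbacks(&self) -> Vec<Result<(), String>> {
        let callbacks: Vec<UpdateCallback> = self.update_callbacks.lock().unwrap().clone();
        callbacks.iter().map(|callback| callback()).collect()
    }
}

fn main() {
    let registry = MetricsRegistry::default();
    let scrapes = Arc::new(AtomicU64::new(0));

    let scrapes_for_callback = scrapes.clone();
    registry.add_update_callback(Arc::new(move || {
        scrapes_for_callback.fetch_add(1, Ordering::SeqCst);
        Ok::<(), String>(())
    }));

    let results = registry.execute_update_callbacks();
    println!("ran {} callback(s), scrapes = {}", results.len(), scrapes.load(Ordering::SeqCst));
}
```

Because each hierarchy holds its own registry, registering a callback no longer needs a list of hierarchy keys or a lookup in a shared map.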
...@@ -47,6 +47,7 @@ pub mod worker; ...@@ -47,6 +47,7 @@ pub mod worker;
pub mod distributed; pub mod distributed;
pub use distributed::distributed_test_utils; pub use distributed::distributed_test_utils;
pub use futures::stream; pub use futures::stream;
pub use metrics::MetricsRegistry;
pub use system_health::{HealthCheckTarget, SystemHealth}; pub use system_health::{HealthCheckTarget, SystemHealth};
pub use tokio_util::sync::CancellationToken; pub use tokio_util::sync::CancellationToken;
pub use worker::Worker; pub use worker::Worker;
...@@ -81,104 +82,6 @@ pub struct Runtime { ...@@ -81,104 +82,6 @@ pub struct Runtime {
block_in_place_permits: Option<Arc<tokio::sync::Semaphore>>, block_in_place_permits: Option<Arc<tokio::sync::Semaphore>>,
} }
/// Type alias for runtime callback functions to reduce complexity
///
/// This type represents an Arc-wrapped callback function that can be:
/// - Shared efficiently across multiple threads and contexts
/// - Cloned without duplicating the underlying closure
/// - Used in generic contexts requiring 'static lifetime
///
/// The Arc wrapper is included in the type to make sharing explicit.
type PrometheusUpdateCallback = Arc<dyn Fn() -> anyhow::Result<()> + Send + Sync + 'static>;
/// Type alias for exposition text callback functions that return Prometheus text
type PrometheusExpositionFormatCallback =
Arc<dyn Fn() -> anyhow::Result<String> + Send + Sync + 'static>;
/// Structure to hold Prometheus registries and associated callbacks for a given hierarchy
pub struct MetricsRegistryEntry {
/// The Prometheus registry for this prefix
pub prometheus_registry: prometheus::Registry,
/// List of update callbacks invoked before metrics are scraped
pub prometheus_update_callbacks: Vec<PrometheusUpdateCallback>,
/// List of callbacks that return Prometheus exposition text to be appended to metrics output
pub prometheus_expfmt_callbacks: Vec<PrometheusExpositionFormatCallback>,
}
impl MetricsRegistryEntry {
/// Create a new metrics registry entry with an empty registry and no callbacks
pub fn new() -> Self {
Self {
prometheus_registry: prometheus::Registry::new(),
prometheus_update_callbacks: Vec::new(),
prometheus_expfmt_callbacks: Vec::new(),
}
}
/// Add a callback function that receives a reference to any MetricsRegistry
pub fn add_prometheus_update_callback(&mut self, callback: PrometheusUpdateCallback) {
self.prometheus_update_callbacks.push(callback);
}
/// Add an exposition text callback that returns Prometheus text
pub fn add_prometheus_expfmt_callback(&mut self, callback: PrometheusExpositionFormatCallback) {
self.prometheus_expfmt_callbacks.push(callback);
}
/// Execute all update callbacks and return their results
pub fn execute_prometheus_update_callbacks(&self) -> Vec<anyhow::Result<()>> {
self.prometheus_update_callbacks
.iter()
.map(|callback| callback())
.collect()
}
/// Execute all exposition text callbacks and return their concatenated text
pub fn execute_prometheus_expfmt_callbacks(&self) -> String {
let mut result = String::new();
for callback in &self.prometheus_expfmt_callbacks {
match callback() {
Ok(text) => {
if !text.is_empty() {
if !result.is_empty() && !result.ends_with('\n') {
result.push('\n');
}
result.push_str(&text);
}
}
Err(e) => {
tracing::error!("Error executing exposition text callback: {}", e);
}
}
}
result
}
/// Returns true if a metric with the given name already exists in the Prometheus registry
pub fn has_metric_named(&self, metric_name: &str) -> bool {
self.prometheus_registry
.gather()
.iter()
.any(|mf| mf.name() == metric_name)
}
}
impl Default for MetricsRegistryEntry {
fn default() -> Self {
Self::new()
}
}
impl Clone for MetricsRegistryEntry {
fn clone(&self) -> Self {
Self {
prometheus_registry: self.prometheus_registry.clone(),
prometheus_update_callbacks: Vec::new(), // Callbacks cannot be cloned, so we start with an empty list
prometheus_expfmt_callbacks: Vec::new(), // Callbacks cannot be cloned, so we start with an empty list
}
}
}
/// Distributed [Runtime] which provides access to shared resources across the cluster, this includes /// Distributed [Runtime] which provides access to shared resources across the cluster, this includes
/// communication protocols and transports. /// communication protocols and transports.
#[derive(Clone)] #[derive(Clone)]
...@@ -209,7 +112,6 @@ pub struct DistributedRuntime { ...@@ -209,7 +112,6 @@ pub struct DistributedRuntime {
// Health Status // Health Status
system_health: Arc<parking_lot::Mutex<SystemHealth>>, system_health: Arc<parking_lot::Mutex<SystemHealth>>,
// This map associates metric prefixes with their corresponding Prometheus registries and callbacks. // This hierarchy's own metrics registry
// Uses RwLock for better concurrency - multiple threads can read (execute callbacks) simultaneously. metrics_registry: MetricsRegistry,
hierarchy_to_metricsregistry: Arc<std::sync::RwLock<HashMap<String, MetricsRegistryEntry>>>,
} }
...@@ -31,7 +31,7 @@ use ingress::push_handler::WorkHandlerMetrics; ...@@ -31,7 +31,7 @@ use ingress::push_handler::WorkHandlerMetrics;
pub const STREAM_ERR_MSG: &str = "Stream ended before generation completed"; pub const STREAM_ERR_MSG: &str = "Stream ended before generation completed";
// Add Prometheus metrics types // Add Prometheus metrics types
use crate::metrics::MetricsRegistry; use crate::metrics::MetricsHierarchy;
use prometheus::{CounterVec, Histogram, IntCounter, IntCounterVec, IntGauge}; use prometheus::{CounterVec, Histogram, IntCounter, IntCounterVec, IntGauge};
pub trait Codable: PipelineIO + Serialize + for<'de> Deserialize<'de> {} pub trait Codable: PipelineIO + Serialize + for<'de> Deserialize<'de> {}
......
...@@ -47,38 +47,39 @@ impl WorkHandlerMetrics { ...@@ -47,38 +47,39 @@ impl WorkHandlerMetrics {
metrics_labels: Option<&[(&str, &str)]>, metrics_labels: Option<&[(&str, &str)]>,
) -> Result<Self, Box<dyn std::error::Error + Send + Sync>> { ) -> Result<Self, Box<dyn std::error::Error + Send + Sync>> {
let metrics_labels = metrics_labels.unwrap_or(&[]); let metrics_labels = metrics_labels.unwrap_or(&[]);
let request_counter = endpoint.create_intcounter( let metrics = endpoint.metrics();
let request_counter = metrics.create_intcounter(
work_handler::REQUESTS_TOTAL, work_handler::REQUESTS_TOTAL,
"Total number of requests processed by work handler", "Total number of requests processed by work handler",
metrics_labels, metrics_labels,
)?; )?;
let request_duration = endpoint.create_histogram( let request_duration = metrics.create_histogram(
work_handler::REQUEST_DURATION_SECONDS, work_handler::REQUEST_DURATION_SECONDS,
"Time spent processing requests by work handler", "Time spent processing requests by work handler",
metrics_labels, metrics_labels,
None, None,
)?; )?;
let inflight_requests = endpoint.create_intgauge( let inflight_requests = metrics.create_intgauge(
work_handler::INFLIGHT_REQUESTS, work_handler::INFLIGHT_REQUESTS,
"Number of requests currently being processed by work handler", "Number of requests currently being processed by work handler",
metrics_labels, metrics_labels,
)?; )?;
let request_bytes = endpoint.create_intcounter( let request_bytes = metrics.create_intcounter(
work_handler::REQUEST_BYTES_TOTAL, work_handler::REQUEST_BYTES_TOTAL,
"Total number of bytes received in requests by work handler", "Total number of bytes received in requests by work handler",
metrics_labels, metrics_labels,
)?; )?;
let response_bytes = endpoint.create_intcounter( let response_bytes = metrics.create_intcounter(
work_handler::RESPONSE_BYTES_TOTAL, work_handler::RESPONSE_BYTES_TOTAL,
"Total number of bytes sent in responses by work handler", "Total number of bytes sent in responses by work handler",
metrics_labels, metrics_labels,
)?; )?;
let error_counter = endpoint.create_intcountervec( let error_counter = metrics.create_intcountervec(
work_handler::ERRORS_TOTAL, work_handler::ERRORS_TOTAL,
"Total number of errors in work handler processing", "Total number of errors in work handler processing",
&[work_handler::ERROR_TYPE_LABEL], &[work_handler::ERROR_TYPE_LABEL],
......
...@@ -11,7 +11,7 @@ use crate::{ ...@@ -11,7 +11,7 @@ use crate::{
DistributedRuntime, Result, DistributedRuntime, Result,
component::Component, component::Component,
error, error,
metrics::{MetricsRegistry, prometheus_names, prometheus_names::nats_service}, metrics::{MetricsHierarchy, prometheus_names, prometheus_names::nats_service},
traits::*, traits::*,
transports::nats, transports::nats,
utils::stream, utils::stream,
...@@ -339,37 +339,37 @@ impl ComponentNatsServerPrometheusMetrics { ...@@ -339,37 +339,37 @@ impl ComponentNatsServerPrometheusMetrics {
let labels: &[(&str, &str)] = &labels_vec; let labels: &[(&str, &str)] = &labels_vec;
let service_processing_ms_avg = component.create_gauge( let service_processing_ms_avg = component.metrics().create_gauge(
nats_service::PROCESSING_MS_AVG, nats_service::PROCESSING_MS_AVG,
"Average processing time across all component endpoints in milliseconds", "Average processing time across all component endpoints in milliseconds",
labels, labels,
)?; )?;
let service_errors_total = component.create_intgauge( let service_errors_total = component.metrics().create_intgauge(
nats_service::ERRORS_TOTAL, nats_service::ERRORS_TOTAL,
"Total number of errors across all component endpoints", "Total number of errors across all component endpoints",
labels, labels,
)?; )?;
let service_requests_total = component.create_intgauge( let service_requests_total = component.metrics().create_intgauge(
nats_service::REQUESTS_TOTAL, nats_service::REQUESTS_TOTAL,
"Total number of requests across all component endpoints", "Total number of requests across all component endpoints",
labels, labels,
)?; )?;
let service_processing_ms_total = component.create_intgauge( let service_processing_ms_total = component.metrics().create_intgauge(
nats_service::PROCESSING_MS_TOTAL, nats_service::PROCESSING_MS_TOTAL,
"Total processing time across all component endpoints in milliseconds", "Total processing time across all component endpoints in milliseconds",
labels, labels,
)?; )?;
let service_active_services = component.create_intgauge( let service_active_services = component.metrics().create_intgauge(
nats_service::ACTIVE_SERVICES, nats_service::ACTIVE_SERVICES,
"Number of active services in this component", "Number of active services in this component",
labels, labels,
)?; )?;
let service_active_endpoints = component.create_intgauge( let service_active_endpoints = component.metrics().create_intgauge(
nats_service::ACTIVE_ENDPOINTS, nats_service::ACTIVE_ENDPOINTS,
"Number of active endpoints across all services", "Number of active endpoints across all services",
labels, labels,
......
...@@ -24,7 +24,7 @@ use tokio::sync::mpsc; ...@@ -24,7 +24,7 @@ use tokio::sync::mpsc;
use crate::component; use crate::component;
use crate::config::HealthStatus; use crate::config::HealthStatus;
use crate::metrics::prometheus_names::distributed_runtime; use crate::metrics::{MetricsHierarchy, prometheus_names::distributed_runtime};
/// Health check target containing instance info and payload /// Health check target containing instance info and payload
#[derive(Clone, Debug)] #[derive(Clone, Debug)]
...@@ -242,11 +242,8 @@ impl SystemHealth { ...@@ -242,11 +242,8 @@ impl SystemHealth {
} }
/// Initialize the uptime gauge using the provided metrics registry /// Initialize the uptime gauge using the provided metrics registry
pub fn initialize_uptime_gauge<T: crate::metrics::MetricsRegistry>( pub fn initialize_uptime_gauge<T: MetricsHierarchy>(&self, registry: &T) -> anyhow::Result<()> {
&self, let gauge = registry.metrics().create_gauge(
registry: &T,
) -> anyhow::Result<()> {
let gauge = registry.create_gauge(
distributed_runtime::UPTIME_SECONDS, distributed_runtime::UPTIME_SECONDS,
"Total uptime of the DistributedRuntime in seconds", "Total uptime of the DistributedRuntime in seconds",
&[], &[],
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
use crate::config::HealthStatus; use crate::config::HealthStatus;
use crate::logging::make_request_span; use crate::logging::make_request_span;
use crate::metrics::MetricsRegistry; use crate::metrics::MetricsHierarchy;
use crate::metrics::prometheus_names::{nats_client, nats_service}; use crate::metrics::prometheus_names::{nats_client, nats_service};
use crate::traits::DistributedRuntimeProvider; use crate::traits::DistributedRuntimeProvider;
use axum::{Router, http::StatusCode, response::IntoResponse, routing::get}; use axum::{Router, http::StatusCode, response::IntoResponse, routing::get};
...@@ -186,27 +186,12 @@ async fn metrics_handler(state: Arc<SystemStatusState>) -> impl IntoResponse { ...@@ -186,27 +186,12 @@ async fn metrics_handler(state: Arc<SystemStatusState>) -> impl IntoResponse {
// Update the uptime gauge with current value // Update the uptime gauge with current value
state.drt().system_health.lock().update_uptime_gauge(); state.drt().system_health.lock().update_uptime_gauge();
// Execute all the callbacks for all registered hierarchies // Get all metrics from DistributedRuntime
let all_hierarchies: Vec<String> = { // Note: In the new hierarchy-based architecture, metrics are automatically registered
let registries = state.drt().hierarchy_to_metricsregistry.read().unwrap(); // at all parent levels, so DRT's metrics include all metrics from children
registries.keys().cloned().collect() // (Namespace, Component, Endpoint). The prometheus_expfmt() method also executes
}; // all update callbacks and expfmt callbacks before returning the metrics.
let response = match state.drt().metrics().prometheus_expfmt() {
for hierarchy in &all_hierarchies {
let callback_results = state.drt().execute_prometheus_update_callbacks(hierarchy);
for result in callback_results {
if let Err(e) = result {
tracing::error!(
"Error executing metrics callback for hierarchy '{}': {}",
hierarchy,
e
);
}
}
}
// Get all metrics from DistributedRuntime (top-level)
let mut response = match state.drt().prometheus_expfmt() {
Ok(r) => r, Ok(r) => r,
Err(e) => { Err(e) => {
tracing::error!("Failed to get metrics from registry: {}", e); tracing::error!("Failed to get metrics from registry: {}", e);
...@@ -217,25 +202,6 @@ async fn metrics_handler(state: Arc<SystemStatusState>) -> impl IntoResponse { ...@@ -217,25 +202,6 @@ async fn metrics_handler(state: Arc<SystemStatusState>) -> impl IntoResponse {
} }
}; };
// Collect and append Prometheus exposition text from all hierarchies
for hierarchy in &all_hierarchies {
let expfmt = {
let registries = state.drt().hierarchy_to_metricsregistry.read().unwrap();
if let Some(entry) = registries.get(hierarchy) {
entry.execute_prometheus_expfmt_callbacks()
} else {
String::new()
}
};
if !expfmt.is_empty() {
if !response.ends_with('\n') {
response.push('\n');
}
response.push_str(&expfmt);
}
}
(StatusCode::OK, response) (StatusCode::OK, response)
} }
...@@ -281,7 +247,7 @@ mod tests { ...@@ -281,7 +247,7 @@ mod tests {
mod integration_tests { mod integration_tests {
use super::*; use super::*;
use crate::distributed::distributed_test_utils::create_test_drt_async; use crate::distributed::distributed_test_utils::create_test_drt_async;
use crate::metrics::MetricsRegistry; use crate::metrics::MetricsHierarchy;
use anyhow::Result; use anyhow::Result;
use rstest::rstest; use rstest::rstest;
use std::sync::Arc; use std::sync::Arc;
...@@ -315,7 +281,7 @@ mod integration_tests { ...@@ -315,7 +281,7 @@ mod integration_tests {
// so we don't need to create it again here // so we don't need to create it again here
// The uptime_seconds metric should already be registered and available // The uptime_seconds metric should already be registered and available
let response = drt.prometheus_expfmt().unwrap(); let response = drt.metrics().prometheus_expfmt().unwrap();
println!("Full metrics response:\n{}", response); println!("Full metrics response:\n{}", response);
// Filter out NATS client metrics for comparison // Filter out NATS client metrics for comparison
......
...@@ -20,11 +20,9 @@ impl RuntimeProvider for DistributedRuntime { ...@@ -20,11 +20,9 @@ impl RuntimeProvider for DistributedRuntime {
} }
} }
// This implementation is required because: // This implementation allows DistributedRuntime to provide access to itself
// 1. MetricsRegistry has a supertrait bound: `MetricsRegistry: Send + Sync + DistributedRuntimeProvider` // when used in contexts that require DistributedRuntimeProvider.
// 2. DistributedRuntime implements MetricsRegistry (in distributed.rs) // Components, Namespaces, and Endpoints use this trait to access their DRT.
// 3. Therefore, DistributedRuntime must implement DistributedRuntimeProvider to satisfy the trait bound
// 4. This enables DistributedRuntime to serve as both a provider (of itself) and a metrics registry
impl DistributedRuntimeProvider for DistributedRuntime { impl DistributedRuntimeProvider for DistributedRuntime {
fn drt(&self) -> &DistributedRuntime { fn drt(&self) -> &DistributedRuntime {
self self
......
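To illustrate the pattern this diff moves to, here is a minimal sketch of composition-based metrics. It assumes a simplified model: only the names `MetricsHierarchy`, `MetricsRegistry`, `metrics()`, `create_gauge`, and `prometheus_expfmt` come from the diff. The `Gauge` type, the `child` constructor, the `ancestors` field, the single-argument `create_gauge` signature, and the output format are hypothetical stand-ins, not Dynamo's actual implementation. It shows each hierarchy level owning a registry, and a gauge created at a child level also appearing in the parent's output.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical gauge handle; clones share the same underlying value.
#[derive(Clone, Default)]
pub struct Gauge(Arc<Mutex<f64>>);

impl Gauge {
    pub fn set(&self, v: f64) {
        *self.0.lock().unwrap() = v;
    }
    pub fn get(&self) -> f64 {
        *self.0.lock().unwrap()
    }
}

/// Simplified registry owned by each hierarchy level (composition, not a trait).
#[derive(Clone, Default)]
pub struct MetricsRegistry {
    gauges: Arc<Mutex<BTreeMap<String, Gauge>>>,
    // Assumption: a child keeps handles to the registries of all its parents.
    ancestors: Vec<MetricsRegistry>,
}

impl MetricsRegistry {
    /// Build a registry for a child level (hypothetical helper).
    pub fn child(&self) -> MetricsRegistry {
        let mut ancestors = self.ancestors.clone();
        ancestors.push(self.clone());
        MetricsRegistry { gauges: Arc::default(), ancestors }
    }

    /// Register the gauge at this level and at every parent level.
    pub fn create_gauge(&self, name: &str) -> Gauge {
        let gauge = Gauge::default();
        for registry in self.ancestors.iter().chain(std::iter::once(self)) {
            registry.gauges.lock().unwrap().insert(name.to_string(), gauge.clone());
        }
        gauge
    }

    /// Render "name value" lines; a stand-in for the real exposition format.
    pub fn prometheus_expfmt(&self) -> String {
        let gauges = self.gauges.lock().unwrap();
        gauges.iter().map(|(name, g)| format!("{} {}\n", name, g.get())).collect()
    }
}

/// The trait only exposes the owned registry; all metric methods live on it.
pub trait MetricsHierarchy {
    fn metrics(&self) -> &MetricsRegistry;
}

pub struct DistributedRuntime {
    metrics: MetricsRegistry,
}

pub struct Component {
    metrics: MetricsRegistry,
}

impl DistributedRuntime {
    pub fn new() -> Self {
        DistributedRuntime { metrics: MetricsRegistry::default() }
    }
}

impl Component {
    pub fn new(drt: &DistributedRuntime) -> Self {
        Component { metrics: drt.metrics.child() }
    }
}

impl MetricsHierarchy for DistributedRuntime {
    fn metrics(&self) -> &MetricsRegistry {
        &self.metrics
    }
}

impl MetricsHierarchy for Component {
    fn metrics(&self) -> &MetricsRegistry {
        &self.metrics
    }
}

/// Generic callers bound on `MetricsHierarchy` and go through `metrics()`.
pub fn initialize_uptime_gauge<T: MetricsHierarchy>(registry: &T) -> Gauge {
    registry.metrics().create_gauge("uptime_seconds")
}
```

Under these assumptions, a caller that previously wrote `component.create_gauge(...)` writes `component.metrics().create_gauge(...)`, and a handler that needs everything can read the top-level registry alone instead of iterating over each hierarchy.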