"lib/vscode:/vscode.git/clone" did not exist on "f51ec24d6fcda650ca32a5475d903ef3bbed7a7b"
Commit f4a3a6b6 authored by Keiven C, committed by GitHub

refactor: standardize Prometheus metric naming conventions (part 1) (#3035)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent cab23f21
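
Dashboards and alert rules that reference the old names will need updating after this commit. As a rough sketch, the renames visible in this diff can be collected into a mapping and applied to PromQL expressions. The helper name `migrate_query` is hypothetical and not part of the repository, and the two `nats_service` names appear only in test fixture data in this diff, so treat them as assumptions about the exported names.

```python
import re

# Old -> new metric names, taken from the renames visible in this diff.
RENAMED_METRICS = {
    "dynamo_frontend_inflight_requests_total": "dynamo_frontend_inflight_requests",
    "dynamo_frontend_queued_requests_total": "dynamo_frontend_queued_requests",
    "dynamo_frontend_client_disconnects": "dynamo_frontend_disconnected_clients",
    # The next two appear only in test fixture strings in this diff.
    "dynamo_component_nats_service_total_requests": "dynamo_component_nats_service_requests_total",
    "dynamo_component_nats_service_total_errors": "dynamo_component_nats_service_errors_total",
}

# Match whole metric names only, so longer names sharing a prefix are left alone.
_OLD_NAME = re.compile(r"\b(" + "|".join(map(re.escape, RENAMED_METRICS)) + r")\b")


def migrate_query(expr: str) -> str:
    """Hypothetical helper: rewrite old metric names inside a PromQL expression."""
    return _OLD_NAME.sub(lambda m: RENAMED_METRICS[m.group(1)], expr)


print(migrate_query('sum(dynamo_frontend_inflight_requests_total{model="qwen/qwen3-0.6b"})'))
```

Names that are already in the new form pass through unchanged, so the helper can be run more than once over the same rule file.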
...@@ -70,8 +70,8 @@ Some components expose additional metrics specific to their functionality: ...@@ -70,8 +70,8 @@ Some components expose additional metrics specific to their functionality:
When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:
- `dynamo_frontend_inflight_requests_total`: Inflight requests (gauge) - `dynamo_frontend_inflight_requests`: Inflight requests (gauge)
- `dynamo_frontend_queued_requests_total`: Number of requests in HTTP processing queue (gauge) - `dynamo_frontend_queued_requests`: Number of requests in HTTP processing queue (gauge)
- `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram) - `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram)
- `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram) - `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram)
- `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram) - `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram)
...@@ -79,6 +79,8 @@ When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), th ...@@ -79,6 +79,8 @@ When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), th
- `dynamo_frontend_requests_total`: Total LLM requests (counter) - `dynamo_frontend_requests_total`: Total LLM requests (counter)
- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram) - `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
**Note**: The `dynamo_frontend_inflight_requests` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
##### Model Configuration Metrics ##### Model Configuration Metrics
The frontend also exposes model configuration metrics with the `dynamo_frontend_model_*` prefix. These metrics are populated from the worker backend registration service when workers register with the system: The frontend also exposes model configuration metrics with the `dynamo_frontend_model_*` prefix. These metrics are populated from the worker backend registration service when workers register with the system:
...@@ -91,7 +93,7 @@ These metrics come from the runtime configuration provided by worker backends du ...@@ -91,7 +93,7 @@ These metrics come from the runtime configuration provided by worker backends du
- `dynamo_frontend_model_max_num_batched_tokens`: Maximum number of batched tokens for a worker serving the model (gauge) - `dynamo_frontend_model_max_num_batched_tokens`: Maximum number of batched tokens for a worker serving the model (gauge)
**MDC Metrics (from ModelDeploymentCard):** **MDC Metrics (from ModelDeploymentCard):**
These metrics come from the Model Deployment Card information provided by worker backends during registration. These metrics come from the Model Deployment Card information provided by worker backends during registration. Note that when multiple worker instances register with the same model name, only the first instance's configuration metrics (runtime config and MDC metrics) will be populated. Subsequent instances with duplicate model names will be skipped for configuration metric updates, though the worker count metric will reflect all instances.
- `dynamo_frontend_model_context_length`: Maximum context length for a worker serving the model (gauge) - `dynamo_frontend_model_context_length`: Maximum context length for a worker serving the model (gauge)
- `dynamo_frontend_model_kv_cache_block_size`: KV cache block size for a worker serving the model (gauge) - `dynamo_frontend_model_kv_cache_block_size`: KV cache block size for a worker serving the model (gauge)
...@@ -100,10 +102,6 @@ These metrics come from the Model Deployment Card information provided by worker ...@@ -100,10 +102,6 @@ These metrics come from the Model Deployment Card information provided by worker
**Worker Management Metrics:** **Worker Management Metrics:**
- `dynamo_frontend_model_workers`: Number of worker instances currently serving the model (gauge) - `dynamo_frontend_model_workers`: Number of worker instances currently serving the model (gauge)
**Important Notes:**
- The `dynamo_frontend_inflight_requests_total` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
- **Model Name Deduplication**: When multiple worker instances register with the same model name, only the first instance's configuration metrics (runtime config and MDC metrics) will be populated. Subsequent instances with duplicate model names will be skipped for configuration metric updates, though the worker count metric will reflect all instances.
#### Request Processing Flow #### Request Processing Flow
This section explains the distinction between two key metrics used to track request processing: This section explains the distinction between two key metrics used to track request processing:
...@@ -148,10 +146,10 @@ Try launching a frontend and a Mocker backend that allows 3 concurrent requests: ...@@ -148,10 +146,10 @@ Try launching a frontend and a Mocker backend that allows 3 concurrent requests:
$ python -m dynamo.frontend --http-port 8000 $ python -m dynamo.frontend --http-port 8000
$ python -m dynamo.mocker --model-path Qwen/Qwen3-0.6B --max-num-seqs 3 $ python -m dynamo.mocker --model-path Qwen/Qwen3-0.6B --max-num-seqs 3
# Launch your 10 concurrent clients here # Launch your 10 concurrent clients here
# Then check the queued_requests_total and inflight_requests_total metrics from the frontend: # Then check the queued_requests and inflight_requests metrics from the frontend:
$ curl -s localhost:8000/metrics|grep -v '^#'|grep -E 'queue|inflight' $ curl -s localhost:8000/metrics|grep -v '^#'|grep -E 'queue|inflight'
dynamo_frontend_queued_requests_total{model="qwen/qwen3-0.6b"} 7 dynamo_frontend_queued_requests{model="qwen/qwen3-0.6b"} 7
dynamo_frontend_inflight_requests_total{model="qwen/qwen3-0.6b"} 10 dynamo_frontend_inflight_requests{model="qwen/qwen3-0.6b"} 10
``` ```
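
To make the relationship between the two gauges concrete, here is a minimal sketch that parses the sample output above and subtracts queued from inflight. The parser is a simplified stand-in written for this example (it ignores labels, so it only suits a single-model scrape) and is not part of Dynamo.

```python
def parse_gauges(text: str) -> dict[str, float]:
    """Parse simple `name{labels} value` lines, skipping comments and blanks.

    Labels are discarded, so series that differ only by label overwrite each other.
    """
    values: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        values[name] = float(value)
    return values


# The two lines from the curl output shown above, using the new metric names.
SAMPLE = """\
dynamo_frontend_queued_requests{model="qwen/qwen3-0.6b"} 7
dynamo_frontend_inflight_requests{model="qwen/qwen3-0.6b"} 10
"""

gauges = parse_gauges(SAMPLE)
# Requests that have left the HTTP queue but have not finished yet.
past_queue = gauges["dynamo_frontend_inflight_requests"] - gauges["dynamo_frontend_queued_requests"]
print(past_queue)  # 3.0
```

The difference of 3 matches the `--max-num-seqs 3` limit given to the Mocker: 10 requests are inflight in total, and 7 of them are still waiting for their first token.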
**Real setup using vLLM (instead of Mocker):** **Real setup using vLLM (instead of Mocker):**
...@@ -294,8 +292,8 @@ let component = namespace.component("my_component")?; ...@@ -294,8 +292,8 @@ let component = namespace.component("my_component")?;
let endpoint = component.endpoint("my_endpoint")?; let endpoint = component.endpoint("my_endpoint")?;
// Create endpoint-level counters (this is a Prometheus Counter type) // Create endpoint-level counters (this is a Prometheus Counter type)
let total_requests = endpoint.create_counter( let requests_total = endpoint.create_counter(
"total_requests", "requests_total",
"Total requests across all namespaces", "Total requests across all namespaces",
&[] &[]
)?; )?;
...@@ -472,8 +470,8 @@ let latency = endpoint.create_histogram( ...@@ -472,8 +470,8 @@ let latency = endpoint.create_histogram(
```rust ```rust
// Aggregate metrics across multiple endpoints // Aggregate metrics across multiple endpoints
let total_requests = namespace.create_counter( let requests_total = namespace.create_counter(
"total_requests", "requests_total",
"Total requests across all endpoints", "Total requests across all endpoints",
&[] &[]
)?; )?;
......
...@@ -28,13 +28,32 @@ ...@@ -28,13 +28,32 @@
//! # Access metrics directly (no constructor call needed!) //! # Access metrics directly (no constructor call needed!)
//! frontend = prometheus_names.frontend //! frontend = prometheus_names.frontend
//! print(frontend.requests_total) # "dynamo_frontend_requests_total" //! print(frontend.requests_total) # "dynamo_frontend_requests_total"
//! print(frontend.queued_requests) # "dynamo_frontend_queued_requests"
//! print(frontend.inflight_requests) # "dynamo_frontend_inflight_requests"
//! print(frontend.disconnected_clients) # "dynamo_frontend_disconnected_clients"
//! print(frontend.request_duration_seconds) # "dynamo_frontend_request_duration_seconds" //! print(frontend.request_duration_seconds) # "dynamo_frontend_request_duration_seconds"
//! print(frontend.input_sequence_tokens) # "dynamo_frontend_input_sequence_tokens"
//! print(frontend.output_sequence_tokens) # "dynamo_frontend_output_sequence_tokens"
//! print(frontend.time_to_first_token_seconds) # "dynamo_frontend_time_to_first_token_seconds"
//! print(frontend.inter_token_latency_seconds) # "dynamo_frontend_inter_token_latency_seconds" //! print(frontend.inter_token_latency_seconds) # "dynamo_frontend_inter_token_latency_seconds"
//! print(frontend.model_context_length) # "dynamo_frontend_model_context_length"
//! print(frontend.model_kv_cache_block_size) # "dynamo_frontend_model_kv_cache_block_size"
//! print(frontend.model_migration_limit) # "dynamo_frontend_model_migration_limit"
//! //!
//! work_handler = prometheus_names.work_handler //! work_handler = prometheus_names.work_handler
//! print(work_handler.requests_total) # "dynamo_component_requests_total" //! print(work_handler.requests_total) # "dynamo_component_requests_total"
//! print(work_handler.request_bytes_total) # "dynamo_component_request_bytes_total"
//! print(work_handler.response_bytes_total) # "dynamo_component_response_bytes_total"
//! print(work_handler.inflight_requests) # "dynamo_component_inflight_requests"
//! print(work_handler.request_duration_seconds) # "dynamo_component_request_duration_seconds"
//! print(work_handler.errors_total) # "dynamo_component_errors_total" //! print(work_handler.errors_total) # "dynamo_component_errors_total"
//! //!
//! kvstats = prometheus_names.kvstats
//! print(kvstats.active_blocks) # "kvstats_active_blocks"
//! print(kvstats.total_blocks) # "kvstats_total_blocks"
//! print(kvstats.gpu_cache_usage_percent) # "kvstats_gpu_cache_usage_percent"
//! print(kvstats.gpu_prefix_cache_hit_rate) # "kvstats_gpu_prefix_cache_hit_rate"
//!
//! # Use in Prometheus queries //! # Use in Prometheus queries
//! query = f"rate({frontend.requests_total}[5m])" //! query = f"rate({frontend.requests_total}[5m])"
//! pattern = rf'{work_handler.requests_total}\{{[^}}]*model="[^"]*"[^}}]*\}}' //! pattern = rf'{work_handler.requests_total}\{{[^}}]*model="[^"]*"[^}}]*\}}'
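
The regular expression in the doc comment above is easy to misread because of the doubled braces that f-strings require. The sketch below expands the same pattern with a plain string standing in for `work_handler.requests_total` (the value is the one shown in the doc comment), so it runs without the Dynamo bindings installed. The sample metric lines are made up for illustration.

```python
import re

# Stand-in for work_handler.requests_total; value copied from the doc comment above.
requests_total = "dynamo_component_requests_total"

# In an rf-string, `{{` and `}}` produce literal braces, so this expands to:
#   dynamo_component_requests_total\{[^}]*model="[^"]*"[^}]*\}
pattern = rf'{requests_total}\{{[^}}]*model="[^"]*"[^}}]*\}}'

# Made-up sample lines, for illustration only.
with_model = 'dynamo_component_requests_total{endpoint="generate",model="qwen/qwen3-0.6b"} 42'
without_model = 'dynamo_component_requests_total{endpoint="generate"} 42'

print(bool(re.search(pattern, with_model)))     # True
print(bool(re.search(pattern, without_model)))  # False
```

The pattern matches a series only when its label set contains a `model` label, wherever that label sits among the others.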
...@@ -60,6 +79,12 @@ impl PrometheusNames { ...@@ -60,6 +79,12 @@ impl PrometheusNames {
fn work_handler(&self) -> WorkHandler { fn work_handler(&self) -> WorkHandler {
WorkHandler WorkHandler
} }
/// KV stats metrics
#[getter]
fn kvstats(&self) -> KvStatsMetrics {
KvStatsMetrics
}
} }
/// Frontend service metrics (LLM HTTP service) /// Frontend service metrics (LLM HTTP service)
...@@ -86,21 +111,21 @@ impl FrontendService { ...@@ -86,21 +111,21 @@ impl FrontendService {
/// Number of requests waiting in HTTP queue before receiving the first response /// Number of requests waiting in HTTP queue before receiving the first response
#[getter] #[getter]
fn queued_requests_total(&self) -> String { fn queued_requests(&self) -> String {
format!( format!(
"{}_{}", "{}_{}",
name_prefix::FRONTEND, name_prefix::FRONTEND,
frontend_service::QUEUED_REQUESTS_TOTAL frontend_service::QUEUED_REQUESTS
) )
} }
/// Number of inflight requests going to the engine (vLLM, SGLang, ...) /// Number of inflight requests going to the engine (vLLM, SGLang, ...)
#[getter] #[getter]
fn inflight_requests_total(&self) -> String { fn inflight_requests(&self) -> String {
format!( format!(
"{}_{}", "{}_{}",
name_prefix::FRONTEND, name_prefix::FRONTEND,
frontend_service::INFLIGHT_REQUESTS_TOTAL frontend_service::INFLIGHT_REQUESTS
) )
} }
...@@ -153,6 +178,76 @@ impl FrontendService { ...@@ -153,6 +178,76 @@ impl FrontendService {
frontend_service::INTER_TOKEN_LATENCY_SECONDS frontend_service::INTER_TOKEN_LATENCY_SECONDS
) )
} }
/// Number of disconnected clients
#[getter]
fn disconnected_clients(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::DISCONNECTED_CLIENTS
)
}
/// Model total KV blocks
#[getter]
fn model_total_kv_blocks(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_TOTAL_KV_BLOCKS
)
}
/// Model max number of sequences
#[getter]
fn model_max_num_seqs(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_MAX_NUM_SEQS
)
}
/// Model max number of batched tokens
#[getter]
fn model_max_num_batched_tokens(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_MAX_NUM_BATCHED_TOKENS
)
}
/// Model context length
#[getter]
fn model_context_length(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_CONTEXT_LENGTH
)
}
/// Model KV cache block size
#[getter]
fn model_kv_cache_block_size(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_KV_CACHE_BLOCK_SIZE
)
}
/// Model migration limit
#[getter]
fn model_migration_limit(&self) -> String {
format!(
"{}_{}",
name_prefix::FRONTEND,
frontend_service::MODEL_MIGRATION_LIMIT
)
}
} }
/// Work handler metrics (component request processing) /// Work handler metrics (component request processing)
...@@ -219,11 +314,44 @@ impl WorkHandler { ...@@ -219,11 +314,44 @@ impl WorkHandler {
} }
} }
/// KV stats metrics (KV cache statistics)
/// These methods return the metric names with the "kvstats_" prefix
#[pyclass]
pub struct KvStatsMetrics;
#[pymethods]
impl KvStatsMetrics {
/// Number of active KV cache blocks currently in use
#[getter]
fn active_blocks(&self) -> String {
kvstats::ACTIVE_BLOCKS.to_string()
}
/// Total number of KV cache blocks available
#[getter]
fn total_blocks(&self) -> String {
kvstats::TOTAL_BLOCKS.to_string()
}
/// GPU cache usage as a percentage (0.0-1.0)
#[getter]
fn gpu_cache_usage_percent(&self) -> String {
kvstats::GPU_CACHE_USAGE_PERCENT.to_string()
}
/// GPU prefix cache hit rate as a percentage (0.0-1.0)
#[getter]
fn gpu_prefix_cache_hit_rate(&self) -> String {
kvstats::GPU_PREFIX_CACHE_HIT_RATE.to_string()
}
}
/// Add prometheus_names module to the Python bindings /// Add prometheus_names module to the Python bindings
pub fn add_to_module(m: &Bound<'_, PyModule>) -> PyResult<()> { pub fn add_to_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PrometheusNames>()?; m.add_class::<PrometheusNames>()?;
m.add_class::<FrontendService>()?; m.add_class::<FrontendService>()?;
m.add_class::<WorkHandler>()?; m.add_class::<WorkHandler>()?;
m.add_class::<KvStatsMetrics>()?;
// Add a module-level singleton instance for convenience // Add a module-level singleton instance for convenience
let prometheus_names_instance = PrometheusNames; let prometheus_names_instance = PrometheusNames;
......
...@@ -12,6 +12,9 @@ from typing import ( ...@@ -12,6 +12,9 @@ from typing import (
Tuple, Tuple,
) )
# Prometheus metric names are defined in a separate module
from ._prometheus_names import prometheus_names
def log_message(level: str, message: str, module: str, file: str, line: int) -> None: def log_message(level: str, message: str, module: str, file: str, line: int) -> None:
""" """
Log a message from Python with file and line info Log a message from Python with file and line info
...@@ -1376,134 +1379,7 @@ class VirtualConnectorClient: ...@@ -1376,134 +1379,7 @@ class VirtualConnectorClient:
"""Blocks until there is a new decision to fetch using 'get'""" """Blocks until there is a new decision to fetch using 'get'"""
... ...
class PrometheusNames: __all__ = [
""" # ... existing exports ...
Main container for all Prometheus metric name constants "prometheus_names"
""" ]
@property
def frontend(self) -> FrontendService:
"""
Frontend service metrics
"""
...
@property
def work_handler(self) -> WorkHandler:
"""
Work handler metrics
"""
...
class FrontendService:
"""
Frontend service metrics (LLM HTTP service)
These methods return the full metric names with the "dynamo_frontend_" prefix
"""
@property
def requests_total(self) -> str:
"""
Total number of LLM requests processed
"""
...
@property
def queued_requests_total(self) -> str:
"""
Number of requests waiting in HTTP queue before receiving the first response
"""
...
@property
def inflight_requests_total(self) -> str:
"""
Number of inflight requests going to the engine (vLLM, SGLang, ...)
"""
...
@property
def request_duration_seconds(self) -> str:
"""
Duration of LLM requests
"""
...
@property
def input_sequence_tokens(self) -> str:
"""
Input sequence length in tokens
"""
...
@property
def output_sequence_tokens(self) -> str:
"""
Output sequence length in tokens
"""
...
@property
def time_to_first_token_seconds(self) -> str:
"""
Time to first token in seconds
"""
...
@property
def inter_token_latency_seconds(self) -> str:
"""
Inter-token latency in seconds
"""
...
class WorkHandler:
"""
Work handler metrics (component request processing)
These methods return the full metric names with the "dynamo_component_" prefix
"""
@property
def requests_total(self) -> str:
"""
Total number of requests processed by work handler
"""
...
@property
def request_bytes_total(self) -> str:
"""
Total number of bytes received in requests by work handler
"""
...
@property
def response_bytes_total(self) -> str:
"""
Total number of bytes sent in responses by work handler
"""
...
@property
def inflight_requests(self) -> str:
"""
Number of requests currently being processed by work handler
"""
...
@property
def request_duration_seconds(self) -> str:
"""
Time spent processing requests by work handler (histogram)
"""
...
@property
def errors_total(self) -> str:
"""
Total number of errors in work handler processing
"""
...
# Module-level singleton instance for convenient access
prometheus_names: PrometheusNames
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Python type stubs for Prometheus metric name constants
⚠️ **CRITICAL: SYNC WITH RUST SOURCE** ⚠️
This file must stay in sync with:
- Source: `lib/runtime/src/metrics/prometheus_names.rs`
- Bindings: `lib/bindings/python/rust/prometheus_names.rs`
When the Rust source is modified, update all three files immediately.
"""
class PrometheusNames:
"""
Main container for all Prometheus metric name constants
"""
@property
def frontend(self) -> FrontendService:
"""
Frontend service metrics
"""
...
@property
def work_handler(self) -> WorkHandler:
"""
Work handler metrics
"""
...
@property
def kvstats(self) -> KvStatsMetrics:
"""
KV stats metrics
"""
...
class FrontendService:
"""
Frontend service metrics (LLM HTTP service)
These methods return the full metric names with the "dynamo_frontend_" prefix
"""
@property
def requests_total(self) -> str:
"""
Total number of LLM requests processed
"""
...
@property
def queued_requests(self) -> str:
"""
Number of requests waiting in HTTP queue before receiving the first response
"""
...
@property
def inflight_requests(self) -> str:
"""
Number of inflight requests going to the engine (vLLM, SGLang, ...)
"""
...
@property
def request_duration_seconds(self) -> str:
"""
Duration of LLM requests
"""
...
@property
def input_sequence_tokens(self) -> str:
"""
Input sequence length in tokens
"""
...
@property
def output_sequence_tokens(self) -> str:
"""
Output sequence length in tokens
"""
...
@property
def time_to_first_token_seconds(self) -> str:
"""
Time to first token in seconds
"""
...
@property
def inter_token_latency_seconds(self) -> str:
"""
Inter-token latency in seconds
"""
...
@property
def disconnected_clients(self) -> str:
"""
Number of disconnected clients
"""
...
@property
def model_total_kv_blocks(self) -> str:
"""
Model total KV blocks
"""
...
@property
def model_max_num_seqs(self) -> str:
"""
Model max number of sequences
"""
...
@property
def model_max_num_batched_tokens(self) -> str:
"""
Model max number of batched tokens
"""
...
@property
def model_context_length(self) -> str:
"""
Model context length
"""
...
@property
def model_kv_cache_block_size(self) -> str:
"""
Model KV cache block size
"""
...
@property
def model_migration_limit(self) -> str:
"""
Model migration limit
"""
...
class WorkHandler:
"""
Work handler metrics (component request processing)
These methods return the full metric names with the "dynamo_component_" prefix
"""
@property
def requests_total(self) -> str:
"""
Total number of requests processed by work handler
"""
...
@property
def request_bytes_total(self) -> str:
"""
Total number of bytes received in requests by work handler
"""
...
@property
def response_bytes_total(self) -> str:
"""
Total number of bytes sent in responses by work handler
"""
...
@property
def inflight_requests(self) -> str:
"""
Number of requests currently being processed by work handler
"""
...
@property
def request_duration_seconds(self) -> str:
"""
Time spent processing requests by work handler (histogram)
"""
...
@property
def errors_total(self) -> str:
"""
Total number of errors in work handler processing
"""
...
class KvStatsMetrics:
"""
KV stats metrics (KV cache statistics)
These methods return the metric names with the "kvstats_" prefix
"""
@property
def active_blocks(self) -> str:
"""
Number of active KV cache blocks currently in use
"""
...
@property
def total_blocks(self) -> str:
"""
Total number of KV cache blocks available
"""
...
@property
def gpu_cache_usage_percent(self) -> str:
"""
GPU cache usage as a percentage (0.0-1.0)
"""
...
@property
def gpu_prefix_cache_hit_rate(self) -> str:
"""
GPU prefix cache hit rate as a percentage (0.0-1.0)
"""
...
# Module-level singleton instance for convenient access
prometheus_names: PrometheusNames
...@@ -134,7 +134,8 @@ impl Metrics { ...@@ -134,7 +134,8 @@ impl Metrics {
/// ///
/// The following metrics will be created with the configured prefix: /// The following metrics will be created with the configured prefix:
/// - `{prefix}_requests_total` - IntCounterVec for the total number of requests processed /// - `{prefix}_requests_total` - IntCounterVec for the total number of requests processed
/// - `{prefix}_inflight_requests` - IntGaugeVec for the number of inflight requests /// - `{prefix}_inflight_requests` - IntGaugeVec for the number of inflight/concurrent requests
/// - `{prefix}_disconnected_clients` - IntGauge for the number of disconnected clients
/// - `{prefix}_request_duration_seconds` - HistogramVec for the duration of requests /// - `{prefix}_request_duration_seconds` - HistogramVec for the duration of requests
/// - `{prefix}_input_sequence_tokens` - HistogramVec for input sequence length in tokens /// - `{prefix}_input_sequence_tokens` - HistogramVec for input sequence length in tokens
/// - `{prefix}_output_sequence_tokens` - HistogramVec for output sequence length in tokens /// - `{prefix}_output_sequence_tokens` - HistogramVec for output sequence length in tokens
...@@ -185,7 +186,7 @@ impl Metrics { ...@@ -185,7 +186,7 @@ impl Metrics {
let inflight_gauge = IntGaugeVec::new( let inflight_gauge = IntGaugeVec::new(
Opts::new( Opts::new(
frontend_metric_name(frontend_service::INFLIGHT_REQUESTS_TOTAL), frontend_metric_name(frontend_service::INFLIGHT_REQUESTS),
"Number of inflight requests", "Number of inflight requests",
), ),
&["model"], &["model"],
...@@ -193,14 +194,14 @@ impl Metrics { ...@@ -193,14 +194,14 @@ impl Metrics {
.unwrap(); .unwrap();
let client_disconnect_gauge = prometheus::IntGauge::new( let client_disconnect_gauge = prometheus::IntGauge::new(
frontend_metric_name("client_disconnects"), frontend_metric_name(frontend_service::DISCONNECTED_CLIENTS),
"Number of connections dropped by clients", "Number of disconnected clients",
) )
.unwrap(); .unwrap();
let http_queue_gauge = IntGaugeVec::new( let http_queue_gauge = IntGaugeVec::new(
Opts::new( Opts::new(
frontend_metric_name(frontend_service::QUEUED_REQUESTS_TOTAL), frontend_metric_name(frontend_service::QUEUED_REQUESTS),
"Number of requests in HTTP processing queue", "Number of requests in HTTP processing queue",
), ),
&["model"], &["model"],
......
...@@ -90,9 +90,9 @@ async fn test_metrics_prefix_default() { ...@@ -90,9 +90,9 @@ async fn test_metrics_prefix_default() {
// Assert metrics that are actually present in the default configuration // Assert metrics that are actually present in the default configuration
assert!(body.contains("dynamo_frontend_requests_total")); assert!(body.contains("dynamo_frontend_requests_total"));
assert!(body.contains("dynamo_frontend_inflight_requests_total")); assert!(body.contains("dynamo_frontend_inflight_requests"));
assert!(body.contains("dynamo_frontend_request_duration_seconds")); assert!(body.contains("dynamo_frontend_request_duration_seconds"));
assert!(body.contains("dynamo_frontend_client_disconnects")); assert!(body.contains("dynamo_frontend_disconnected_clients"));
token.cancel(); token.cancel();
let _ = handle.await; let _ = handle.await;
...@@ -271,10 +271,10 @@ async fn test_metrics_with_mock_model() { ...@@ -271,10 +271,10 @@ async fn test_metrics_with_mock_model() {
// Assert that key metrics are present with the mockmodel // Assert that key metrics are present with the mockmodel
assert!(metrics_body.contains("dynamo_frontend_requests_total")); assert!(metrics_body.contains("dynamo_frontend_requests_total"));
assert!(metrics_body.contains("model=\"mockmodel\"")); assert!(metrics_body.contains("model=\"mockmodel\""));
assert!(metrics_body.contains("dynamo_frontend_inflight_requests_total")); assert!(metrics_body.contains("dynamo_frontend_inflight_requests"));
assert!(metrics_body.contains("dynamo_frontend_request_duration_seconds")); assert!(metrics_body.contains("dynamo_frontend_request_duration_seconds"));
assert!(metrics_body.contains("dynamo_frontend_output_sequence_tokens")); assert!(metrics_body.contains("dynamo_frontend_output_sequence_tokens"));
assert!(metrics_body.contains("dynamo_frontend_queued_requests_total")); assert!(metrics_body.contains("dynamo_frontend_queued_requests"));
// Verify specific request counter incremented // Verify specific request counter incremented
assert!(metrics_body.contains("endpoint=\"chat_completions\"")); assert!(metrics_body.contains("endpoint=\"chat_completions\""));
...@@ -386,6 +386,23 @@ mod integration_tests { ...@@ -386,6 +386,23 @@ mod integration_tests {
.await .await
.unwrap(); .unwrap();
// Manually save the model card and update metrics
// This simulates what the ModelWatcher polling task would do in production
let card = local_model.card().clone();
manager.save_model_card("test-mdc-key", card.clone());
if let Err(e) = service
.state()
.metrics_clone()
.update_metrics_from_mdc(&card)
{
tracing::debug!(
model = %card.display_name,
error = %e,
"Failed to update MDC metrics in test"
);
}
// Start the HTTP service // Start the HTTP service
let token = CancellationToken::new(); let token = CancellationToken::new();
let cancel_token = token.clone(); let cancel_token = token.clone();
...@@ -456,10 +473,10 @@ mod integration_tests { ...@@ -456,10 +473,10 @@ mod integration_tests {
let model_name = model.service_name(); let model_name = model.service_name();
assert!(metrics_body.contains("dynamo_frontend_requests_total")); assert!(metrics_body.contains("dynamo_frontend_requests_total"));
assert!(metrics_body.contains(&format!("model=\"{}\"", model_name))); assert!(metrics_body.contains(&format!("model=\"{}\"", model_name)));
assert!(metrics_body.contains("dynamo_frontend_inflight_requests_total")); assert!(metrics_body.contains("dynamo_frontend_inflight_requests"));
assert!(metrics_body.contains("dynamo_frontend_request_duration_seconds")); assert!(metrics_body.contains("dynamo_frontend_request_duration_seconds"));
assert!(metrics_body.contains("dynamo_frontend_output_sequence_tokens")); assert!(metrics_body.contains("dynamo_frontend_output_sequence_tokens"));
assert!(metrics_body.contains("dynamo_frontend_queued_requests_total")); assert!(metrics_body.contains("dynamo_frontend_queued_requests"));
// Assert MDC-based model configuration metrics are present // Assert MDC-based model configuration metrics are present
// These MUST be present for the test to pass // These MUST be present for the test to pass
......
...@@ -1176,8 +1176,8 @@ dynamo_component_nats_client_connection_state 1 ...@@ -1176,8 +1176,8 @@ dynamo_component_nats_client_connection_state 1
# TYPE dynamo_component_latency histogram # TYPE dynamo_component_latency histogram
dynamo_component_latency_bucket{le="0.1"} 10 dynamo_component_latency_bucket{le="0.1"} 10
dynamo_component_latency_bucket{le="0.5"} 25 dynamo_component_latency_bucket{le="0.5"} 25
dynamo_component_nats_service_total_requests 100 dynamo_component_nats_service_requests_total 100
dynamo_component_nats_service_total_errors 5"#; dynamo_component_nats_service_errors_total 5"#;
// Test remove_nats_lines (excludes NATS lines but keeps help/type) // Test remove_nats_lines (excludes NATS lines but keeps help/type)
let filtered_out = super::test_helpers::remove_nats_lines(test_input); let filtered_out = super::test_helpers::remove_nats_lines(test_input);
...@@ -1421,7 +1421,11 @@ mod test_metricsregistry_nats { ...@@ -1421,7 +1421,11 @@ mod test_metricsregistry_nats {
1.0, 1.0,
1.0, 1.0,
), // Should be connected ), // Should be connected
(build_component_metric_name(nats_client::CONNECTS), 1.0, 1.0), // Should have 1 connection (
build_component_metric_name(nats_client::CURRENT_CONNECTIONS),
1.0,
1.0,
), // Should have 1 connection
( (
build_component_metric_name(nats_client::IN_TOTAL_BYTES), build_component_metric_name(nats_client::IN_TOTAL_BYTES),
800.0, 800.0,
...@@ -1444,22 +1448,22 @@ mod test_metricsregistry_nats { ...@@ -1444,22 +1448,22 @@ mod test_metricsregistry_nats {
), // Wide range around 2 ), // Wide range around 2
// Component NATS metrics (ordered to match COMPONENT_NATS_METRICS) // Component NATS metrics (ordered to match COMPONENT_NATS_METRICS)
( (
build_component_metric_name(nats_service::AVG_PROCESSING_MS), build_component_metric_name(nats_service::PROCESSING_MS_AVG),
0.0, 0.0,
0.0, 0.0,
), // No processing yet ), // No processing yet
( (
build_component_metric_name(nats_service::TOTAL_ERRORS), build_component_metric_name(nats_service::ERRORS_TOTAL),
0.0, 0.0,
0.0, 0.0,
), // No errors yet ), // No errors yet
( (
build_component_metric_name(nats_service::TOTAL_REQUESTS), build_component_metric_name(nats_service::REQUESTS_TOTAL),
0.0, 0.0,
0.0, 0.0,
), // No requests yet ), // No requests yet
( (
build_component_metric_name(nats_service::TOTAL_PROCESSING_MS), build_component_metric_name(nats_service::PROCESSING_MS_TOTAL),
0.0, 0.0,
0.0, 0.0,
), // No processing yet ), // No processing yet
...@@ -1550,7 +1554,11 @@ mod test_metricsregistry_nats { ...@@ -1550,7 +1554,11 @@ mod test_metricsregistry_nats {
1.0, 1.0,
1.0, 1.0,
), // Connected ), // Connected
(build_component_metric_name(nats_client::CONNECTS), 1.0, 1.0), // 1 connection (
build_component_metric_name(nats_client::CURRENT_CONNECTIONS),
1.0,
1.0,
), // 1 connection
( (
build_component_metric_name(nats_client::IN_TOTAL_BYTES), build_component_metric_name(nats_client::IN_TOTAL_BYTES),
20000.0, 20000.0,
@@ -1573,22 +1581,22 @@ mod test_metricsregistry_nats {
 ), // Wide range around 16
 // Component NATS metrics
 (
-    build_component_metric_name(nats_service::AVG_PROCESSING_MS),
+    build_component_metric_name(nats_service::PROCESSING_MS_AVG),
     0.0,
     1.0,
 ), // Low processing time
 (
-    build_component_metric_name(nats_service::TOTAL_ERRORS),
+    build_component_metric_name(nats_service::ERRORS_TOTAL),
     0.0,
     0.0,
 ), // No errors
 (
-    build_component_metric_name(nats_service::TOTAL_REQUESTS),
+    build_component_metric_name(nats_service::REQUESTS_TOTAL),
     0.0,
     0.0,
 ), // No work handler requests
 (
-    build_component_metric_name(nats_service::TOTAL_PROCESSING_MS),
+    build_component_metric_name(nats_service::PROCESSING_MS_TOTAL),
     0.0,
     5.0,
 ), // Low total processing time
......
@@ -20,26 +20,38 @@
 //! **Prefix**: Component identifier (`dynamo_component_`, `dynamo_frontend_`, etc.)
 //! **Name**: Descriptive snake_case name indicating what is measured
 //! **Suffix**:
-//! - Units: `_seconds`, `_bytes`, `_ms`, `_percent`
-//! - Counters: `_total` (not `total_` prefix)
+//! - Units: `_seconds`, `_bytes`, `_ms`, `_percent`, `_messages`, `_connections`
+//! - Counters: `_total` (not `total_` prefix) - for cumulative metrics that only increase
+//! - Gauges: No `_total` suffix - for current state metrics that can go up and down
 //! - Note: Do not use `_counter`, `_gauge`, `_time`, or `_size` in Prometheus names (too vague)
 //!
 //! **Common Transformations**:
 //! - ❌ `_counter` → ✅ `_total`
+//! - ❌ `_sum` → ✅ `_total`
+//! - ❌ `_gauge` → ✅ (no suffix needed for current values)
 //! - ❌ `_time` → ✅ `_seconds`, `_ms`, `_hours`, `_duration_seconds`
+//! - ❌ `_time_total` → ✅ `_seconds_total`, `_ms_total`, `_hours_total`
+//! - ❌ `_total_time` → ✅ `_seconds_total`, `_ms_total`, `_hours_total`
+//! - ❌ `_total_time_seconds` → ✅ `_seconds_total`
+//! - ❌ `_average_time` → ✅ `_seconds_avg`, `_ms_avg`
 //! - ❌ `_size` → ✅ `_bytes`, `_total`, `_length`
-//! - ❌ `_gauge` → ✅ (no suffix needed for current values)
+//! - ❌ `_some_request_size` → ✅ `_some_request_bytes_avg`
 //! - ❌ `_rate` → ✅ `_per_second`, `_per_minute`
+//! - ❌ `disconnected_clients_total` → ✅ `disconnected_clients` (gauge, not counter)
+//! - ❌ `inflight_requests_total` → ✅ `inflight_requests` (gauge, not counter)
+//! - ❌ `connections_total` → ✅ `current_connections` (gauge, not counter)
 //!
 //! **Examples**:
 //! - ✅ `dynamo_frontend_requests_total` - Total request counter (not `incoming_requests`)
 //! - ✅ `dynamo_frontend_request_duration_seconds` - Request duration histogram (not `response_time`)
 //! - ✅ `dynamo_component_errors_total` - Total error counter (not `total_errors`)
 //! - ✅ `dynamo_component_memory_usage_bytes` - Memory usage gauge
-//! - ✅ `dynamo_frontend_inflight_requests_total` - Current inflight requests gauge
+//! - ✅ `dynamo_frontend_inflight_requests` - Current inflight requests gauge
 //! - ✅ `nats_client_connection_duration_ms` - Connection time in milliseconds
 //! - ✅ `dynamo_component_cpu_usage_percent` - CPU usage percentage
 //! - ✅ `dynamo_frontend_tokens_per_second` - Token generation rate
+//! - ✅ `nats_client_current_connections` - Current active connections gauge
+//! - ✅ `nats_client_in_messages` - Total messages received counter
 //!
 //! ## Key Differences: Prometheus Metric Names vs Prometheus Label Names
 //!
@@ -83,11 +95,15 @@ pub mod frontend_service {
     /// Total number of LLM requests processed
     pub const REQUESTS_TOTAL: &str = "requests_total";
-    /// Number of requests waiting in HTTP queue before receiving the first response.
-    pub const QUEUED_REQUESTS_TOTAL: &str = "queued_requests_total";
+    /// Number of requests waiting in HTTP queue before receiving the first response (gauge)
+    pub const QUEUED_REQUESTS: &str = "queued_requests";
+    /// Number of inflight/concurrent requests going to the engine (vLLM, SGLang, ...)
+    /// Note: This is a gauge metric (current state) that can go up and down, so no _total suffix
+    pub const INFLIGHT_REQUESTS: &str = "inflight_requests";
-    /// Number of inflight requests going to the engine (vLLM, SGLang, ...)
-    pub const INFLIGHT_REQUESTS_TOTAL: &str = "inflight_requests_total";
+    /// Number of disconnected clients (gauge that can go up and down)
+    pub const DISCONNECTED_CLIENTS: &str = "disconnected_clients";
     /// Duration of LLM requests
     pub const REQUEST_DURATION_SECONDS: &str = "request_duration_seconds";
@@ -157,6 +173,7 @@ pub mod work_handler {
     pub const RESPONSE_BYTES_TOTAL: &str = "response_bytes_total";
     /// Number of requests currently being processed by work handler
+    /// Note: This is a gauge metric (current state) that can go up and down, so no _total suffix
     pub const INFLIGHT_REQUESTS: &str = "inflight_requests";
     /// Time spent processing requests by work handler (histogram)
@@ -214,8 +231,9 @@ pub mod nats_client {
     /// Total number of messages sent by NATS client
     pub const OUT_MESSAGES: &str = nats_client_name!("out_messages");
-    /// Total number of connections established by NATS client
-    pub const CONNECTS: &str = nats_client_name!("connects");
+    /// Current number of active connections for NATS client
+    /// Note: Gauge metric measuring current connections, not cumulative total
+    pub const CURRENT_CONNECTIONS: &str = nats_client_name!("current_connections");
     /// Current connection state of NATS client (0=disconnected, 1=connected, 2=reconnecting)
     pub const CONNECTION_STATE: &str = nats_client_name!("connection_state");
@@ -234,16 +252,16 @@ pub mod nats_service {
     pub const PREFIX: &str = nats_service_name!("");
     /// Average processing time in milliseconds (maps to: average_processing_time in ms)
-    pub const AVG_PROCESSING_MS: &str = nats_service_name!("avg_processing_time_ms");
+    pub const PROCESSING_MS_AVG: &str = nats_service_name!("processing_ms_avg");
     /// Total errors across all endpoints (maps to: num_errors)
-    pub const TOTAL_ERRORS: &str = nats_service_name!("total_errors");
+    pub const ERRORS_TOTAL: &str = nats_service_name!("errors_total");
     /// Total requests across all endpoints (maps to: num_requests)
-    pub const TOTAL_REQUESTS: &str = nats_service_name!("total_requests");
+    pub const REQUESTS_TOTAL: &str = nats_service_name!("requests_total");
     /// Total processing time in milliseconds (maps to: processing_time in ms)
-    pub const TOTAL_PROCESSING_MS: &str = nats_service_name!("total_processing_time_ms");
+    pub const PROCESSING_MS_TOTAL: &str = nats_service_name!("processing_ms_total");
     /// Number of active services (derived from ServiceSet.services)
     pub const ACTIVE_SERVICES: &str = nats_service_name!("active_services");
@@ -255,7 +273,7 @@ pub mod nats_service {
 /// All NATS client Prometheus metric names as an array for iteration/validation
 pub const DRT_NATS_METRICS: &[&str] = &[
     nats_client::CONNECTION_STATE,
-    nats_client::CONNECTS,
+    nats_client::CURRENT_CONNECTIONS,
     nats_client::IN_TOTAL_BYTES,
     nats_client::IN_MESSAGES,
     nats_client::OUT_OVERHEAD_BYTES,
@@ -265,10 +283,10 @@ pub const DRT_NATS_METRICS: &[&str] = &[
 /// All component service Prometheus metric names as an array for iteration/validation
 /// (ordered to match NatsStatsMetrics fields)
 pub const COMPONENT_NATS_METRICS: &[&str] = &[
-    nats_service::AVG_PROCESSING_MS, // maps to: average_processing_time (nanoseconds)
-    nats_service::TOTAL_ERRORS, // maps to: num_errors
-    nats_service::TOTAL_REQUESTS, // maps to: num_requests
-    nats_service::TOTAL_PROCESSING_MS, // maps to: processing_time (nanoseconds)
+    nats_service::PROCESSING_MS_AVG, // maps to: average_processing_time (nanoseconds)
+    nats_service::ERRORS_TOTAL, // maps to: num_errors
+    nats_service::REQUESTS_TOTAL, // maps to: num_requests
+    nats_service::PROCESSING_MS_TOTAL, // maps to: processing_time (nanoseconds)
     nats_service::ACTIVE_SERVICES, // derived from ServiceSet.services
     nats_service::ACTIVE_ENDPOINTS, // derived from ServiceInfo.endpoints
 ];
......
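The suffix rules in the module documentation above lend themselves to a mechanical check. The following is a minimal sketch and not part of this change: `naming_violations` and its `current_state` flag are hypothetical names introduced for illustration, and only a few of the listed rules are covered.

```rust
/// Hypothetical helper (an assumption for illustration, not code from this change).
/// Returns one message per naming rule that the metric name breaks.
/// `current_state` marks metrics that report a current value that can go up and down.
fn naming_violations(name: &str, current_state: bool) -> Vec<&'static str> {
    let mut problems = Vec::new();

    // Counters carry `_total` as a suffix; `total_` as a prefix or infix is flagged.
    if name.starts_with("total_") || name.contains("_total_") {
        problems.push("move `total` to the end of the name (`_total` suffix)");
    }

    // Suffixes the documentation calls too vague.
    for vague in ["_counter", "_gauge", "_time", "_size"] {
        if name.ends_with(vague) {
            problems.push("replace the vague suffix with a unit or `_total`");
        }
    }

    // Current-state metrics should not look like cumulative counters.
    if current_state && name.ends_with("_total") {
        problems.push("drop `_total` from a current-state metric");
    }

    problems
}
```

Under these assumptions, `naming_violations("nats_service_total_errors", false)` reports one problem, while the renamed `nats_service_errors_total` reports none.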
@@ -306,15 +306,18 @@ mod tests {
 /// Flow: NATS Service → NatsStatsMetrics (Counters) → Metrics Callback → Prometheus Gauge
 /// Note: These are snapshots updated when execute_metrics_callbacks() is called.
 #[derive(Debug, Clone)]
+/// Prometheus metrics for NATS server components.
+/// Note: Metrics with `_total` names use IntGauge because we copy counter values
+/// from underlying services rather than incrementing directly.
 pub struct ComponentNatsServerPrometheusMetrics {
     /// Average processing time in milliseconds (maps to: average_processing_time)
-    pub service_avg_processing_ms: prometheus::Gauge,
+    pub service_processing_ms_avg: prometheus::Gauge,
     /// Total errors across all endpoints (maps to: num_errors)
-    pub service_total_errors: prometheus::IntGauge,
+    pub service_errors_total: prometheus::IntGauge,
     /// Total requests across all endpoints (maps to: num_requests)
-    pub service_total_requests: prometheus::IntGauge,
+    pub service_requests_total: prometheus::IntGauge,
     /// Total processing time in milliseconds (maps to: processing_time)
-    pub service_total_processing_ms: prometheus::IntGauge,
+    pub service_processing_ms_total: prometheus::IntGauge,
     /// Number of active services (derived from ServiceSet.services)
     pub service_active_services: prometheus::IntGauge,
     /// Number of active endpoints (derived from ServiceInfo.endpoints)
@@ -336,26 +339,26 @@ impl ComponentNatsServerPrometheusMetrics {
         let labels: &[(&str, &str)] = &labels_vec;
-        let service_avg_processing_ms = component.create_gauge(
-            nats_service::AVG_PROCESSING_MS,
+        let service_processing_ms_avg = component.create_gauge(
+            nats_service::PROCESSING_MS_AVG,
             "Average processing time across all component endpoints in milliseconds",
             labels,
         )?;
-        let service_total_errors = component.create_intgauge(
-            nats_service::TOTAL_ERRORS,
+        let service_errors_total = component.create_intgauge(
+            nats_service::ERRORS_TOTAL,
             "Total number of errors across all component endpoints",
             labels,
         )?;
-        let service_total_requests = component.create_intgauge(
-            nats_service::TOTAL_REQUESTS,
+        let service_requests_total = component.create_intgauge(
+            nats_service::REQUESTS_TOTAL,
             "Total number of requests across all component endpoints",
             labels,
         )?;
-        let service_total_processing_ms = component.create_intgauge(
-            nats_service::TOTAL_PROCESSING_MS,
+        let service_processing_ms_total = component.create_intgauge(
+            nats_service::PROCESSING_MS_TOTAL,
             "Total processing time across all component endpoints in milliseconds",
             labels,
         )?;
@@ -373,10 +376,10 @@ impl ComponentNatsServerPrometheusMetrics {
         )?;
         Ok(Self {
-            service_avg_processing_ms,
-            service_total_errors,
-            service_total_requests,
-            service_total_processing_ms,
+            service_processing_ms_avg,
+            service_errors_total,
+            service_requests_total,
+            service_processing_ms_total,
             service_active_services,
             service_active_endpoints,
         })
@@ -414,14 +417,14 @@ impl ComponentNatsServerPrometheusMetrics {
         if processing_time_samples > 0 && total_requests > 0 {
             let avg_time_nanos = total_processing_time_nanos as f64 / total_requests as f64;
             let avg_time_ms = avg_time_nanos / 1_000_000.0; // Convert nanoseconds to milliseconds
-            self.service_avg_processing_ms.set(avg_time_ms);
+            self.service_processing_ms_avg.set(avg_time_ms);
         } else {
-            self.service_avg_processing_ms.set(0.0);
+            self.service_processing_ms_avg.set(0.0);
         }
-        self.service_total_errors.set(total_errors as i64); // maps to: num_errors
-        self.service_total_requests.set(total_requests as i64); // maps to: num_requests
-        self.service_total_processing_ms
+        self.service_errors_total.set(total_errors as i64); // maps to: num_errors
+        self.service_requests_total.set(total_requests as i64); // maps to: num_requests
+        self.service_processing_ms_total
             .set((total_processing_time_nanos / 1_000_000) as i64); // maps to: processing_time (converted to milliseconds)
         self.service_active_services.set(service_count); // derived from ServiceSet.services
         self.service_active_endpoints.set(endpoint_count as i64); // derived from ServiceInfo.endpoints
@@ -429,10 +432,10 @@ impl ComponentNatsServerPrometheusMetrics {
     /// Reset all metrics to zero. Useful when no data is available or to clear stale values.
     pub fn reset_to_zeros(&self) {
-        self.service_avg_processing_ms.set(0.0);
-        self.service_total_errors.set(0);
-        self.service_total_requests.set(0);
-        self.service_total_processing_ms.set(0);
+        self.service_processing_ms_avg.set(0.0);
+        self.service_errors_total.set(0);
+        self.service_requests_total.set(0);
+        self.service_processing_ms_total.set(0);
         self.service_active_services.set(0);
         self.service_active_endpoints.set(0);
     }
......
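The nanosecond-to-millisecond conversion in the update hunk above mixes floating-point and integer arithmetic, which is easier to see in isolation. This is a minimal sketch, assuming `u64` inputs (the real field types are not visible in the diff); `processing_ms` is a hypothetical free function, not part of the change.

```rust
/// Hypothetical stand-alone version of the arithmetic in the hunk above.
/// Input types are assumed to be `u64`; the diff does not show the real ones.
/// Returns `(average_ms, total_ms)`.
fn processing_ms(
    total_processing_time_nanos: u64,
    processing_time_samples: u64,
    total_requests: u64,
) -> (f64, i64) {
    // The average is computed in floating point, so sub-millisecond values survive.
    let avg_ms = if processing_time_samples > 0 && total_requests > 0 {
        let avg_nanos = total_processing_time_nanos as f64 / total_requests as f64;
        avg_nanos / 1_000_000.0
    } else {
        0.0
    };

    // The total uses integer division, so it truncates toward zero.
    let total_ms = (total_processing_time_nanos / 1_000_000) as i64;

    (avg_ms, total_ms)
}
```

With these assumptions, 3,500,000 ns spread over two requests gives an average of 1.75 ms and a total of 3 ms, and any total below one millisecond is reported as 0.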
@@ -919,8 +919,8 @@ impl DRTNatsClientPrometheusMetrics {
             &[],
         )?;
         let connects = drt.create_intgauge(
-            nats_metrics::CONNECTS,
-            "Total number of connections established by NATS client",
+            nats_metrics::CURRENT_CONNECTIONS,
+            "Current number of active connections for NATS client",
             &[],
         )?;
         let connection_state = drt.create_intgauge(
......