docs: DYN-1967 update metrics docs after kvstats removal (#5704)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

Commit 8e72fb69 (unverified), authored Jan 27, 2026 by Keiven C, committed by GitHub Jan 27, 2026
6 changed files
--- a/docs/kubernetes/autoscaling.md
+++ b/docs/kubernetes/autoscaling.md
@@ -227,7 +227,6 @@ Dynamo exports several metrics useful for autoscaling. These are available at th
 | `dynamo_frontend_time_to_first_token_seconds` | Histogram | TTFT latency | ✅ Workers |
 | `dynamo_frontend_inter_token_latency_seconds` | Histogram | ITL latency | ✅ Decode |
 | `dynamo_frontend_request_duration_seconds` | Histogram | Total request duration | ⚠️ General |
-| `kvstats_gpu_cache_usage_percent` | Gauge | GPU KV cache usage (0-1) | ✅ Decode |
 #### Metric Labels
@@ -641,7 +640,7 @@ Avoid configuring multiple autoscalers for the same service:
 |--------------|---------------------|---------------|
 | Frontend | CPU utilization, request rate | `dynamo_frontend_requests_total` |
 | Prefill | Queue depth, TTFT | `dynamo_frontend_queued_requests`, `dynamo_frontend_time_to_first_token_seconds` |
-| Decode | KV cache utilization, ITL | `kvstats_gpu_cache_usage_percent`, `dynamo_frontend_inter_token_latency_seconds` |
+| Decode | ITL | `dynamo_frontend_inter_token_latency_seconds` |
 ### 3. Configure Stabilization Windows

--- a/docs/observability/metrics.md
+++ b/docs/observability/metrics.md
@@ -123,19 +123,6 @@ DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>
 curl http://localhost:8081/metrics
 ```
-### KV Router Statistics (kvstats)
-KV router statistics are automatically exposed by LLM workers and KV router components on the backend system status port (port 8081) with the `dynamo_component_kvstats_*` prefix. These metrics provide insights into GPU memory usage and cache efficiency:
-- `dynamo_component_kvstats_active_blocks`: Number of active KV cache blocks currently in use (gauge)
-- `dynamo_component_kvstats_total_blocks`: Total number of KV cache blocks available (gauge)
-- `dynamo_component_kvstats_gpu_cache_usage_percent`: GPU cache usage as a percentage (0.0-1.0) (gauge)
-- `dynamo_component_kvstats_gpu_prefix_cache_hit_rate`: GPU prefix cache hit rate as a percentage (0.0-1.0) (gauge)
-These metrics are published by:
-- **LLM Workers**: vLLM and TRT-LLM backends publish these metrics through their respective publishers
-- **KV Router**: The KV router component aggregates and exposes these metrics for load balancing decisions
 ### Specialized Component Metrics
 Some components expose additional metrics specific to their functionality:
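
Reviewer note: because the hunks above delete the `kvstats` metrics from the docs, dashboards, alert rules, or autoscaler configs that still reference them may need updating. The following is a minimal sketch, not part of this commit, of a helper that scans text for the removed names. The metric names are copied from the deleted lines; the function name `find_stale_metric_refs` is hypothetical.

```python
import re

# Metric names removed from the docs by this commit (copied from the deleted lines).
REMOVED_METRICS = (
    "kvstats_gpu_cache_usage_percent",
    "dynamo_component_kvstats_active_blocks",
    "dynamo_component_kvstats_total_blocks",
    "dynamo_component_kvstats_gpu_cache_usage_percent",
    "dynamo_component_kvstats_gpu_prefix_cache_hit_rate",
)

# Match whole metric identifiers only, so a short name is not reported
# when it merely appears as the tail of a longer one.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_:])("
    + "|".join(re.escape(name) for name in REMOVED_METRICS)
    + r")(?![A-Za-z0-9_:])"
)


def find_stale_metric_refs(text):
    """Return (line_number, metric_name) pairs for each removed metric found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Passing the contents of a rules or dashboard file to `find_stale_metric_refs` yields the 1-based line numbers where a removed metric is still referenced.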

--- a/fern/pages/kubernetes/autoscaling.md
+++ b/fern/pages/kubernetes/autoscaling.md
@@ -233,7 +233,6 @@ Dynamo exports several metrics useful for autoscaling. These are available at th
 | `dynamo_frontend_time_to_first_token_seconds` | Histogram | TTFT latency | ✅ Workers |
 | `dynamo_frontend_inter_token_latency_seconds` | Histogram | ITL latency | ✅ Decode |
 | `dynamo_frontend_request_duration_seconds` | Histogram | Total request duration | ⚠️ General |
-| `kvstats_gpu_cache_usage_percent` | Gauge | GPU KV cache usage (0-1) | ✅ Decode |
 #### Metric Labels
@@ -647,7 +646,7 @@ Avoid configuring multiple autoscalers for the same service:
 |--------------|---------------------|---------------|
 | Frontend | CPU utilization, request rate | `dynamo_frontend_requests_total` |
 | Prefill | Queue depth, TTFT | `dynamo_frontend_queued_requests`, `dynamo_frontend_time_to_first_token_seconds` |
-| Decode | KV cache utilization, ITL | `kvstats_gpu_cache_usage_percent`, `dynamo_frontend_inter_token_latency_seconds` |
+| Decode | ITL | `dynamo_frontend_inter_token_latency_seconds` |
 ### 3. Configure Stabilization Windows

--- a/fern/pages/observability/metrics.md
+++ b/fern/pages/observability/metrics.md
@@ -122,19 +122,6 @@ DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>
 curl http://localhost:8081/metrics
 ```
-### KV Router Statistics (kvstats)
-KV router statistics are automatically exposed by LLM workers and KV router components on the backend system status port (port 8081) with the `dynamo_component_kvstats_*` prefix. These metrics provide insights into GPU memory usage and cache efficiency:
-- `dynamo_component_kvstats_active_blocks`: Number of active KV cache blocks currently in use (gauge)
-- `dynamo_component_kvstats_total_blocks`: Total number of KV cache blocks available (gauge)
-- `dynamo_component_kvstats_gpu_cache_usage_percent`: GPU cache usage as a percentage (0.0-1.0) (gauge)
-- `dynamo_component_kvstats_gpu_prefix_cache_hit_rate`: GPU prefix cache hit rate as a percentage (0.0-1.0) (gauge)
-These metrics are published by:
-- **LLM Workers**: vLLM and TRT-LLM backends publish these metrics through their respective publishers
-- **KV Router**: The KV router component aggregates and exposes these metrics for load balancing decisions
 ### Specialized Component Metrics
 Some components expose additional metrics specific to their functionality:

--- a/lib/bindings/python/codegen/README.md
+++ b/lib/bindings/python/codegen/README.md
@@ -16,21 +16,20 @@ cargo run -p dynamo-codegen --bin gen-python-prometheus-names
 - Parses Rust AST from `lib/runtime/src/metrics/prometheus_names.rs`
 - Generates Python classes with constants at `lib/bindings/python/src/dynamo/prometheus_names.py`
-- Handles macro-generated constants (e.g., `kvstats_name!("active_blocks")` → `"kvstats_active_blocks"`)
 ### Example
 **Rust input:**
 ```rust
-pub mod kvstats {
+pub mod kvrouter {
-    pub const ACTIVE_BLOCKS: &str = kvstats_name!("active_blocks");
+    pub const KV_CACHE_EVENTS_APPLIED: &str = "kv_cache_events_applied";
 }
 ```
 **Python output:**
 ```python
-class kvstats:
+class kvrouter:
-    ACTIVE_BLOCKS = "kvstats_active_blocks"
+    KV_CACHE_EVENTS_APPLIED = "kv_cache_events_applied"
 ```
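
Reviewer note: the Rust-to-Python mapping in the updated example can be illustrated with a short sketch. This is a simplified regex stand-in, not the actual generator, which parses the Rust AST. It assumes flat `pub mod` blocks containing only plain string-literal constants, and the function name `rust_consts_to_python` is hypothetical.

```python
import re

# Simplified patterns: a flat `pub mod NAME { ... }` block and
# `pub const NAME: &str = "value";` lines inside it.
_MOD_RE = re.compile(r"pub mod (\w+)\s*\{(.*?)\}", re.DOTALL)
_CONST_RE = re.compile(r'pub const (\w+): &str = "([^"]*)";')


def rust_consts_to_python(rust_source):
    """Render each Rust module of string constants as a Python class."""
    chunks = []
    for mod_name, body in _MOD_RE.findall(rust_source):
        lines = [f"class {mod_name}:"]
        consts = _CONST_RE.findall(body)
        for const_name, value in consts:
            lines.append(f'    {const_name} = "{value}"')
        if not consts:
            # Keep the class body syntactically valid when no constants matched.
            lines.append("    pass")
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)
```

Feeding it the `kvrouter` module from the Rust input above produces the `class kvrouter:` text shown in the Python output. Nested modules and macro-generated values are out of scope for this sketch.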
 ### When to run

--- a/lib/bindings/python/codegen/src/gen_python_prometheus_names.rs
+++ b/lib/bindings/python/codegen/src/gen_python_prometheus_names.rs
@@ -196,7 +196,7 @@ Parses lib/runtime/src/metrics/prometheus_names.rs and generates a pure Python
 module with 1:1 constant mappings at lib/bindings/python/src/dynamo/prometheus_names.py
 This allows Python code to import Prometheus metric constants without Rust bindings:
-    from dynamo.prometheus_names import frontend_service, kvstats
+    from dynamo.prometheus_names import frontend_service
 OPTIONS:
    --source PATH    Path to Rust source file