@@ -85,6 +85,6 @@ You can deploy TensorRT-LLM with Dynamo on Kubernetes using a `DynamoGraphDeploy
...
@@ -85,6 +85,6 @@ You can deploy TensorRT-LLM with Dynamo on Kubernetes using a `DynamoGraphDeploy
-**[Reference Guide](trtllm-reference-guide.md)**: Features, configuration, and operational details
-**[Reference Guide](trtllm-reference-guide.md)**: Features, configuration, and operational details
-**[Examples](trtllm-examples.md)**: All deployment patterns with launch scripts
-**[Examples](trtllm-examples.md)**: All deployment patterns with launch scripts
-**[KV Cache Transfer](trtllm-kv-cache-transfer.md)**: KV cache transfer methods for disaggregated serving
-**[KV Cache Transfer](trtllm-kv-cache-transfer.md)**: KV cache transfer methods for disaggregated serving
-**[Prometheus Metrics](trtllm-prometheus.md)**: Metrics and monitoring
-**[Observability](trtllm-observability.md)**: Metrics and monitoring
-**[Multinode Examples](multinode/trtllm-multinode-examples.md)**: Multi-node deployment with SLURM
-**[Multinode Examples](multinode/trtllm-multinode-examples.md)**: Multi-node deployment with SLURM
-**[Deploying TensorRT-LLM with Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/trtllm/deploy/README.md)**: Kubernetes deployment guide
-**[Deploying TensorRT-LLM with Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/trtllm/deploy/README.md)**: Kubernetes deployment guide
@@ -54,7 +54,7 @@ See the instructions here: [Running KVBM in TensorRT-LLM](../../components/kvbm/
...
@@ -54,7 +54,7 @@ See the instructions here: [Running KVBM in TensorRT-LLM](../../components/kvbm/
## Observability
## Observability
TensorRT-LLM exposes Prometheus metrics for monitoring inference performance. For detailed metrics reference, collection setup, and Grafana integration, see the [Prometheus Metrics Guide](./trtllm-prometheus.md).
TensorRT-LLM exposes Prometheus metrics for monitoring inference performance. For detailed metrics reference, collection setup, and Grafana integration, see the [Observability Guide](./trtllm-observability.md).
> **Note:** Metric names and counts are subject to change with engine version updates. All metrics were verified from live scrapes on 2026-04-10 running Dynamo v1.0.0. Always inspect your actual `/metrics` endpoint for the definitive list.
All frameworks share the common `dynamo_component_*` metrics from the Dynamo runtime.
## Common Dynamo Worker Metrics
These metrics are exposed by every backend on the worker port (`:8081/metrics`). Verified from live scrapes, 2026-04-10.
For Dynamo frontend and router metrics (`dynamo_frontend_*`, `dynamo_component_router_*`), see the [Metrics Guide](metrics.md).
| Metric Name | Type | Description |
|-------------|------|-------------|
| `dynamo_component_cancellation_total` | counter | Total number of requests cancelled by work handler |
| `dynamo_component_gpu_cache_usage_percent` | gauge | GPU cache usage as a fraction between 0.0 and 1.0 (despite the `_percent` suffix) |
| `dynamo_component_inflight_requests` | gauge | Number of requests currently being processed |
| `dynamo_component_model_load_time_seconds` | gauge | Model load time in seconds |
| `dynamo_component_request_bytes_total` | counter | Total bytes received in requests |
| `dynamo_component_request_duration_seconds` | histogram | Time spent processing requests |
| `dynamo_component_requests_total` | counter | Total number of requests processed |
| `dynamo_component_response_bytes_total` | counter | Total bytes sent in responses |
| `dynamo_component_total_blocks` | gauge | Total number of KV cache blocks available on the worker |
| `dynamo_component_uptime_seconds` | gauge | Total uptime of the DistributedRuntime |
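
As a rough illustration of checking which of these metrics a worker actually exposes, the sketch below fetches a `/metrics` page and keeps only the `dynamo_component_*` samples. It is a minimal stdlib-only parser, not the Dynamo tooling: the default URL `http://localhost:8081/metrics`, the helper names `fetch_metrics` and `parse_samples`, and the label layout are assumptions for illustration.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8081/metrics", timeout=5.0):
    """Return the raw Prometheus text from a worker (URL is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text, prefix="dynamo_component_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            start = line.index("{")
            end = line.rfind("}")  # value and timestamp never contain "}"
            name = line[:start]
            labels = line[start:end + 1]
            rest = line[end + 1:].split()
        else:
            parts = line.split()
            name, labels, rest = parts[0], "", parts[1:]
        if not rest or not name.startswith(prefix):
            continue
        # rest[0] is the sample value; an optional timestamp may follow it.
        samples.append((name, labels, float(rest[0])))
    return samples
```

Calling `parse_samples(fetch_metrics())` against a running worker gives a list to compare with the table above. Histogram metrics such as `dynamo_component_request_duration_seconds` appear as several series (`_bucket`, `_sum`, `_count`), so expect more rows than the table has entries.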
## Framework-Specific Metrics Comparison
These are **pass-through metrics from the engines themselves**: Dynamo exposes them on its `/metrics` endpoint but does not generate them. Metric names are shown **without prefix**; the actual metrics carry the `vllm:`, `sglang:`, or `trtllm_` prefix for vLLM, SGLang, and TensorRT-LLM respectively.
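
To compare scraped names against an unprefixed listing, the engine prefix has to be stripped first. The helper below is a hedged sketch of that step: only the three prefixes come from the text above, while `split_engine_prefix` and the metric names in the usage comments are hypothetical.

```python
# Engine prefixes as stated in this document, mapped to a short engine label.
ENGINE_PREFIXES = {
    "vllm:": "vllm",
    "sglang:": "sglang",
    "trtllm_": "trtllm",
}


def split_engine_prefix(metric_name):
    """Return (engine, bare_name); engine is None when no known prefix matches."""
    for prefix, engine in ENGINE_PREFIXES.items():
        if metric_name.startswith(prefix):
            return engine, metric_name[len(prefix):]
    return None, metric_name


# Hypothetical metric names, used only to show the call shape:
#   split_engine_prefix("vllm:example_metric")
#   split_engine_prefix("trtllm_example_metric")
```

Names without a known prefix, such as the `dynamo_component_*` metrics, are returned unchanged with `None` as the engine, so the same helper can be applied to a whole scrape.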