Unverified commit afccc9d4 authored by Keiven C, committed by GitHub

refactor: consolidate Observability files (e.g. OTEL docker-compose, md files) (#4173)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent 3577b5c1
@@ -9,7 +9,7 @@ datasources:
 access: proxy
 url: http://tempo:3200
 uid: tempo
-isDefault: true
+isDefault: false
 editable: true
 jsonData:
 httpMethod: GET
......
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
version: '3.8'
services:
  # Tempo - Distributed tracing backend
  tempo:
    image: grafana/tempo:2.8.2
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/tmp/tempo
    ports:
      - "3200:3200" # Tempo HTTP
      - "4317:4317" # OTLP gRPC receiver (accessible from host)
      - "4318:4318" # OTLP HTTP receiver (accessible from host)
  # Grafana - Visualization and dashboards
  grafana:
    image: grafana/grafana:12.2.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    depends_on:
      - tempo
volumes:
  tempo-data:
  grafana-data:
@@ -4,6 +4,10 @@ Observability
 .. toctree::
 :hidden:
+Overview <../observability/README>
+Prometheus + Grafana Setup <../observability/prometheus-grafana>
 Metrics <../observability/metrics>
+Metrics Developer Guide <../observability/metrics-developer-guide>
+Health Checks <../observability/health-checks>
+Tracing <../observability/tracing>
 Logging <../observability/logging>
-Health Checks <../observability/health-checks>
\ No newline at end of file
@@ -26,6 +26,7 @@
 kubernetes/api_reference.md
 kubernetes/deployment/create_deployment.md
+kubernetes/deployment/dynamomodel-guide.md
 kubernetes/fluxcd.md
 kubernetes/grove.md
......
@@ -25,6 +25,8 @@ While this guide does not use Prometheus, it assumes Grafana is pre-installed wi
 ### 3. Environment Variables
+#### Kubernetes Setup Variables
 The following env variables are set:
 - `MONITORING_NAMESPACE`: The namespace where Loki is installed
 - `DYN_NAMESPACE`: The namespace where Dynamo Cloud Operator is installed
@@ -34,6 +36,14 @@ export MONITORING_NAMESPACE=monitoring
 export DYN_NAMESPACE=dynamo-system
 ```
+#### Dynamo Logging Variables
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `DYN_LOGGING_JSONL` | Enable JSONL logging format (required for Loki) | `true` |
+| `DYN_LOG` | Log levels per target `<default_level>,<module_path>=<level>,<module_path>=<level>` | `DYN_LOG=info,dynamo_runtime::system_status_server=trace` |
+| `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for timestamps | `true` |
 ## Installation Steps
 ### 1. Install Loki
@@ -46,7 +56,7 @@ helm repo add grafana https://grafana.github.io/helm-charts
 helm repo update
 # Install Loki
-helm install --values deploy/logging/values/loki-values.yaml loki grafana/loki -n $MONITORING_NAMESPACE
+helm install --values deploy/observability/k8s/logging/values/loki-values.yaml loki grafana/loki -n $MONITORING_NAMESPACE
 ```
 Our configuration (`loki-values.yaml`) sets up Loki in a simple configuration that is suitable for testing and development. It uses a local MinIO for storage. The installation pods can be viewed with:
@@ -60,7 +70,7 @@ Next, install the Grafana Alloy collector to gather logs from your Kubernetes cl
 ```bash
 # Generate a custom values file with the namespace information
-envsubst < deploy/logging/values/alloy-values.yaml > alloy-custom-values.yaml
+envsubst < deploy/observability/k8s/logging/values/alloy-values.yaml > alloy-custom-values.yaml
 # Install the collector
 helm install --values alloy-custom-values.yaml alloy grafana/k8s-monitoring -n $MONITORING_NAMESPACE
@@ -110,10 +120,10 @@ Since we are using Grafana with the Prometheus Operator, we can simply apply the
 ```bash
 # Configure Grafana with the Loki datasource
-envsubst < deploy/logging/grafana/loki-datasource.yaml | kubectl apply -n $MONITORING_NAMESPACE -f -
+envsubst < deploy/observability/k8s/logging/grafana/loki-datasource.yaml | kubectl apply -n $MONITORING_NAMESPACE -f -
 # Configure Grafana with the Dynamo Logs dashboard
-envsubst < deploy/logging/grafana/logging-dashboard.yaml | kubectl apply -n $MONITORING_NAMESPACE -f -
+envsubst < deploy/observability/k8s/logging/grafana/logging-dashboard.yaml | kubectl apply -n $MONITORING_NAMESPACE -f -
 ```
 > [!Note]
@@ -141,4 +151,4 @@ kubectl port-forward svc/prometheus-grafana 3000:80 -n $MONITORING_NAMESPACE
 If everything is working, under Home > Dashboards > Dynamo Logs, you should see a dashboard that can be used to view the logs associated with our DynamoGraphDeployments
-The dashboard enables filtering by DynamoGraphDeployment, namespace, and component type (e.g frontend, worker, etc).
+The dashboard enables filtering by DynamoGraphDeployment, namespace, and component type (e.g., frontend, worker, etc.).
\ No newline at end of file
@@ -128,9 +128,7 @@ spec:
 Apply the Dynamo dashboard configuration to populate Grafana with the Dynamo dashboard:
 ```bash
-pushd deploy/metrics/k8s
-kubectl apply -n monitoring -f grafana-dynamo-dashboard-configmap.yaml
-popd
+kubectl apply -n monitoring -f deploy/observability/k8s/grafana-dynamo-dashboard-configmap.yaml
 ```
 The dashboard is embedded in the ConfigMap. Since it is labeled with `grafana_dashboard: "1"`, Grafana will discover it and add it to its list of available dashboards. The dashboard includes panels for:
......
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# Dynamo Observability
## Getting Started Quickly
This example gets you started quickly on a single machine.
### Prerequisites
Install these on your machine:
- [Docker](https://docs.docker.com/get-docker/)
- [Docker Compose](https://docs.docker.com/compose/install/)
### Starting the Observability Stack
Dynamo provides a Docker Compose-based observability stack that includes Prometheus, Grafana, Tempo, and various exporters for metrics, tracing, and visualization.
From the Dynamo root directory:
```bash
# Start infrastructure (NATS, etcd)
docker compose -f deploy/docker-compose.yml up -d
# Start observability stack (Prometheus, Grafana, Tempo, DCGM GPU exporter, NATS exporter)
docker compose -f deploy/docker-observability.yml up -d
```
For detailed setup instructions and configuration, see [Prometheus + Grafana Setup](prometheus-grafana.md).
## Observability Documentation
| Guide | Description | Environment Variables to Control |
|-------|-------------|----------------------------------|
| [Metrics](metrics.md) | Available metrics reference | `DYN_SYSTEM_PORT`† |
| [Health Checks](health-checks.md) | Component health monitoring and readiness probes | `DYN_SYSTEM_PORT`†, `DYN_SYSTEM_STARTING_HEALTH_STATUS`, `DYN_SYSTEM_HEALTH_PATH`, `DYN_SYSTEM_LIVE_PATH`, `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` |
| [Tracing](tracing.md) | Distributed tracing with OpenTelemetry and Tempo | `DYN_LOGGING_JSONL`†, `OTEL_EXPORT_ENABLED`†, `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`†, `OTEL_SERVICE_NAME`† |
| [Logging](logging.md) | Structured logging configuration | `DYN_LOGGING_JSONL`†, `DYN_LOG`, `DYN_LOG_USE_LOCAL_TZ`, `DYN_LOGGING_CONFIG_PATH`, `OTEL_SERVICE_NAME`†, `OTEL_EXPORT_ENABLED`†, `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`† |
**Variables marked with † are shared across multiple observability systems.**
## Developer Guides
| Guide | Description | Environment Variables to Control |
|-------|-------------|----------------------------------|
| [Metrics Developer Guide](metrics-developer-guide.md) | Creating custom metrics in Rust and Python | `DYN_SYSTEM_PORT`† |
## Kubernetes
For Kubernetes-specific setup and configuration, see [docs/kubernetes/observability/](../kubernetes/observability/).
---
## Topology
The observability stack provides:
- **Prometheus** on `http://localhost:9090` - metrics collection and querying
- **Grafana** on `http://localhost:3000` - visualization dashboards (username: `dynamo`, password: `dynamo`)
- **Tempo** on `http://localhost:3200` - distributed tracing backend
- **DCGM Exporter** on `http://localhost:9401/metrics` - GPU metrics
- **NATS Exporter** on `http://localhost:7777/metrics` - NATS messaging metrics
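A quick way to confirm the stack is up is to probe each of the URLs above. The sketch below is a minimal helper. `SERVICES`, `is_reachable` and `report` are hypothetical names, not part of Dynamo. It assumes the default ports and that the stack runs on `localhost`. It treats any HTTP response as "up", including an error status such as 404, because some services may not serve the exact path probed.

```python
import urllib.error
import urllib.request

# Hypothetical mapping of services to the default URLs listed above.
SERVICES = {
    "prometheus": "http://localhost:9090/",
    "grafana": "http://localhost:3000/",
    "tempo": "http://localhost:3200/",
    "dcgm-exporter": "http://localhost:9401/metrics",
    "nats-exporter": "http://localhost:7777/metrics",
}

# Ignore any HTTP(S)_PROXY settings: every service here is local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (e.g. 404 for a path it does not serve), so it is up.
        return True
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSErrors.
        return False


def report() -> None:
    """Print one status line per service."""
    for name, url in SERVICES.items():
        status = "up" if is_reachable(url) else "DOWN"
        print(f"{name:15} {status:5} {url}")
```

After `docker compose -f deploy/docker-observability.yml up -d`, calling `report()` prints one status line per service.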
### Service Relationship Diagram
```mermaid
graph TD
BROWSER[Browser] -->|:3000| GRAFANA[Grafana :3000]
subgraph DockerComposeNetwork [Network inside Docker Compose]
NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222]
PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000]
PROMETHEUS -->|:8081/metrics| DYNAMOBACKEND[Dynamo backend :8081]
DYNAMOFE --> DYNAMOBACKEND
GRAFANA -->|:9090/query API| PROMETHEUS
end
```
The dcgm-exporter service in the Docker Compose network listens on port 9401 instead of the default 9400. This avoids conflicts with other dcgm-exporter instances that may be running on the same host, which is common on shared clusters managed by schedulers such as SLURM.
### Configuration Files
The following configuration files define the stack. `docker-compose.yml` and `docker-observability.yml` are located in `deploy/`; the rest are in `deploy/observability/`:
- [docker-compose.yml](../../deploy/docker-compose.yml): Defines NATS and etcd services
- [docker-observability.yml](../../deploy/docker-observability.yml): Defines Prometheus, Grafana, Tempo, and exporters
- [prometheus.yml](../../deploy/observability/prometheus.yml): Contains Prometheus scraping configuration
- [grafana-datasources.yml](../../deploy/observability/grafana-datasources.yml): Contains Grafana datasource configuration
- [grafana_dashboards/dashboard-providers.yml](../../deploy/observability/grafana_dashboards/dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/dynamo.json](../../deploy/observability/grafana_dashboards/dynamo.json): A general Dynamo dashboard for both software and hardware metrics
- [grafana_dashboards/dcgm-metrics.json](../../deploy/observability/grafana_dashboards/dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/kvbm.json](../../deploy/observability/grafana_dashboards/kvbm.json): Contains Grafana dashboard configuration for KVBM metrics
@@ -11,6 +11,38 @@ Dynamo provides health check and liveness HTTP endpoints for each component whic
 can be used to configure startup, liveness and readiness probes in
 orchestration frameworks such as Kubernetes.
+## Environment Variables
+| Variable | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `DYN_SYSTEM_PORT` | System status server port | `8081` | `9090` |
+| `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Initial health status | `notready` | `ready`, `notready` |
+| `DYN_SYSTEM_HEALTH_PATH` | Custom health endpoint path | `/health` | `/custom/health` |
+| `DYN_SYSTEM_LIVE_PATH` | Custom liveness endpoint path | `/live` | `/custom/live` |
+| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Endpoints required for ready state | none | `["generate"]` |
+## Getting Started Quickly
+Enable health checks and query endpoints:
+```bash
+# Start your Dynamo components
+python -m dynamo.frontend --http-port 8000 &
+# Enable system status server on port 8081
+DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
+```
+Check health status:
+```bash
+# Frontend health (port 8000)
+curl -s localhost:8000/health | jq
+# Worker health (port 8081)
+curl -s localhost:8081/health | jq
+```
 ## Frontend Liveness Check
 The frontend liveness endpoint reports a status of `live` as long as
@@ -124,16 +156,6 @@ when initializing and HTTP status code `HTTP/1.1 200 OK` once ready.
 > **Note**: Both /live and /ready return the same information
-### Environment Variables for Enabling Health Checks
-| **Environment Variable** | **Description** | **Example Settings** |
-| -------------------------| ------------------- | ------------------------------------------------ |
-| `DYN_SYSTEM_PORT` | Specifies the port for the system status server (automatically enables it when set to a positive value). | `9090`, `8081` |
-| `DYN_SYSTEM_STARTING_HEALTH_STATUS` | Sets the initial health status of the system (ready/not ready). | `ready`, `notready` |
-| `DYN_SYSTEM_HEALTH_PATH` | Custom path for the health endpoint. | `/custom/health` |
-| `DYN_SYSTEM_LIVE_PATH` | Custom path for the liveness endpoint. | `/custom/live` |
-| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Specifies endpoints to check for determining overall system health status. | `["generate"]` |
 ### Example Environment Setting
 ```
......
 <!--
 SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 SPDX-License-Identifier: Apache-2.0
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
 -->
 # Dynamo Logging
@@ -24,18 +12,38 @@ JSONL is enabled logs additionally contain `span` creation and exit
 events as well as support for `trace_id` and `span_id` fields for
 distributed tracing.
-## Environment Variables for configuring Logging
-| Environment Variable | Description | Example Settings |
-| ----------------------------------- | --------------------------------------------| ---------------------------------------------------- |
-| `DYN_LOGGING_JSONL` | Enable JSONL logging format (default: READABLE) | `DYN_LOGGING_JSONL=true` |
-| `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for logging timestamps (default: UTC) | `DYN_LOG_USE_LOCAL_TZ=1` |
-| `DYN_LOG` | Log levels per target `<default_level>,<module_path>=<level>,<module_path>=<level>` | `DYN_LOG=info,dynamo_runtime::system_status_server:trace` |
-| `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration file | `DYN_LOGGING_CONFIG_PATH=/path/to/config.toml`|
-| `OTEL_SERVICE_NAME` | Service name for OpenTelemetry traces (default: `dynamo`) | `OTEL_SERVICE_NAME=dynamo-frontend` |
-| `OTEL_EXPORT_ENABLED` | Enable OTLP trace exporting (set to `1` to enable) | `OTEL_EXPORT_ENABLED=1` |
-| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP exporter endpoint (default: http://localhost:4317) | `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://tempo:4317` |
+## Environment Variables
+| Variable | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `DYN_LOGGING_JSONL` | Enable JSONL logging format | `false` | `true` |
+| `DYN_LOG` | Log levels per target `<default_level>,<module_path>=<level>,<module_path>=<level>` | `info` | `DYN_LOG=info,dynamo_runtime::system_status_server=trace` |
+| `DYN_LOG_USE_LOCAL_TZ` | Use local timezone for timestamps (default is UTC) | `false` | `true` |
+| `DYN_LOGGING_CONFIG_PATH` | Path to custom TOML logging configuration | none | `/path/to/config.toml` |
+| `OTEL_SERVICE_NAME` | Service name for trace and span information | `dynamo` | `dynamo-frontend` |
+| `OTEL_EXPORT_ENABLED` | Enable OTLP trace exporting | `false` | `true` |
+| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP exporter endpoint | `http://localhost:4317` | `http://tempo:4317` |
+## Getting Started Quickly
+### Start Observability Stack
+For collecting and visualizing logs with Grafana Loki (Kubernetes), or viewing trace context in logs alongside Grafana Tempo, start the observability stack. See [Observability Getting Started](README.md#getting-started-quickly) for instructions.
+### Enable Structured Logging
+Enable structured JSONL logging:
+```bash
+export DYN_LOGGING_JSONL=true
+export DYN_LOG=debug
+# Start your Dynamo components
+python -m dynamo.frontend --http-port 8000 &
+python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
+```
+Logs will be written to stderr in JSONL format with trace context.
 ## Available Logging Levels
@@ -85,68 +93,57 @@ Resulting Log format:
 {"time":"2025-09-02T15:53:31.943747Z","level":"INFO","target":"log","message":"Scheduler config values: {'max_num_seqs': 256, 'max_num_batched_tokens': 2048}","log.file":"/opt/dynamo/venv/lib/python3.12/site-packages/dynamo/vllm/main.py","log.line":268,"log.target":"main.get_engine_cache_info"}
 ```
-## OpenTelemetry Distributed Tracing
-When `DYN_LOGGING_JSONL` is enabled, Dynamo uses OpenTelemetry for distributed tracing. All logs include `trace_id` and `span_id` fields, and spans are automatically created for requests. By default, traces are **not exported**. To export traces to an observability backend (like Tempo, Jaeger, or Zipkin), set `OTEL_EXPORT_ENABLED=1`.
-### Behavior
-- **With `DYN_LOGGING_JSONL=true` only**: OpenTelemetry layer is active, generating trace context and span IDs for all requests. Traces appear in logs but are not exported anywhere. The trace and span information uses the OpenTelemetry format and libraries, which means the IDs are compatible with OpenTelemetry-based tracing backends like Tempo or Jaeger if you later choose to enable trace export.
-- **With `OTEL_EXPORT_ENABLED=1` and `DYN_LOGGING_JSONL=true`**: Same as above, plus traces are exported to an OTLP collector for visualization.
-### Configuration
-To enable OTLP trace exporting:
-1. Set `OTEL_EXPORT_ENABLED=1` to enable trace export
-2. Optionally configure the endpoint using `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` (default: `http://localhost:4317`)
-3. Optionally set `OTEL_SERVICE_NAME` to identify the service (useful in Kubernetes, default: `dynamo`)
-**Export Settings:**
-- **Protocol**: gRPC (Tonic)
-- **Service Name**: Value of `OTEL_SERVICE_NAME` env var, or `dynamo` if not set
-- **Endpoint**: Value of `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` env var, or `http://localhost:4317` if not set
-### Example: JSONL Logging Only (No Export)
+## Logging of Trace and Span IDs
+When `DYN_LOGGING_JSONL` is enabled, all logs include `trace_id` and `span_id` fields, and spans are automatically created for requests. This is useful for short debugging sessions where you want to examine trace context in logs without setting up a full tracing backend, and for correlating log messages with traces.
+**Note:** This section overlaps with [Distributed Tracing with Tempo](tracing.md). For trace visualization in Grafana Tempo and persistent trace analysis, see [Distributed Tracing with Tempo](tracing.md).
+### Configuration for Logging
+To see trace information in logs:
 ```bash
 export DYN_LOGGING_JSONL=true
-# OpenTelemetry is active, traces appear in logs, but nothing is exported
+export DYN_LOG=debug # Set to debug to see detailed trace logs
+# Start your Dynamo components (e.g., frontend and worker)
+python -m dynamo.frontend --http-port 8000 &
+python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
 ```
-### Example: JSONL Logging + Trace Export to Tempo
+This enables JSONL logging with `trace_id` and `span_id` fields. Traces appear in logs but are not exported to any backend.
+### Example Request
+Send a request to generate logs with trace context:
 ```bash
-export DYN_LOGGING_JSONL=true
-export OTEL_EXPORT_ENABLED=1
-export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://tempo:4317
-export OTEL_SERVICE_NAME=dynamo-frontend
-# OpenTelemetry is active, traces appear in logs AND are exported to Tempo
+curl -H 'Content-Type: application/json' \
+-H 'x-request-id: test-trace-001' \
+-d '{
+"model": "Qwen/Qwen3-0.6B",
+"max_completion_tokens": 100,
+"messages": [
+{"role": "user", "content": "What is the capital of France?"}
+]
+}' \
+http://localhost:8000/v1/chat/completions
 ```
-## Trace and Span Information
-### Example Request
-```sh
-curl -X POST http://localhost:8000/v1/chat/completions \
--H 'Content-Type: application/json' \
--d '{
-"model": "Qwen/Qwen3-0.6B",
-"messages": [
-{
-"role": "user",
-"content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"
-}
-],
-"stream": true,
-"max_tokens": 1000,
-}'
-```
+Check the logs (stderr) for JSONL output containing `trace_id`, `span_id`, and `x_request_id` fields.
+## Trace and Span Information in Logs
+This section shows how trace and span information appears in JSONL logs. These logs can be used to understand request flows even without a trace visualization backend.
+### Example Disaggregated Trace in Grafana
 When viewing the corresponding trace in Grafana, you should be able to see something like the following:
-![Trace Example](./grafana-disagg-trace.png)
+![Disaggregated Trace Example](grafana-disagg-trace.png)
 ### Trace Overview
@@ -208,18 +205,18 @@ When viewing the corresponding trace in Grafana, you should be able to see somet
 | **Busy Time** | 3,795,258 ns (3.79ms) |
 | **Idle Time** | 3,996,532,471 ns (3.99s) |
-### Frontend Logs
+### Frontend Logs with Trace Context
 The following shows the JSONL logs from the frontend service for the same request. Note the `trace_id` field (`b672ccf48683b392891c5cb4163d4b51`) that correlates all logs for this request, and the `span_id` field that identifies individual operations:
 ```
-{"time":"2025-10-31T20:52:07.707164Z","level":"INFO","file":"/opt/dynamo/lib/runtime/src/logging.rs","line":806,"target":"dynamo_runtime::logging","message":"OpenTelemetry OTLP export enabled","endpoint":"http://tempo.tm.svc.cluster.local:4317","service":"frontend"}
+{"time":"2025-10-31T20:52:07.707164Z","level":"INFO","file":"/opt/dynamo/lib/runtime/src/logging.rs","line":806,"target":"dynamo_runtime::logging","message":"OTLP export enabled","endpoint":"http://tempo.tm.svc.cluster.local:4317","service":"frontend"}
 {"time":"2025-10-31T20:52:10.707164Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
 {"time":"2025-10-31T20:52:10.745264Z","level":"DEBUG","file":"/opt/dynamo/lib/llm/src/kv_router/prefill_router.rs","line":232,"target":"dynamo_llm::kv_router::prefill_router","message":"Prefill succeeded, using disaggregated params for decode","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
 {"time":"2025-10-31T20:52:10.745545Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
 ```
-## Custom Request IDs
+## Custom Request IDs in Logs
 You can provide a custom request ID using the `x-request-id` header. This ID will be attached to all spans and logs for that request, making it easier to correlate traces with application-level request tracking.
@@ -237,7 +234,7 @@ curl -X POST http://localhost:8000/v1/chat/completions \
 "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"
 }
 ],
-"stream": true,
+"stream": false,
 "max_tokens": 1000
 }'
 ```
......
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# Metrics Developer Guide
This guide explains how to create and use custom metrics in Dynamo components using the Dynamo metrics API.
## Metrics Exposure
All metrics created via the Dynamo metrics API are automatically exposed on the `/metrics` HTTP endpoint in Prometheus Exposition Format text when the following environment variable is set:
- `DYN_SYSTEM_PORT=<port>`: port for the metrics endpoint (set to a positive value to enable; the default, `-1`, leaves it disabled)
Example:
```bash
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model <model>
```
Prometheus Exposition Format text metrics will be available at: `http://localhost:8081/metrics`
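To inspect the endpoint from a script, a few lines of standard-library Python are enough for simple cases. This is a minimal sketch, not a full implementation of the Prometheus text format. `parse_exposition` is a hypothetical helper that keys each sample by its metric name plus the raw `{...}` label block. It does not handle label values that contain spaces. For anything beyond quick checks, the official Python client's `prometheus_client.parser.text_string_to_metric_families` is a more robust choice.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Minimal parser for Prometheus text exposition output.

    Maps each series identifier (metric name plus its verbatim `{...}` label
    block) to its sample value. Skips `# HELP` / `# TYPE` comments and blank
    lines, ignores optional timestamps, and does NOT handle label values
    that contain spaces.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        series, value = parts[0], parts[1]
        # float() also accepts the special values "+Inf", "-Inf" and "NaN".
        samples[series] = float(value)
    return samples
```

For example, `parse_exposition(urllib.request.urlopen("http://localhost:8081/metrics").read().decode())` returns every sample exposed by a worker started with `DYN_SYSTEM_PORT=8081`.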
## Metric Name Constants
The [prometheus_names.rs](../../lib/runtime/src/metrics/prometheus_names.rs) module provides centralized metric name constants and sanitization functions to ensure consistency across all Dynamo components.
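The exact rules live in `prometheus_names.rs`. The sketch below only illustrates the general idea, using the Prometheus data-model rule that metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`. `sanitize_metric_name` is hypothetical and may differ from Dynamo's actual sanitization, for example in how it handles a leading digit.

```python
import re

# Valid Prometheus metric names, per the Prometheus data model.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def sanitize_metric_name(raw: str) -> str:
    """Hypothetical sanitizer: replace every invalid character with '_' and
    prefix '_' when the result is empty or would start with a digit."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", raw)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```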
---
## Metrics API in Rust
The metrics API is accessible through the `.metrics()` method on runtime, namespace, component, and endpoint objects. See [Runtime Hierarchy](metrics.md#runtime-hierarchy) for details on the hierarchical structure.
### Available Methods
- `.metrics().create_counter()`: Create a counter metric
- `.metrics().create_gauge()`: Create a gauge metric
- `.metrics().create_histogram()`: Create a histogram metric
- `.metrics().create_countervec()`: Create a counter with labels
- `.metrics().create_gaugevec()`: Create a gauge with labels
- `.metrics().create_histogramvec()`: Create a histogram with labels
### Creating Metrics
```rust
use dynamo_runtime::DistributedRuntime;
let runtime = DistributedRuntime::new()?;
let endpoint = runtime.namespace("my_namespace").component("my_component").endpoint("my_endpoint");
// Simple metrics
let requests_total = endpoint.metrics().create_counter(
"requests_total",
"Total requests",
&[]
)?;
let active_connections = endpoint.metrics().create_gauge(
"active_connections",
"Active connections",
&[]
)?;
let latency = endpoint.metrics().create_histogram(
"latency_seconds",
"Request latency",
&[],
Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
```
### Using Metrics
```rust
// Counters
requests_total.inc();
// Gauges
active_connections.set(42.0);
active_connections.inc();
active_connections.dec();
// Histograms
latency.observe(0.023); // 23ms
```
### Vector Metrics with Labels
```rust
// Create vector metrics with label names
let requests_by_model = endpoint.metrics().create_countervec(
    "requests_by_model",
    "Requests by model",
    &["model_type", "model_size"],
    &[]
)?;

let memory_by_gpu = endpoint.metrics().create_gaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    &["gpu_id", "memory_type"],
    &[]
)?;
// Use with specific label values
requests_by_model.with_label_values(&["llama", "7b"]).inc();
memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
```
### Advanced Features
**Custom histogram buckets:**
```rust
let latency = endpoint.metrics().create_histogram(
    "latency_seconds",
    "Request latency",
    &[],
    Some(vec![0.001, 0.01, 0.1, 1.0, 10.0])
)?;
```
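Prometheus histograms are cumulative. Each bucket `le=B` counts every observation less than or equal to `B`, and an implicit `+Inf` bucket counts every observation. The sketch below is plain Python, not the Dynamo API. It shows which of the buckets above the `0.023` observation from the earlier example would increment.

```python
import bisect
from typing import Dict, List


def cumulative_buckets(bounds: List[float], observations: List[float]) -> Dict[str, int]:
    """Return cumulative bucket counts keyed by the `le` label value."""
    bounds = sorted(bounds)
    per_bucket = [0] * (len(bounds) + 1)  # final slot is the implicit +Inf bucket
    for value in observations:
        # bisect_left finds the first bound >= value, matching "less than or equal" semantics
        per_bucket[bisect.bisect_left(bounds, value)] += 1

    result: Dict[str, int] = {}
    running = 0
    for label, count in zip([str(b) for b in bounds] + ["+Inf"], per_bucket):
        running += count  # each bucket also includes all smaller buckets
        result[label] = running
    return result
```

For bounds `[0.001, 0.01, 0.1, 1.0, 10.0]` and a single observation of `0.023`, the `0.1`, `1.0`, `10.0` and `+Inf` buckets each read 1, and the two smaller buckets stay at 0. Choose bounds that bracket the latencies you care about. Otherwise most observations fall into a single bucket and percentiles become coarse.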
**Constant labels:**
```rust
let counter = endpoint.metrics().create_counter(
    "requests_total",
    "Total requests",
    &[("region", "us-west"), ("env", "prod")]
)?;
```
---
## Metrics API in Python
Python components can create and manage Prometheus metrics using the same metrics API through Python bindings.
### Available Methods
- `endpoint.metrics.create_counter()` / `create_intcounter()`: Create a counter metric
- `endpoint.metrics.create_gauge()` / `create_intgauge()`: Create a gauge metric
- `endpoint.metrics.create_histogram()`: Create a histogram metric
- `endpoint.metrics.create_countervec()` / `create_intcountervec()`: Create a counter with labels
- `endpoint.metrics.create_gaugevec()` / `create_intgaugevec()`: Create a gauge with labels
- `endpoint.metrics.create_histogramvec()`: Create a histogram with labels
All metrics are imported from `dynamo.prometheus_metrics`.
### Creating Metrics
```python
from dynamo.runtime import DistributedRuntime
drt = DistributedRuntime()
endpoint = drt.namespace("my_namespace").component("my_component").endpoint("my_endpoint")
# Simple metrics
requests_total = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests"
)

active_connections = endpoint.metrics.create_intgauge(
    "active_connections",
    "Active connections"
)

latency = endpoint.metrics.create_histogram(
    "latency_seconds",
    "Request latency",
    buckets=[0.001, 0.01, 0.1, 1.0, 10.0]
)
```
### Using Metrics
```python
# Counters
requests_total.inc()
requests_total.inc_by(5)
# Gauges
active_connections.set(42)
active_connections.inc()
active_connections.dec()
# Histograms
latency.observe(0.023) # 23ms
```
### Vector Metrics with Labels
```python
# Create vector metrics with label names
requests_by_model = endpoint.metrics.create_intcountervec(
    "requests_by_model",
    "Requests by model",
    ["model_type", "model_size"]
)

memory_by_gpu = endpoint.metrics.create_intgaugevec(
    "gpu_memory_bytes",
    "GPU memory by device",
    ["gpu_id", "memory_type"]
)
# Use with specific label values
requests_by_model.inc({"model_type": "llama", "model_size": "7b"})
memory_by_gpu.set(8192, {"gpu_id": "0", "memory_type": "allocated"})
```
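Each distinct combination of label values becomes its own time series on `/metrics`. The hypothetical `render_sample` helper below is not a Dynamo function. It sketches how a labeled sample is written in the Prometheus text format, including the escaping that format requires for label values. It also shows why the label keys you pass must match the label names declared when the vector metric was created.

```python
from typing import Dict, Optional, Sequence


def _escape(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline in label values
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(
    name: str,
    label_names: Sequence[str],
    labels: Dict[str, str],
    value: float,
    const_labels: Optional[Dict[str, str]] = None,
) -> str:
    """Render one sample line, e.g. requests_by_model{model_type="llama",model_size="7b"} 1."""
    if set(labels) != set(label_names):
        raise ValueError(f"expected labels {sorted(label_names)}, got {sorted(labels)}")
    merged = dict(const_labels or {})  # constant labels first
    merged.update((key, labels[key]) for key in label_names)  # then variable labels in declared order
    body = ",".join(f'{key}="{_escape(val)}"' for key, val in merged.items())
    return f"{name}{{{body}}} {value}"
```

Because every new label value creates a new series, avoid unbounded values such as request IDs or raw prompts as labels.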
### Advanced Features
**Constant labels:**
```python
counter = endpoint.metrics.create_intcounter(
    "requests_total",
    "Total requests",
    [("region", "us-west"), ("env", "prod")]
)
```
**Metric introspection:**
```python
print(counter.name()) # "my_namespace_my_component_my_endpoint_requests_total"
print(counter.const_labels()) # {"dynamo_namespace": "my_namespace", ...}
print(requests_by_model.variable_labels())  # ["model_type", "model_size"]
```
**Update patterns:**
Background thread updates:
```python
import threading
import time

def update_loop():
    while True:
        # compute_current_connections() stands for your own application logic
        active_connections.set(compute_current_connections())
        time.sleep(2)

threading.Thread(target=update_loop, daemon=True).start()
```
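The loop above runs until the process exits. If the updater needs to stop cleanly, for example in tests or on shutdown, one option is to wait on a `threading.Event` instead of calling `time.sleep`. In this sketch, `FakeGauge` is a hypothetical stand-in for a Dynamo gauge so the example runs on its own. With Dynamo, you would pass `active_connections` instead.

```python
import threading
from typing import Callable, Tuple


class FakeGauge:
    """Hypothetical stand-in for a Dynamo gauge; only set() is modeled."""

    def __init__(self) -> None:
        self.value = None

    def set(self, value) -> None:
        self.value = value


def start_update_loop(
    gauge, compute: Callable[[], float], interval_s: float = 2.0
) -> Tuple[threading.Event, threading.Thread]:
    """Update `gauge` every `interval_s` seconds until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            gauge.set(compute())
            # Returns early as soon as stop.set() is called, unlike time.sleep()
            stop.wait(interval_s)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

Calling `stop.set()` followed by `thread.join()` ends the updater within one wake-up instead of waiting out the full interval.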
Callback-based updates (called before each `/metrics` scrape):
```python
def update_metrics():
    # compute_current_connections() stands for your own application logic
    active_connections.set(compute_current_connections())

endpoint.metrics.register_callback(update_metrics)
```
### Examples
Example scripts: [lib/bindings/python/examples/metrics/](../../lib/bindings/python/examples/metrics/)
```bash
cd ~/dynamo/lib/bindings/python/examples/metrics
DYN_SYSTEM_PORT=8081 ./server_with_loop.py
DYN_SYSTEM_PORT=8081 ./server_with_callback.py
```
---
## Related Documentation
- [Metrics Overview](metrics.md)
- [Prometheus and Grafana Setup](prometheus-grafana.md)
- [Distributed Runtime Architecture](../design_docs/distributed_runtime.md)
- [Python Metrics Examples](../../lib/bindings/python/examples/metrics/)
...@@ -3,27 +3,91 @@ SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All ...@@ -3,27 +3,91 @@ SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All
SPDX-License-Identifier: Apache-2.0 SPDX-License-Identifier: Apache-2.0
--> -->
# Dynamo MetricsRegistry # Dynamo Metrics
## Overview ## Overview
Dynamo provides built-in metrics capabilities through the `MetricsRegistry` trait, which is automatically available whenever you use the `DistributedRuntime` framework. This guide explains how to use metrics for observability and monitoring across all Dynamo components. Dynamo provides built-in metrics capabilities through the Dynamo metrics API, which is automatically available whenever you use the `DistributedRuntime` framework. This document serves as a reference for all available metrics in Dynamo.
## Automatic Metrics **For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](prometheus-grafana.md).
Dynamo automatically exposes metrics with the `dynamo_` name prefixes. It also adds the following labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to indicate which component is providing the metric. **For creating custom metrics**, see the [Metrics Developer Guide](metrics-developer-guide.md).
**Frontend Metrics**: When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name. These cover request handling, token processing, and latency measurements. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for the complete list of frontend metrics. ## Environment Variables
**Component Metrics**: The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework. These include request counts, processing times, byte transfers, and system uptime metrics. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for the complete list of component metrics. | Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_SYSTEM_PORT` | System metrics/health port | `-1` (disabled) | `8081` |
**Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See [prometheus-grafana.md](prometheus-grafana.md#available-metrics) for details on specialized component metrics. ## Getting Started Quickly
**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](../kubernetes/observability/metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana. This is a single machine example.
## Metrics Hierarchy ### Start Observability Stack
The `MetricsRegistry` trait is implemented by `DistributedRuntime`, `Namespace`, `Component`, and `Endpoint`, providing a hierarchical approach to metric collection that matches Dynamo's distributed architecture: For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](README.md#getting-started-quickly) for instructions.
### Launch Dynamo Components
Launch a frontend and vLLM backend to test metrics:
```bash
$ python -m dynamo.frontend --http-port 8000 &
# Enable system metrics server on port 8081
$ DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B \
--enforce-eager --no-enable-prefix-caching --max-num-seqs 3
```
Wait for the vLLM worker to start, then send requests and check metrics:
```bash
# Send a request
curl -H 'Content-Type: application/json' \
-d '{
"model": "Qwen/Qwen3-0.6B",
"max_completion_tokens": 100,
"messages": [{"role": "user", "content": "Hello"}]
}' \
http://localhost:8000/v1/chat/completions
# Check metrics from the worker
curl -s localhost:8081/metrics | grep dynamo_component
```
## Exposed Metrics
Dynamo exposes metrics in Prometheus Exposition Format text at the `/metrics` HTTP endpoint. All Dynamo-generated metrics use the `dynamo_*` prefix and include labels (`dynamo_namespace`, `dynamo_component`, `dynamo_endpoint`) to identify the source component.
**Example Prometheus Exposition Format text:**
```
# HELP dynamo_component_requests_total Total requests processed
# TYPE dynamo_component_requests_total counter
dynamo_component_requests_total{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate"} 42
# HELP dynamo_component_request_duration_seconds Request processing time
# TYPE dynamo_component_request_duration_seconds histogram
dynamo_component_request_duration_seconds_bucket{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate",le="0.005"} 10
dynamo_component_request_duration_seconds_bucket{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate",le="0.01"} 15
dynamo_component_request_duration_seconds_bucket{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate",le="+Inf"} 42
dynamo_component_request_duration_seconds_sum{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate"} 2.5
dynamo_component_request_duration_seconds_count{dynamo_namespace="default",dynamo_component="worker",dynamo_endpoint="generate"} 42
```
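For ad hoc checks, such as computing the mean request duration from a histogram's `_sum` and `_count` series, a small parser for lines like the ones above is enough. The sketch below is deliberately simplified. It ignores timestamps, does not unescape label values, and does not handle a `}` inside a label value. For anything beyond quick checks, the official `prometheus_client` Python package includes a full text-format parser.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> List[Sample]:
    """Parse sample lines into (name, labels, value); comments and blank lines are skipped."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = {
            m.group("key"): m.group("val")
            for m in _LABEL.finditer(match.group("labels") or "")
        }
        # float() accepts "+Inf", "-Inf" and "NaN" as used by the text format
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def histogram_mean(samples: List[Sample], base: str) -> float:
    """Mean of a histogram: <base>_sum divided by <base>_count (first matching series)."""
    total = next(v for n, _, v in samples if n == base + "_sum")
    count = next(v for n, _, v in samples if n == base + "_count")
    return total / count if count else 0.0
```

For the example above, the mean request duration is `2.5 / 42`, or about 0.06 seconds.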
### Metric Categories
Dynamo exposes several categories of metrics:
- **Frontend Metrics** (`dynamo_frontend_*`) - Request handling, token processing, and latency measurements
- **Component Metrics** (`dynamo_component_*`) - Request counts, processing times, byte transfers, and system uptime
- **Specialized Component Metrics** (e.g., `dynamo_preprocessor_*`) - Component-specific metrics
- **Engine Metrics** (Pass-through) - Backend engines expose their own metrics: [vLLM](../backends/vllm/prometheus.md) (`vllm:*`), [SGLang](../backends/sglang/prometheus.md) (`sglang:*`), [TensorRT-LLM](../backends/trtllm/prometheus.md) (`trtllm:*`)
## Runtime Hierarchy
The Dynamo metrics API is available on `DistributedRuntime`, `Namespace`, `Component`, and `Endpoint`, providing a hierarchical approach to metric collection that matches Dynamo's distributed architecture:
- `DistributedRuntime`: Global metrics across the entire runtime - `DistributedRuntime`: Global metrics across the entire runtime
- `Namespace`: Metrics scoped to a specific dynamo_namespace - `Namespace`: Metrics scoped to a specific dynamo_namespace
...@@ -32,65 +96,116 @@ The `MetricsRegistry` trait is implemented by `DistributedRuntime`, `Namespace`, ...@@ -32,65 +96,116 @@ The `MetricsRegistry` trait is implemented by `DistributedRuntime`, `Namespace`,
This hierarchical structure allows you to create metrics at the appropriate level of granularity for your monitoring needs. This hierarchical structure allows you to create metrics at the appropriate level of granularity for your monitoring needs.
## Available Metrics
## Getting Started ### Backend Component Metrics
For a complete setup guide including Docker Compose configuration, Prometheus setup, and Grafana dashboards, see the [Getting Started section](prometheus-grafana.md#getting-started) in the Prometheus and Grafana guide. The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework:
The quick start includes: - `dynamo_component_inflight_requests`: Requests currently being processed (gauge)
- Docker Compose setup for Prometheus and Grafana - `dynamo_component_request_bytes_total`: Total bytes received in requests (counter)
- Pre-configured dashboards and datasources - `dynamo_component_request_duration_seconds`: Request processing time (histogram)
- Access URLs for all monitoring endpoints - `dynamo_component_requests_total`: Total requests processed (counter)
- GPU targeting configuration - `dynamo_component_response_bytes_total`: Total bytes sent in responses (counter)
- `dynamo_component_system_uptime_seconds`: DistributedRuntime uptime (gauge)
## Implementation Examples ### KV Router Statistics (kvstats)
Examples of creating metrics at different hierarchy levels and using dynamic labels are included in this document below. KV router statistics are automatically exposed by LLM workers and KV router components with the `dynamo_component_kvstats_*` prefix. These metrics provide insights into GPU memory usage and cache efficiency:
### Grafana Dashboards - `dynamo_component_kvstats_active_blocks`: Number of active KV cache blocks currently in use (gauge)
- `dynamo_component_kvstats_total_blocks`: Total number of KV cache blocks available (gauge)
- `dynamo_component_kvstats_gpu_cache_usage_percent`: GPU cache usage, expressed as a fraction (0.0-1.0) (gauge)
- `dynamo_component_kvstats_gpu_prefix_cache_hit_rate`: GPU prefix cache hit rate, expressed as a fraction (0.0-1.0) (gauge)
Use dashboards in `deploy/metrics/grafana_dashboards/`: These metrics are published by:
- `grafana-dynamo-dashboard.json`: General Dynamo dashboard - **LLM Workers**: vLLM and TRT-LLM backends publish these metrics through their respective publishers
- `grafana-dcgm-metrics.json`: DCGM GPU metrics dashboard - **KV Router**: The KV router component aggregates and exposes these metrics for load balancing decisions
## Metrics Visualization Architecture ### Specialized Component Metrics
### Service Topology Some components expose additional metrics specific to their functionality:
The metrics system follows this architecture for collecting and visualizing metrics: - `dynamo_preprocessor_*`: Metrics specific to preprocessor components
```mermaid ### Frontend Metrics
graph TD
BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001] When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:
subgraph DockerComposeNetwork [Network inside Docker Compose]
NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222] - `dynamo_frontend_inflight_requests`: Inflight requests (gauge)
PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380] - `dynamo_frontend_queued_requests`: Number of requests in HTTP processing queue (gauge)
PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401] - `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram)
PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP - `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram)
PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000] - `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram)
PROMETHEUS -->|:8081/metrics| DYNAMOBACKEND[Dynamo backend :8081] - `dynamo_frontend_request_duration_seconds`: LLM request duration (histogram)
DYNAMOFE --> DYNAMOBACKEND - `dynamo_frontend_requests_total`: Total LLM requests (counter)
GRAFANA -->|:9090/query API| PROMETHEUS - `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
end
``` **Note**: The `dynamo_frontend_inflight_requests` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
#### Model Configuration Metrics
### Grafana Dashboard The frontend also exposes model configuration metrics with the `dynamo_frontend_model_*` prefix. These metrics are populated from the worker backend registration service when workers register with the system:
The metrics system includes a pre-configured Grafana dashboard for visualizing service metrics: **Runtime Config Metrics (from ModelRuntimeConfig):**
These metrics come from the runtime configuration provided by worker backends during registration.
![Grafana Dynamo Dashboard](./grafana-dynamo-composite.png) - `dynamo_frontend_model_total_kv_blocks`: Total KV blocks available for a worker serving the model (gauge)
- `dynamo_frontend_model_max_num_seqs`: Maximum number of sequences for a worker serving the model (gauge)
- `dynamo_frontend_model_max_num_batched_tokens`: Maximum number of batched tokens for a worker serving the model (gauge)
## Detailed Setup Guide **MDC Metrics (from ModelDeploymentCard):**
These metrics come from the Model Deployment Card information provided by worker backends during registration. Note that when multiple worker instances register with the same model name, only the first instance's configuration metrics (runtime config and MDC metrics) will be populated. Subsequent instances with duplicate model names will be skipped for configuration metric updates, though the worker count metric will reflect all instances.
For complete setup instructions including Docker Compose, Prometheus configuration, and Grafana dashboards, see: - `dynamo_frontend_model_context_length`: Maximum context length for a worker serving the model (gauge)
- `dynamo_frontend_model_kv_cache_block_size`: KV cache block size for a worker serving the model (gauge)
- `dynamo_frontend_model_migration_limit`: Request migration limit for a worker serving the model (gauge)
```{toctree} **Worker Management Metrics:**
:hidden: - `dynamo_frontend_model_workers`: Number of worker instances currently serving the model (gauge)
prometheus-grafana ### Request Processing Flow
This section explains the distinction between two key metrics used to track request processing:
1. **Inflight**: Tracks requests from HTTP handler start until the complete response is finished
2. **HTTP Queue**: Tracks requests from HTTP handler start until first token generation begins (including prefill time)
**Example Request Flow:**
```
curl -s localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"prompt": "Hello, let us talk about LLMs",
"stream": false,
"max_tokens": 1000
}'
``` ```
- [Prometheus and Grafana Setup Guide](prometheus-grafana.md) **Timeline:**
```
Timeline: 0, 1, ...
Client ────> Frontend:8000 ────────────────────> Dynamo component/backend (vLLM, SGLang, TRT)
│request start │received │
| | |
│ ├──> start prefill ──> first token ──> |last token
│ │ (not impl) | |
├─────actual HTTP queue¹ ──────────┘ │ |
│ │ │
├─────implemented HTTP queue ─────────────────────────────┘ |
│ │
└─────────────────────────────────── Inflight ────────────────────────────┘
```
**Concurrency Example:**
Suppose the backend allows 3 concurrent requests and there are 10 clients continuously hitting the frontend:
- All 10 requests will be counted as inflight (from start until complete response)
- 7 requests will be in HTTP queue most of the time
- 3 requests will be actively processed (between first token and last token)
**Key Differences:**
- **Inflight**: Measures total request lifetime including processing time
- **HTTP Queue**: Measures queuing time before processing begins (including prefill time)
- **HTTP Queue ≤ Inflight** (HTTP queue is a subset of inflight time)
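The concurrency example reduces to simple arithmetic. The sketch below idealizes the steady state, with every backend slot busy generating and no request caught between states. It only shows why HTTP queue is always at most inflight. It does not model prefill time, which the implemented HTTP queue metric also includes.

```python
def steady_state(clients: int, backend_slots: int) -> dict:
    """Idealized gauge values when `clients` requests are continuously outstanding."""
    generating = min(clients, backend_slots)  # requests between first and last token
    return {
        "inflight": clients,  # every outstanding request is inflight
        "http_queue": clients - generating,  # still waiting for their first token
        "generating": generating,
    }
```

`steady_state(10, 3)` reproduces the example: 10 inflight, 7 in the HTTP queue, and 3 generating.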
## Related Documentation ## Related Documentation
......
...@@ -5,87 +5,61 @@ SPDX-License-Identifier: Apache-2.0 ...@@ -5,87 +5,61 @@ SPDX-License-Identifier: Apache-2.0
# Distributed Tracing with Tempo # Distributed Tracing with Tempo
This guide explains how to set up and view distributed traces in Grafana Tempo for Dynamo workloads.
## Overview ## Overview
Dynamo supports OpenTelemetry-based distributed tracing, allowing you to visualize request flows across Frontend and Worker components. Traces are exported to Tempo via OTLP (OpenTelemetry Protocol) and visualized in Grafana. Dynamo supports OpenTelemetry-based distributed tracing for visualizing request flows across Frontend and Worker components. Traces are exported to Tempo via OTLP (OpenTelemetry Protocol) and visualized in Grafana.
**Requirements:** Set `DYN_LOGGING_JSONL=true` and `OTEL_EXPORT_ENABLED=true` to export traces to Tempo.
## Prerequisites This guide covers single GPU demo setup using Docker Compose. For Kubernetes deployments, see [Kubernetes Deployment](#kubernetes-deployment).
- Docker and Docker Compose (for local deployment) **Note:** This section has overlap with [Logging of OpenTelemetry Tracing](logging.md) since OpenTelemetry has aspects of both logging and tracing. The tracing approach documented here is for persistent trace visualization and analysis. For short debugging sessions examining trace context directly in logs, see the [Logging](logging.md) guide.
- Kubernetes cluster with kubectl access (for Kubernetes deployment)
- Dynamo runtime with tracing support
## Environment Variables ## Environment Variables
Dynamo's tracing is configured via environment variables. For complete logging documentation, see [docs/observability/logging.md](../../docs/observability/logging.md). | Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_LOGGING_JSONL` | Enable JSONL logging format (required for tracing) | `false` | `true` |
| `OTEL_EXPORT_ENABLED` | Enable OTLP trace export | `false` | `true` |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP gRPC endpoint for Tempo | `http://localhost:4317` | `http://tempo:4317` |
| `OTEL_SERVICE_NAME` | Service name for identifying components | `dynamo` | `dynamo-frontend` |
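If you launch components from a Python script rather than a shell, you can pass the same variables per process. `tracing_env` below is a hypothetical helper. The variable names and values come from the table above, and the default endpoint assumes Tempo is reachable on `localhost:4317`.

```python
import os
from typing import Dict, Mapping, Optional


def tracing_env(
    service_name: str,
    endpoint: str = "http://localhost:4317",
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and add the tracing variables."""
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "DYN_LOGGING_JSONL": "true",  # JSONL logging is required for tracing
            "OTEL_EXPORT_ENABLED": "true",
            "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": endpoint,  # OTLP gRPC receiver
            "OTEL_SERVICE_NAME": service_name,  # shown as the service name in Tempo
        }
    )
    return env
```

For example, `subprocess.Popen([sys.executable, "-m", "dynamo.frontend", "--router-mode", "kv", "--http-port=8000"], env=tracing_env("dynamo-frontend"))` starts the frontend with tracing enabled. Give each process its own service name.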
### Required Environment Variables ## Getting Started Quickly
| Variable | Description | Example Value | ### 1. Start Observability Stack
|----------|-------------|---------------|
| `DYN_LOGGING_JSONL` | Enable JSONL logging format (required for tracing) | `true` |
| `OTEL_EXPORT_ENABLED` | Enable OTLP trace export | `1` |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP gRPC endpoint for Tempo | `http://localhost:4317` (local) or `http://tempo:4317` (docker) |
| `OTEL_SERVICE_NAME` | Service name for identifying components | `dynamo-frontend`, `dynamo-worker-prefill`, `dynamo-worker-decode` |
**Note:** When `OTEL_EXPORT_ENABLED=1`, logging initialization is deferred until the runtime is available (required by the OTEL exporter). This means some early logs will be dropped. This will be fixed in a future release. Start the observability stack (Prometheus, Grafana, Tempo, exporters). See [Observability Getting Started](README.md#getting-started-quickly) for instructions.
### Example Configuration ### 2. Set Environment Variables
Configure Dynamo components to export traces:
```bash ```bash
# Enable JSONL logging and tracing # Enable JSONL logging and tracing
export DYN_LOGGING_JSONL=true export DYN_LOGGING_JSONL=true
export OTEL_EXPORT_ENABLED=true
# Enable trace export to Tempo export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
export OTEL_EXPORT_ENABLED=1
# Set the Tempo endpoint (docker-compose network)
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://tempo:4317
# Set service name to identify this component
export OTEL_SERVICE_NAME=dynamo-frontend
``` ```
--- ### 3. Start Dynamo Components (Single GPU)
## Local Deployment with Docker Compose
### 1. Start Tempo and Grafana
From the `deploy/tracing` directory, start the observability stack: For a simple single-GPU deployment, start the frontend and a single vLLM worker:
```bash ```bash
cd deploy/tracing # Start the frontend with tracing enabled
docker-compose up -d export OTEL_SERVICE_NAME=dynamo-frontend
``` python -m dynamo.frontend --router-mode kv --http-port=8000 &
This will start:
- **Tempo** on `http://localhost:3200` (HTTP API) and `localhost:4317` (OTLP gRPC)
- **Grafana** on `http://localhost:3000` (username: `admin`, password: `admin`)
Verify services are running: # Start a single vLLM worker (aggregated prefill and decode)
export OTEL_SERVICE_NAME=dynamo-worker-vllm
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
```bash wait
docker-compose ps
``` ```
### 2. Set Environment Variables This runs both prefill and decode on the same GPU, providing a simpler setup for testing tracing.
Configure Dynamo components to export traces: ### Alternative: Disaggregated Deployment (2 GPUs)
```bash
# Enable JSONL logging and tracing
export DYN_LOGGING_JSONL=true
export OTEL_EXPORT_ENABLED=1
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
# Set service names for each component
export OTEL_SERVICE_NAME=dynamo-frontend
```
### 3. Run vLLM Disaggregated Deployment
Run the vLLM disaggregated script with tracing enabled: Run the vLLM disaggregated script with tracing enabled:
...@@ -106,70 +80,66 @@ trap 'echo Cleaning up...; kill 0' EXIT ...@@ -106,70 +80,66 @@ trap 'echo Cleaning up...; kill 0' EXIT
# Enable tracing # Enable tracing
export DYN_LOGGING_JSONL=true export DYN_LOGGING_JSONL=true
export OTEL_EXPORT_ENABLED=1 export OTEL_EXPORT_ENABLED=true
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317 export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
# Run frontend # Run frontend
export OTEL_SERVICE_NAME=dynamo-frontend export OTEL_SERVICE_NAME=dynamo-frontend
python -m dynamo.frontend --router-mode kv --http-port=8000 & python -m dynamo.frontend --router-mode kv --http-port=8000 &
# Run decode worker # Run decode worker, make sure to wait for startup
export OTEL_SERVICE_NAME=dynamo-worker-decode export OTEL_SERVICE_NAME=dynamo-worker-decode
CUDA_VISIBLE_DEVICES=0 python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager & CUDA_VISIBLE_DEVICES=0 python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager &
# Run prefill worker # Run prefill worker, make sure to wait for startup
export OTEL_SERVICE_NAME=dynamo-worker-prefill export OTEL_SERVICE_NAME=dynamo-worker-prefill
CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.vllm \ CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.vllm \
--model Qwen/Qwen3-0.6B \ --model Qwen/Qwen3-0.6B \
--enforce-eager \ --enforce-eager \
--is-prefill-worker & --is-prefill-worker &
wait
``` ```
For disaggregated deployments, this separates prefill and decode onto different GPUs for better resource utilization.
### 4. Generate Traces ### 4. Generate Traces
Send requests to the frontend to generate traces: Send requests to the frontend to generate traces (works for both aggregated and disaggregated deployments). **Note the `x-request-id` header**, which allows you to easily search for and correlate this specific trace in Grafana:
```bash ```bash
curl -d '{ curl -H 'Content-Type: application/json' \
-H 'x-request-id: test-trace-001' \
-d '{
"model": "Qwen/Qwen3-0.6B", "model": "Qwen/Qwen3-0.6B",
"max_completion_tokens": 100, "max_completion_tokens": 100,
"messages": [ "messages": [
{"role": "user", "content": "What is the capital of France?"} {"role": "user", "content": "What is the capital of France?"}
] ]
}' \ }' \
-H 'Content-Type: application/json' \
-H 'x-request-id: test-trace-001' \
http://localhost:8000/v1/chat/completions http://localhost:8000/v1/chat/completions
``` ```
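The same traced request can also be sent from Python with the standard library. `build_chat_request` is a hypothetical helper. The payload mirrors the `curl` command above, and the `x-request-id` value is what you then search for as a tag in Tempo.

```python
import json
from urllib.request import Request


def build_chat_request(
    prompt: str,
    request_id: str,
    model: str = "Qwen/Qwen3-0.6B",
    url: str = "http://localhost:8000/v1/chat/completions",
    max_completion_tokens: int = 100,
) -> Request:
    """Build (but do not send) a chat completion request tagged with x-request-id."""
    payload = {
        "model": model,
        "max_completion_tokens": max_completion_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Supplying a body makes urllib issue a POST
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-request-id": request_id},
    )
```

Once the frontend is running, `urllib.request.urlopen(build_chat_request("What is the capital of France?", "test-trace-001"))` sends the request.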
### 5. View Traces in Grafana Tempo ### 5. View Traces in Grafana Tempo
1. Open Grafana at `http://localhost:3000` 1. Open Grafana at `http://localhost:3000`
2. Login with username `admin` and password `admin` 2. Login with username `dynamo` and password `dynamo`
3. Navigate to **Explore** (compass icon in the left sidebar) 3. Navigate to **Explore** (compass icon in the left sidebar)
4. Select **Tempo** as the data source (should be selected by default) 4. Select **Tempo** as the data source (should be selected by default)
5. Use the **Search** tab to find traces: 5. In the query type, select **"Search"** (not TraceQL, not Service Graph)
6. Use the **Search** tab to find traces:
- Search by **Service Name** (e.g., `dynamo-frontend`) - Search by **Service Name** (e.g., `dynamo-frontend`)
- Search by **Span Name** (e.g., `http-request`, `handle_payload`) - Search by **Span Name** (e.g., `http-request`, `handle_payload`)
- Search by **Tags** (e.g., `x_request_id=test-trace-001`) - Search by **Tags** (e.g., `x_request_id=test-trace-001`)
6. Click on a trace to view the detailed flame graph 7. Click on a trace to view the detailed flame graph
#### Example Trace View #### Example Trace View
Below is an example of what a trace looks like in Grafana Tempo: Below is an example of what a trace looks like in Grafana Tempo:
![Trace Example](./trace.png) ![Trace Example](trace.png)
### 6. Stop Services ### 6. Stop Services
When done, stop the Tempo and Grafana stack: When done, stop the observability stack. See [Observability Getting Started](README.md#getting-started-quickly) for Docker Compose commands.
```bash
cd deploy/tracing
docker-compose down
```
--- ---
...@@ -192,7 +162,7 @@ spec: ...@@ -192,7 +162,7 @@ spec:
- name: DYN_LOGGING_JSONL - name: DYN_LOGGING_JSONL
value: "true" value: "true"
- name: OTEL_EXPORT_ENABLED - name: OTEL_EXPORT_ENABLED
value: "1" value: "true"
- name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
value: "http://tempo.observability.svc.cluster.local:4317" value: "http://tempo.observability.svc.cluster.local:4317"
......
...@@ -17,7 +17,7 @@ python3 -m dynamo.frontend --http-port=8000 & ...@@ -17,7 +17,7 @@ python3 -m dynamo.frontend --http-port=8000 &
DYNAMO_PID=$! DYNAMO_PID=$!
# run worker with metrics enabled # run worker with metrics enabled
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 \ DYN_SYSTEM_PORT=8081 \
python3 -m dynamo.sglang \ python3 -m dynamo.sglang \
--model-path Qwen/Qwen3-0.6B \ --model-path Qwen/Qwen3-0.6B \
--served-model-name Qwen/Qwen3-0.6B \ --served-model-name Qwen/Qwen3-0.6B \
......