---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Tracing
---

## Overview

Dynamo supports OpenTelemetry-based distributed tracing for visualizing request flows across Frontend and Worker components. Traces are exported to Tempo via OTLP (OpenTelemetry Protocol) and visualized in Grafana.

**Requirements:** Set `DYN_LOGGING_JSONL=true` and `OTEL_EXPORT_ENABLED=true` to export traces to Tempo.

This guide covers a single-GPU demo setup using Docker Compose. For Kubernetes deployments, see [Kubernetes Deployment](#kubernetes-deployment).

**Note:** This section overlaps with [Logging of OpenTelemetry Tracing](logging.md), since OpenTelemetry spans both logging and tracing. The approach documented here is for persistent trace visualization and analysis. For short debugging sessions that examine trace context directly in logs, see the [Logging](logging.md) guide.

## Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_LOGGING_JSONL` | Enable JSONL logging format (required for tracing) | `false` | `true` |
| `OTEL_EXPORT_ENABLED` | Enable OTLP trace export | `false` | `true` |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP gRPC endpoint for Tempo | `http://localhost:4317` | `http://tempo:4317` |
| `OTEL_SERVICE_NAME` | Service name for identifying components | `dynamo` | `dynamo-frontend` |

## Getting Started Quickly

### 1. Start Observability Stack

Start the observability stack (Prometheus, Grafana, Tempo, exporters). See [Observability Getting Started](README.md#getting-started-quickly) for instructions.

### 2. Start Dynamo Components (Single GPU)

For a simple single-GPU deployment, run the aggregated tracing launch script. This script enables tracing, sets per-component service names, and starts a frontend with a single vLLM worker:

```bash
cd examples/backends/vllm/launch
./agg_tracing.sh
```

To override the Tempo endpoint (default `http://localhost:4317`):

```bash
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://tempo:4317
./agg_tracing.sh
```

This runs a single aggregated worker on one GPU, providing a simpler setup for testing tracing.

### Alternative: Disaggregated Deployment (2 GPUs)

For a disaggregated deployment with tracing, run the disaggregated tracing launch script. This script sets up tracing and launches a frontend, a decode worker on GPU 0, and a prefill worker on GPU 1:

```bash
cd examples/backends/vllm/launch
./disagg_tracing.sh
```

This separates prefill and decode onto different GPUs for better resource utilization.

### 3. Generate Traces

Send requests to the frontend to generate traces (works for both aggregated and disaggregated deployments). The launch scripts print an example `curl` command on startup with the correct model name.

**Tip:** Add an `x-request-id` header to easily search for a specific trace in Grafana:

```bash
curl -H 'Content-Type: application/json' \
-H 'x-request-id: test-trace-001' \
-d '{
  "model": "<MODEL>",
  "max_completion_tokens": 100,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ]
}' \
http://localhost:8000/v1/chat/completions
```
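
When sending several requests, giving each one its own `x-request-id` makes the resulting traces easy to tell apart. The sketch below wraps the request above in two helper functions; `make_request_id` and `send_traced_request` are hypothetical names introduced here for illustration, and `MODEL` is assumed to hold the model name printed by the launch script:

```bash
# Hypothetical helper: build a request ID from a UTC timestamp and a caller-supplied suffix.
make_request_id() {
  printf 'trace-%s-%s\n' "$(date -u +%Y%m%dT%H%M%SZ)" "$1"
}

# Hypothetical helper: send one chat completion tagged with the given request ID.
# Assumes MODEL is set and the frontend listens on localhost:8000.
send_traced_request() {
  request_id="$1"
  curl -s -H 'Content-Type: application/json' \
    -H "x-request-id: ${request_id}" \
    -d "{\"model\": \"${MODEL}\", \"max_completion_tokens\": 100, \"messages\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}]}" \
    http://localhost:8000/v1/chat/completions
}

# Usage (requires a running frontend):
#   MODEL="<MODEL>"
#   for i in 1 2 3; do
#     id=$(make_request_id "$i")
#     echo "sending ${id}"
#     send_traced_request "${id}"
#   done
```

Printing each ID before sending it leaves a record of the values to search for later.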

### 4. View Traces in Grafana Tempo

1. Open Grafana at `http://localhost:3000`
2. Log in with username `dynamo` and password `dynamo`
3. Navigate to **Explore** (compass icon in the left sidebar)
4. Select **Tempo** as the data source (should be selected by default)
5. In the query type, select **"Search"** (not TraceQL, not Service Graph)
6. Use the **Search** tab to find traces:
   - Search by **Service Name** (e.g., `dynamo-frontend`)
   - Search by **Span Name** (e.g., `http-request`, `handle_payload`)
   - Search by **Tags** (e.g., `x_request_id=test-trace-001`)
7. Click on a trace to view the detailed flame graph

#### Example Trace View

Below is an example of what a trace looks like in Grafana Tempo:

![Trace Example](../assets/img/trace.png)

### 5. Stop Services

When done, stop the observability stack. See [Observability Getting Started](README.md#getting-started-quickly) for Docker Compose commands.

---

## Kubernetes Deployment

For Kubernetes deployments, ensure you have a Tempo instance deployed and accessible (e.g., `http://tempo.observability.svc.cluster.local:4317`).

### Modify DynamoGraphDeployment for Tracing

Tracing-enabled variants of the example deployments are provided:

- **Aggregated:** `examples/backends/vllm/deploy/agg_tracing.yaml`
- **Disaggregated:** `examples/backends/vllm/deploy/disagg_tracing.yaml`

These add the [Environment Variables](#environment-variables) to the base `agg.yaml` / `disagg.yaml` deployments. To override the Tempo endpoint, edit `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` in the YAML.
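
As an alternative to editing the file by hand, the endpoint can be rewritten into a copy of the YAML before applying it. This is only a sketch: `set_tempo_endpoint` is a hypothetical helper, and it assumes each variable appears as a `- name: ...` line followed directly by a `value: ...` line, which may not match the actual layout of the example files:

```bash
# Hypothetical helper: print a deployment YAML with a new Tempo endpoint.
# Assumes the "value:" line directly follows the matching "name:" line.
set_tempo_endpoint() {
  yaml_file="$1"
  endpoint="$2"
  # On the line after the variable name, replace everything from "value:" onward,
  # leaving the original indentation untouched.
  sed -e "/name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT/{n;s|value:.*|value: \"${endpoint}\"|;}" "$yaml_file"
}

# Usage (writes a modified copy, then applies it):
#   set_tempo_endpoint examples/backends/vllm/deploy/disagg_tracing.yaml \
#     http://tempo.observability.svc.cluster.local:4317 > /tmp/disagg_tracing.yaml
#   kubectl apply -f /tmp/disagg_tracing.yaml
```

Writing to a separate file keeps the checked-in example unchanged and makes the override easy to review with `diff` before it reaches the cluster.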

Apply a tracing-enabled deployment:

```bash
kubectl apply -f examples/backends/vllm/deploy/disagg_tracing.yaml
```

Traces will now be exported to Tempo and can be viewed in Grafana.