<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Distributed Tracing with Tempo

## Overview

Dynamo supports OpenTelemetry-based distributed tracing for visualizing request flows across Frontend and Worker components. Traces are exported to Tempo via OTLP (OpenTelemetry Protocol) and visualized in Grafana.

**Requirements:** Set `DYN_LOGGING_JSONL=true` and `OTEL_EXPORT_ENABLED=true` to export traces to Tempo.

This guide covers a single-GPU demo setup using Docker Compose. For Kubernetes deployments, see [Kubernetes Deployment](#kubernetes-deployment).

**Note:** This guide overlaps with [Logging of OpenTelemetry Tracing](logging.md), because OpenTelemetry spans both logging and tracing. The approach documented here targets persistent trace visualization and analysis. For short debugging sessions that examine trace context directly in logs, see the [Logging](logging.md) guide.

## Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_LOGGING_JSONL` | Enable JSONL logging format (required for tracing) | `false` | `true` |
| `OTEL_EXPORT_ENABLED` | Enable OTLP trace export | `false` | `true` |
| `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` | OTLP gRPC endpoint for Tempo | `http://localhost:4317` | `http://tempo:4317` |
| `OTEL_SERVICE_NAME` | Service name for identifying components | `dynamo` | `dynamo-frontend` |

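Both flags default to `false`, so a pre-flight check can catch a misconfigured shell before any component starts. The sketch below is not part of Dynamo: `check_tracing_env` is a hypothetical helper, and it assumes the literal value `true` shown in the table is what the components expect.

```bash
# Hypothetical pre-flight check (not a Dynamo command): verify that the two
# flags from the table above are set to "true" in the current shell.
check_tracing_env() {
  status=0
  if [ "${DYN_LOGGING_JSONL:-false}" != "true" ]; then
    echo "DYN_LOGGING_JSONL is not 'true'; JSONL logging is off" >&2
    status=1
  fi
  if [ "${OTEL_EXPORT_ENABLED:-false}" != "true" ]; then
    echo "OTEL_EXPORT_ENABLED is not 'true'; traces will not be exported" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_tracing_env || exit 1` at the top of a launch script stops it before any process starts with tracing disabled.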
## Getting Started Quickly

### 1. Start Observability Stack

Start the observability stack (Prometheus, Grafana, Tempo, exporters). See [Observability Getting Started](README.md#getting-started-quickly) for instructions.

### 2. Set Environment Variables

Configure Dynamo components to export traces:

```bash
# Enable JSONL logging and tracing
export DYN_LOGGING_JSONL=true
export OTEL_EXPORT_ENABLED=true
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
```
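Before starting the components, it can help to confirm that something is listening on the configured endpoint. A minimal sketch, assuming `nc` (netcat) with the `-z` flag is installed and that the endpoint includes an explicit port; `otlp_host_port` and `check_otlp_endpoint` are hypothetical helper names, not Dynamo commands:

```bash
# Split an endpoint such as http://localhost:4317 into "host port".
# Assumes the URL carries an explicit port.
otlp_host_port() {
  endpoint="${1:-${OTEL_EXPORTER_OTLP_TRACES_ENDPOINT:-http://localhost:4317}}"
  hostport="${endpoint#*://}"   # strip the scheme
  hostport="${hostport%%/*}"    # strip any trailing path
  echo "${hostport%:*} ${hostport##*:}"
}

# Probe the endpoint once; this only checks that the TCP port accepts
# connections, not that the listener actually speaks OTLP.
check_otlp_endpoint() {
  set -- $(otlp_host_port "$@")
  if nc -z "$1" "$2"; then
    echo "OTLP endpoint $1:$2 is reachable"
  else
    echo "nothing is listening on $1:$2" >&2
    return 1
  fi
}
```

Running `check_otlp_endpoint` with no arguments probes whatever `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` currently points at.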

### 3. Start Dynamo Components (Single GPU)

For a simple single-GPU deployment, start the frontend and a single vLLM worker:

```bash
# Start the frontend with tracing enabled (default port 8000, override with --http-port or DYN_HTTP_PORT env var)
export OTEL_SERVICE_NAME=dynamo-frontend
python -m dynamo.frontend --router-mode kv &

# Start a single vLLM worker (aggregated prefill and decode)
export OTEL_SERVICE_NAME=dynamo-worker-vllm
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager \
  --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" &

wait
```

This runs both prefill and decode on the same GPU, providing a simpler setup for testing tracing.

### Alternative: Disaggregated Deployment (2 GPUs)

Run the vLLM disaggregated script with tracing enabled:

```bash
# Navigate to vLLM launch directory
cd examples/backends/vllm/launch

# Export tracing env vars, then run the disaggregated deployment script.
./disagg.sh
```

**Note:** The example vLLM `disagg.sh` sets additional per-worker port environment variables (e.g., `DYN_VLLM_KV_EVENT_PORT`, `VLLM_NIXL_SIDE_CHANNEL_PORT`) to avoid ZMQ "Address already in use" conflicts when multiple workers run on the same host. If you run the components manually, mirror those port settings, as in the following script:

```bash
#!/bin/bash
set -e
trap 'echo Cleaning up...; kill 0' EXIT

# Enable tracing
export DYN_LOGGING_JSONL=true
export OTEL_EXPORT_ENABLED=true
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317

# Run frontend (default port 8000, override with --http-port or DYN_HTTP_PORT env var)
export OTEL_SERVICE_NAME=dynamo-frontend
python -m dynamo.frontend --router-mode kv &

# Run decode worker, make sure to wait for start up
export OTEL_SERVICE_NAME=dynamo-worker-decode
DYN_SYSTEM_PORT=8081 CUDA_VISIBLE_DEVICES=0 python3 -m dynamo.vllm \
    --model Qwen/Qwen3-0.6B \
    --enforce-eager \
    --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" &

# Run prefill worker, make sure to wait for start up
export OTEL_SERVICE_NAME=dynamo-worker-prefill
DYN_SYSTEM_PORT=8082 \
DYN_VLLM_KV_EVENT_PORT=20081 \
VLLM_NIXL_SIDE_CHANNEL_PORT=20097 \
CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.vllm \
    --model Qwen/Qwen3-0.6B \
    --enforce-eager \
    --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" \
    --is-prefill-worker &

# Keep the script in the foreground; otherwise it exits at once and the EXIT trap kills the background processes
wait
```
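The comments in the script ask you to wait for each component to start, but the script itself launches everything immediately. One way to add an explicit wait is to poll a TCP port until it accepts connections. The sketch below assumes `nc` (netcat) with `-z` is available and that an open port is a good enough readiness signal, which may not hold for every component; `wait_for_port` is a hypothetical helper, not a Dynamo command:

```bash
# Hypothetical helper: poll host:port once per second until it accepts a
# TCP connection or the attempt budget (default 60) runs out.
wait_for_port() {
  host="$1"
  port="$2"
  attempts="${3:-60}"
  while [ "$attempts" -gt 0 ]; do
    if nc -z "$host" "$port" 2>/dev/null; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}
```

For instance, `wait_for_port localhost 8000` placed after the frontend launch blocks until the default HTTP port is open or roughly 60 seconds pass.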

The disaggregated deployment places prefill and decode on separate GPUs for better resource utilization.

### 4. Generate Traces

Send requests to the frontend to generate traces (this works for both aggregated and disaggregated deployments). **Note the `x-request-id` header**, which lets you search for and correlate this specific trace in Grafana:

```bash
curl -H 'Content-Type: application/json' \
-H 'x-request-id: test-trace-001' \
-d '{
  "model": "Qwen/Qwen3-0.6B",
  "max_completion_tokens": 100,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ]
}' \
http://localhost:8000/v1/chat/completions
```
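To populate Tempo with several distinguishable traces, the same request can be sent in a loop with a different `x-request-id` each time. A minimal sketch, reusing the request body and URL from the command above; `make_request_id` and `send_traced_requests` are hypothetical helper names, and the `test-trace-NNN` scheme is only an illustrative convention:

```bash
# Hypothetical helper: build a zero-padded request ID such as test-trace-007.
make_request_id() {
  printf 'test-trace-%03d\n' "$1"
}

# Send N chat-completion requests (default 3), each tagged with its own
# x-request-id, and print the HTTP status returned for each one.
send_traced_requests() {
  count="${1:-3}"
  i=1
  while [ "$i" -le "$count" ]; do
    request_id="$(make_request_id "$i")"
    curl -s -o /dev/null -w "$request_id -> HTTP %{http_code}\n" \
      -H 'Content-Type: application/json' \
      -H "x-request-id: $request_id" \
      -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 100, "messages": [{"role": "user", "content": "What is the capital of France?"}]}' \
      http://localhost:8000/v1/chat/completions
    i=$((i + 1))
  done
}
```

For example, `send_traced_requests 5` sends five requests tagged `test-trace-001` through `test-trace-005`, each of which can then be looked up individually.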

### 5. View Traces in Grafana Tempo

1. Open Grafana at `http://localhost:3000`
2. Log in with username `dynamo` and password `dynamo`
3. Navigate to **Explore** (compass icon in the left sidebar)
4. Select **Tempo** as the data source (should be selected by default)
5. For the query type, select **Search** (not TraceQL or Service Graph)
6. Use the **Search** tab to find traces:
   - Search by **Service Name** (e.g., `dynamo-frontend`)
   - Search by **Span Name** (e.g., `http-request`, `handle_payload`)
   - Search by **Tags** (e.g., `x_request_id=test-trace-001`)
7. Click on a trace to view the detailed flame graph

#### Example Trace View

Below is an example of what a trace looks like in Grafana Tempo:

![Trace Example](trace.png)

### 6. Stop Services

When done, stop the observability stack. See [Observability Getting Started](README.md#getting-started-quickly) for Docker Compose commands.

---

## Kubernetes Deployment

For Kubernetes deployments, ensure you have a Tempo instance deployed and accessible (e.g., `http://tempo.observability.svc.cluster.local:4317`).

### Modify DynamoGraphDeployment for Tracing

Add the common tracing environment variables at the top level of your `DynamoGraphDeployment` (e.g., `examples/backends/vllm/deploy/disagg.yaml`) and a service-specific `OTEL_SERVICE_NAME` in each component:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: vllm-disagg
spec:
  # Common environment variables for all services
  env:
    - name: DYN_LOGGING_JSONL
      value: "true"
    - name: OTEL_EXPORT_ENABLED
      value: "true"
    - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
      value: "http://tempo.observability.svc.cluster.local:4317"

  services:
    Frontend:
      # ... existing configuration ...
      extraPodSpec:
        mainContainer:
          # ... existing configuration ...
          env:
            - name: OTEL_SERVICE_NAME
              value: "dynamo-frontend"

    VllmDecodeWorker:
      # ... existing configuration ...
      extraPodSpec:
        mainContainer:
          # ... existing configuration ...
          env:
            - name: OTEL_SERVICE_NAME
              value: "dynamo-worker-decode"

    VllmPrefillWorker:
      # ... existing configuration ...
      extraPodSpec:
        mainContainer:
          # ... existing configuration ...
          env:
            - name: OTEL_SERVICE_NAME
              value: "dynamo-worker-prefill"
```

Apply the updated DynamoGraphDeployment:

```bash
kubectl apply -f examples/backends/vllm/deploy/disagg.yaml
```
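After applying the manifest, one way to confirm that the variables reached a pod is to list its environment and keep only the tracing-related entries. A minimal sketch; `filter_tracing_env` is a hypothetical helper, and `<frontend-pod>` is a placeholder for a pod name taken from `kubectl get pods`:

```bash
# Hypothetical helper: keep only tracing-related variables from `env` output.
filter_tracing_env() {
  grep -E '^(DYN_LOGGING_JSONL|OTEL_[A-Z_]+)=' | sort
}

# Usage (replace <frontend-pod> with a real pod name):
#   kubectl exec <frontend-pod> -- env | filter_tracing_env
```

Each component should report the shared variables plus its own `OTEL_SERVICE_NAME`.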

Traces will now be exported to Tempo and can be viewed in Grafana.