Dynamo creates distributed traces that span multiple services in a disaggregated serving setup. The following sections describe the key spans you'll see in Grafana when viewing traces for chat completion requests.
The following shows the JSONL logs from the frontend service for the same request. Note the `trace_id` field (`b672ccf48683b392891c5cb4163d4b51`) that correlates all logs for this request, and the `span_id` field that identifies individual operations:

```
{"time":"2025-10-31T20:52:10.707164Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
{"time":"2025-10-31T20:52:10.745264Z","level":"DEBUG","file":"/opt/dynamo/lib/llm/src/kv_router/prefill_router.rs","line":232,"target":"dynamo_llm::kv_router::prefill_router","message":"Prefill succeeded, using disaggregated params for decode","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
{"time":"2025-10-31T20:52:10.745545Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
```

When running Dynamo in disaggregated mode, a typical request creates the following spans:

##### 1. `http-request` (Frontend - Root Span)
The root span for the entire request lifecycle, created in the **dynamo-frontend** service.
**Key Attributes:**
- **Service**: `dynamo-frontend`
- **Operation**: Handles the HTTP request from client to completion
- **Duration**: Total end-to-end request time (includes prefill + decode)
##### 2. Prefill Routing (Frontend - Child Span)
A child span of `http-request`, created in the **dynamo-frontend** service during the routing phase.
**Key Attributes:**
- **Service**: `dynamo-frontend`
- **Operation**: Routes the prefill request to an appropriate prefill worker
- **Duration**: Time spent selecting a prefill worker plus the duration of the prefill request
- **Parent**: `http-request` span
This span captures the routing decision as well as the request sent to the prefill worker.
##### 3. `handle_payload` (Prefill Worker Span)
A child span of `http-request`, created in the **dynamo-worker-vllm-prefill** service.
**Key Attributes:**
- **Service**: `dynamo-worker-vllm-prefill` (or `dynamo-worker-sglang-prefill` for SGLang)
- **Operation**: Processes the prefill phase of generation
- **Duration**: Time to compute prefill (typically milliseconds to seconds)
- **Component**: `prefill`
- **Endpoint**: `generate`
- **Parent**: `http-request` span
This span represents the actual prefill computation on a prefill-specialized worker, including prompt processing and initial KV cache generation.
##### 4. `handle_payload` (Decode Worker Span)
A child span of `http-request`, created in the **dynamo-worker-vllm-decode** service.
**Key Attributes:**
- **Service**: `dynamo-worker-vllm-decode` (or `dynamo-worker-sglang-decode` for SGLang)
- **Operation**: Processes the decode phase of generation
- **Duration**: Time to generate all output tokens (typically seconds)
- **Component**: `decode` or `backend`
- **Endpoint**: `generate`
- **Parent**: `http-request` span
This span represents the iterative token generation phase on a decode-specialized worker, which consumes the KV cache from prefill and produces output tokens.
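
To make the parent/child relationships described above easier to picture, the following is a minimal sketch that models the four spans as a small tree and prints it. It is only an illustration: the `Span` class and `render` helper are hypothetical and are not part of Dynamo, and the label `prefill routing` is a placeholder for the frontend routing span, not a confirmed span name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical in-memory model of a trace span (not a Dynamo API)."""
    name: str
    service: str
    parent: Optional["Span"] = None
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str, service: str) -> "Span":
        # Create a span whose parent is this span and register it as a child.
        span = Span(name, service, parent=self)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> List[str]:
    # Return one indented line per span, parents before their children.
    lines = ["  " * depth + f"{span.name} [{span.service}]"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


# Root span created by the frontend for the whole request.
root = Span("http-request", "dynamo-frontend")
# Placeholder label for the frontend routing span (actual name is an assumption).
root.child("prefill routing", "dynamo-frontend")
# Worker spans; both are documented as children of http-request.
root.child("handle_payload", "dynamo-worker-vllm-prefill")
root.child("handle_payload", "dynamo-worker-vllm-decode")

print("\n".join(render(root)))
```

In a trace viewer, the same structure appears as one root row with three nested rows, where the two `handle_payload` rows are distinguished by their service names.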
#### Understanding Span Metrics
Each span provides several useful metrics:
| Metric | Description |
|--------|-------------|
| **Duration** | Total time from span start to end |
| **Busy Time** | Time actively processing (excluding waiting) |
| **Idle Time** | Time spent waiting (e.g., for network, other services) |
| **Start Time** | When the span began |
| **Child Count** | Number of direct child spans |
The relationship **Duration = Busy Time + Idle Time** helps identify where time is spent and potential bottlenecks.
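
As a rough illustration of that relationship, the sketch below derives idle time and the idle share of a span from its duration and busy time. The function name `idle_breakdown` and the sample values are assumptions made for this example; they do not come from a real trace.

```python
from typing import Tuple


def idle_breakdown(duration_ms: float, busy_ms: float) -> Tuple[float, float]:
    """Return (idle_ms, idle_fraction) using Duration = Busy Time + Idle Time."""
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    if not 0 <= busy_ms <= duration_ms:
        raise ValueError("busy_ms must be between 0 and duration_ms")
    idle_ms = duration_ms - busy_ms
    return idle_ms, idle_ms / duration_ms


# Hypothetical span: 1200 ms total, of which 300 ms was active processing.
idle_ms, idle_fraction = idle_breakdown(1200.0, 300.0)
print(f"idle: {idle_ms:.0f} ms ({idle_fraction:.0%} of the span)")
```

A span with a high idle share is mostly waiting on something else (for example the network or a downstream service), so its child spans are usually the next place to look.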