The following shows the JSONL logs from the frontend service for the same request. Note the `trace_id` field (`b672ccf48683b392891c5cb4163d4b51`) that correlates all logs for this request, and the `span_id` field that identifies individual operations:
Dynamo creates distributed traces that span across multiple services in a disaggregated serving setup. The following sections describe the key spans you'll see in Grafana when viewing traces for chat completion requests.
{"time":"2025-10-31T20:52:10.707164Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
{"time":"2025-10-31T20:52:10.745264Z","level":"DEBUG","file":"/opt/dynamo/lib/llm/src/kv_router/prefill_router.rs","line":232,"target":"dynamo_llm::kv_router::prefill_router","message":"Prefill succeeded, using disaggregated params for decode","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
{"time":"2025-10-31T20:52:10.745545Z","level":"DEBUG","file":"/opt/dynamo/lib/runtime/src/pipeline/network/tcp/server.rs","line":230,"target":"dynamo_runtime::pipeline::network::tcp::server","message":"Registering new TcpStream on 10.0.4.65:41959","method":"POST","span_id":"5c20cc08e6afb2b7","span_name":"http-request","trace_id":"b672ccf48683b392891c5cb4163d4b51","uri":"/v1/chat/completions","version":"HTTP/1.1"}
```
#### Available Spans in Disaggregated Mode
When running Dynamo in disaggregated mode, a typical request creates the following spans:
##### 1. `http-request` (Frontend - Root Span)
The root span for the entire request lifecycle, created in the **dynamo-frontend** service.
**Key Attributes:**
-**Service**: `dynamo-frontend`
-**Operation**: Handles the HTTP request from client to completion
-**Duration**: Total end-to-end request time (includes prefill + decode)
A child span of `http-request`, created in the **dynamo-frontend** service during the routing phase.
**Key Attributes:**
-**Service**: `dynamo-frontend`
-**Operation**: Routes the prefill request to an appropriate prefill worker
-**Duration**: Time spent selecting and the span of prefill.
-**Parent**: `http-request` span
This span captures the routing logic and decision-making process and the request sent to the prefill worker.
##### 3. `handle_payload` (Prefill Worker Span)
A child span of `http-request`, created in the **dynamo-worker-vllm-prefill** service.
**Key Attributes:**
-**Service**: `dynamo-worker-vllm-prefill` (or `dynamo-worker-sglang-prefill` for SGLang)
-**Operation**: Processes the prefill phase of generation
-**Duration**: Time to compute prefill (typically milliseconds to seconds)
-**Component**: `prefill`
-**Endpoint**: `generate`
-**Parent**: `http-request` span
This span represents the actual prefill computation on a prefill-specialized worker, including prompt processing and initial KV cache generation.
##### 4. `handle_payload` (Decode Worker Span)
A child span of `http-request`, created in the **dynamo-worker-vllm-decode** service.
**Key Attributes:**
-**Service**: `dynamo-worker-vllm-decode` (or `dynamo-worker-sglang-decode` for SGLang)
-**Operation**: Processes the decode phase of generation
-**Duration**: Time to generate all output tokens (typically seconds)
-**Component**: `decode` or `backend`
-**Endpoint**: `generate`
-**Parent**: `http-request` span
This span represents the iterative token generation phase on a decode-specialized worker, which consumes the KV cache from prefill and produces output tokens.
#### Understanding Span Metrics
Each span provides several useful metrics:
| Metric | Description |
|--------|-------------|
| **Duration** | Total time from span start to end |
| **Busy Time** | Time actively processing (excluding waiting) |
| **Idle Time** | Time spent waiting (e.g., for network, other services) |
| **Start Time** | When the span began |
| **Child Count** | Number of direct child spans |
The relationship **Duration = Busy Time + Idle Time** helps identify where time is spent and potential bottlenecks.