README.md

# NAT Trace Converter

Convert NeMo Agent Toolkit (NAT) profiler traces to mooncake format for aiperf benchmarking.

## Overview

This converter transforms `all_requests_profiler_traces.json` from NAT profiler into mooncake-style JSONL that can be used with aiperf for:
- Multi-turn conversation replay with `session_id` serialization
- Prefix cache benchmarking with `hash_ids`

## Obtaining Trace Data

Traces are generated by running evaluations with the [NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit) profiler. See the [NAT documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/) for details on running agent workflows with profiling enabled.

An example trace file will be open-sourced soon to make benchmarking more accessible.

## Input Format

NAT profiler trace JSON (`all_requests_profiler_traces.json`):
```json
[
  {
    "request_number": 0,
    "intermediate_steps": [
      {
        "payload": {
          "event_type": "LLM_START",
          "metadata": {"chat_inputs": [{"content": "..."}]},
          "name": "llama-3.3-70b",
          "UUID": "..."
        }
      },
      {
        "payload": {
          "event_type": "LLM_END",
          "usage_info": {"token_usage": {"prompt_tokens": 9176, "completion_tokens": 142}},
          "UUID": "..."
        }
      }
    ]
  }
]
```

## Output Format

Mooncake JSONL with session-based serialization:
```json
{"session_id": "conv_0", "input_length": 9176, "output_length": 142, "hash_ids": [1, 2, 3]}
{"session_id": "conv_0", "input_length": 9500, "output_length": 98, "hash_ids": [1, 2, 3, 4]}
{"session_id": "conv_1", "input_length": 8234, "output_length": 156, "hash_ids": [5, 6]}
```

## Usage

Basic conversion:
```bash
python convert.py --input-file /path/to/all_requests_profiler_traces.json
```

With custom tokenizer:
```bash
python convert.py \
    --input-file /path/to/all_requests_profiler_traces.json \
    --tokenizer meta-llama/Llama-3.3-70B-Instruct \
    --block-size 128
```

Limit number of requests:
```bash
python convert.py \
    --input-file /path/to/all_requests_profiler_traces.json \
    --num-requests 100 \
    --skip-requests 10
```

## Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--input-file` | Path to NAT profiler JSON | Required |
| `--output-file` | Output JSONL path | `<input>_mooncake.jsonl` |
| `--tokenizer` | HuggingFace tokenizer name | Auto-inferred from trace |
| `--block-size` | Block size for hash generation | 128 |
| `--num-requests` | Max requests to process | All |
| `--skip-requests` | Skip first N requests | 0 |

---

# Telemetry Trace Converter

Convert OpenAI-style telemetry JSONL (e.g., from agentic research pipelines) to mooncake format for aiperf benchmarking.

## Overview

This converter transforms `telemetry.jsonl` containing `llm_call` and `tool_call` events into mooncake-style JSONL. It identifies 6 agent types from the telemetry and tags each entry accordingly.

## Input Format

Telemetry JSONL with one event per line:
```json
{"event_type": "llm_call", "timestamp": "...", "session_id": "...", "latency_priority": "HIGH", "latency_ms": 738.22, "request_payload": {"messages": [...], "model": "gpt-5.2"}, "response_payload": {"usage": {"prompt_tokens": 256, "completion_tokens": 4}}}
{"event_type": "tool_call", "tool_name": "tavily_web_search", "session_id": "...", "start_time": "...", "end_time": "...", "duration_ms": 181.08}
```

Only `llm_call` events are processed; `tool_call` events are dropped.

## Output Format

Mooncake JSONL with agent type and priority:
```json
{"session_id": "082e33c7-...", "agent_type": "deep_coordinator", "input_length": 2426, "output_length": 33, "hash_ids": [1, 2, 3], "priority": "HIGH"}
{"session_id": "082e33c7-...", "agent_type": "research_worker", "input_length": 4800, "output_length": 154, "hash_ids": [1, 2, 3, 4], "priority": "LOW"}
```

## Agent Types

Agents are identified by system prompt prefix matching:

| Agent Type | System Prompt Prefix |
|---|---|
| `deep_coordinator` | `You are a Deep Research agent` |
| `research_worker` | `Gather and synthesize comprehe` |
| `research_planner` | `For the given task, generate a` |
| `shallow_agent` | `Current date and time:` |
| `classifier` | (no system msg) — "Classify" in user msg |
| `complexity_analyzer` | (no system msg) — "complexity analyzer" in user msg |

## Usage

Basic conversion:
```bash
python convert_telemetry.py --input-file /path/to/telemetry.jsonl
```

With custom tokenizer:
```bash
python convert_telemetry.py \
    --input-file /path/to/telemetry.jsonl \
    --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --block-size 128
```

## Arguments

| Argument | Description | Default |
|---|---|---|
| `--input-file` | Path to telemetry JSONL | Required |
| `--output-file` | Output JSONL path | `<input>_mooncake.jsonl` |
| `--tokenizer` | HuggingFace tokenizer name | `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` |
| `--block-size` | Block size for hash generation | 64 |

---

## Running with aiperf

After conversion, use with aiperf:
```bash
aiperf profile \
    --model <model-name> \
    --tokenizer <tokenizer> \
    --endpoint-type chat \
    --streaming \
    --url http://localhost:8000 \
    --input-file output_mooncake.jsonl \
    --custom-dataset-type mooncake_trace \
    --concurrency 10
```

The `session_id` field ensures:
- Turns within a conversation are serialized (executed in order)
- Different conversations run in parallel up to `--concurrency`