# NAT Trace Converter
Convert NeMo Agent Toolkit (NAT) profiler traces to mooncake format for aiperf benchmarking.
## Overview
This converter transforms `all_requests_profiler_traces.json` from the NAT profiler into mooncake-style JSONL that aiperf can replay for:
- Multi-turn conversation replay with `session_id` serialization
- Prefix cache benchmarking with `hash_ids`
## Obtaining Trace Data
Traces are generated by running evaluations with the [NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit) profiler. See the [NAT documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/) for details on running agent workflows with profiling enabled.
An example trace file will be open-sourced soon to make benchmarking more accessible.
## Input Format
NAT profiler trace JSON (`all_requests_profiler_traces.json`):
```json
[
  {
    "request_number": 0,
    "intermediate_steps": [
      {
        "payload": {
          "event_type": "LLM_START",
          "metadata": {"chat_inputs": [{"content": "..."}]},
          "name": "llama-3.3-70b",
          "UUID": "..."
        }
      },
      {
        "payload": {
          "event_type": "LLM_END",
          "usage_info": {"token_usage": {"prompt_tokens": 9176, "completion_tokens": 142}},
          "UUID": "..."
        }
      }
    ]
  }
]
```
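A minimal sketch of how this structure can be walked to recover per-call token counts. It is not the actual `convert.py` logic: `extract_turns` and `load_turns` are hypothetical helpers, and the sketch assumes that an `LLM_START` and its matching `LLM_END` share the same `UUID`, and that each `request_number` becomes one session named `conv_<request_number>` (inferred from the output example below).

```python
import json
from typing import Any


def extract_turns(traces: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn NAT profiler traces into one record per LLM call (hypothetical helper)."""
    records = []
    for request in traces:
        # Assumption: one request_number == one conversation/session.
        session_id = f"conv_{request['request_number']}"
        starts: dict[str, dict] = {}  # UUID -> LLM_START payload not yet matched
        for step in request.get("intermediate_steps", []):
            payload = step["payload"]
            event = payload.get("event_type")
            if event == "LLM_START":
                starts[payload["UUID"]] = payload
            # Assumption: LLM_END carries the same UUID as its LLM_START;
            # an END with no matching START is skipped.
            elif event == "LLM_END" and starts.pop(payload.get("UUID"), None) is not None:
                usage = payload["usage_info"]["token_usage"]
                records.append({
                    "session_id": session_id,
                    "input_length": usage["prompt_tokens"],
                    "output_length": usage["completion_tokens"],
                })
    return records


def load_turns(path: str) -> list[dict[str, Any]]:
    """Read a trace file from disk and extract its LLM calls."""
    with open(path) as f:
        return extract_turns(json.load(f))


if __name__ == "__main__":
    sample = [{
        "request_number": 0,
        "intermediate_steps": [
            {"payload": {"event_type": "LLM_START", "name": "llama-3.3-70b", "UUID": "a"}},
            {"payload": {"event_type": "LLM_END", "UUID": "a",
                         "usage_info": {"token_usage": {"prompt_tokens": 9176,
                                                        "completion_tokens": 142}}}},
        ],
    }]
    print(extract_turns(sample))
    # → [{'session_id': 'conv_0', 'input_length': 9176, 'output_length': 142}]
```
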
## Output Format
Mooncake JSONL with one line per LLM call. Lines that share a `session_id` are turns of the same conversation and are replayed in order:
```json
{"session_id": "conv_0", "input_length": 9176, "output_length": 142, "hash_ids": [1, 2, 3]}
{"session_id": "conv_0", "input_length": 9500, "output_length": 98, "hash_ids": [1, 2, 3, 4]}
{"session_id": "conv_1", "input_length": 8234, "output_length": 156, "hash_ids": [5, 6]}
```
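In mooncake-style traces, each entry in `hash_ids` stands for one block of `block_size` prompt tokens, and two prompts that begin with the same tokens share the same leading IDs (as `conv_0`'s two turns do above). The sketch below illustrates that idea with chained block hashes; it is an assumption about the general technique, not the converter's exact hashing scheme, `BlockHasher` is a hypothetical name, and it assumes token IDs are already available (for example, from tokenizing the prompt with the `--tokenizer`). With the default 128-token blocks, a 9176-token prompt spans 72 blocks, so the short lists in the example above are presumably abbreviated.

```python
import hashlib
import math


class BlockHasher:
    """Assign mooncake-style hash_ids to token sequences (illustrative sketch)."""

    def __init__(self, block_size: int = 128):
        self.block_size = block_size
        self._ids: dict[str, int] = {}  # chained block digest -> small integer ID

    def hash_ids(self, token_ids: list[int]) -> list[int]:
        result = []
        prefix_digest = ""
        for start in range(0, len(token_ids), self.block_size):
            block = token_ids[start:start + self.block_size]
            # Chain the previous digest so a block's ID depends on the whole prefix,
            # not just on the block's own tokens.
            h = hashlib.sha256(prefix_digest.encode())
            h.update(",".join(map(str, block)).encode())
            prefix_digest = h.hexdigest()
            # Number unseen digests 1, 2, 3, ... in first-seen order.
            if prefix_digest not in self._ids:
                self._ids[prefix_digest] = len(self._ids) + 1
            result.append(self._ids[prefix_digest])
        return result


if __name__ == "__main__":
    hasher = BlockHasher(block_size=4)  # tiny blocks keep the demo readable
    print(hasher.hash_ids(list(range(12))))        # → [1, 2, 3]
    print(hasher.hash_ids(list(range(16))))        # → [1, 2, 3, 4] (shared prefix)
    print(hasher.hash_ids(list(range(100, 108))))  # → [5, 6]
    print(math.ceil(9176 / 128))                   # → 72 blocks at the default size
```

One consequence of block-level hashing is that a partially filled last block only matches another prompt whose block holds exactly the same tokens, so prefix reuse is counted in whole blocks.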
## Usage
Basic conversion:
```bash
python convert.py --input-file /path/to/all_requests_profiler_traces.json
```
With a custom tokenizer and block size:
```bash
python convert.py \
  --input-file /path/to/all_requests_profiler_traces.json \
  --tokenizer meta-llama/Llama-3.3-70B-Instruct \
  --block-size 128
```
Skip the first 10 requests, then process at most 100:
```bash
python convert.py \
  --input-file /path/to/all_requests_profiler_traces.json \
  --num-requests 100 \
  --skip-requests 10
```
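`--skip-requests` and `--num-requests` presumably combine as "drop the first N, then keep up to M", so the command above would process requests 10 through 109. A minimal sketch of that selection follows; `select_requests` is a hypothetical helper, and the skip-then-limit order is an assumption about the converter rather than something it documents.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_requests(requests: Sequence[T], num_requests: Optional[int] = None,
                    skip_requests: int = 0) -> list[T]:
    """Skip the first `skip_requests` entries, then keep at most `num_requests`."""
    if skip_requests < 0 or (num_requests is not None and num_requests < 0):
        raise ValueError("request counts must be non-negative")
    selected = requests[skip_requests:]
    if num_requests is not None:  # None mirrors the documented default of "All"
        selected = selected[:num_requests]
    return list(selected)


if __name__ == "__main__":
    trace = list(range(500))  # stand-in for 500 parsed requests
    chosen = select_requests(trace, num_requests=100, skip_requests=10)
    print(chosen[0], chosen[-1], len(chosen))  # → 10 109 100
```
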
## Arguments
| Argument | Description | Default |
|----------|-------------|---------|
| `--input-file` | Path to NAT profiler JSON | Required |
| `--output-file` | Output JSONL path | `_mooncake.jsonl` |
| `--tokenizer` | HuggingFace tokenizer name | Auto-inferred from trace |
| `--block-size` | Tokens per block when generating `hash_ids` | 128 |
| `--num-requests` | Max requests to process | All |
| `--skip-requests` | Skip first N requests | 0 |
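For readers scripting around the converter, the table can be read as the following `argparse` definition. This is a hypothetical reconstruction from the table, not the converter's source: `build_parser` is an invented name, and `None` stands in for defaults the converter computes itself (output path, inferred tokenizer, "All" requests).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror of the documented CLI options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Convert NAT profiler traces to mooncake JSONL.")
    parser.add_argument("--input-file", required=True,
                        help="Path to NAT profiler JSON")
    # None = let the converter pick its default output name.
    parser.add_argument("--output-file", default=None, help="Output JSONL path")
    # None = infer the tokenizer from the trace.
    parser.add_argument("--tokenizer", default=None,
                        help="HuggingFace tokenizer name")
    parser.add_argument("--block-size", type=int, default=128,
                        help="Tokens per block when generating hash_ids")
    # None = process all requests.
    parser.add_argument("--num-requests", type=int, default=None,
                        help="Max requests to process")
    parser.add_argument("--skip-requests", type=int, default=0,
                        help="Skip first N requests")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--input-file", "traces.json", "--num-requests", "100", "--skip-requests", "10"])
    # argparse turns dashes into underscores in attribute names.
    print(args.input_file, args.block_size, args.num_requests, args.skip_requests)
    # → traces.json 128 100 10
```
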
## Running with aiperf
After conversion, use with aiperf:
```bash
aiperf profile \
  --model <model_name> \
  --tokenizer <tokenizer_name> \
  --endpoint-type chat \
  --streaming \
  --url http://localhost:8000 \
  --input-file output_mooncake.jsonl \
  --custom-dataset-type mooncake_trace \
  --concurrency 10
```
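Before launching a long benchmark, a quick sanity check of the converted file can catch problems early. The sketch below groups records by `session_id` and checks for the four fields shown in the Output Format example; `summarize_sessions` is a hypothetical helper, and aiperf's own required fields may differ from this list.

```python
import json
from collections import defaultdict

# Fields shown in the Output Format example (assumed, not aiperf's schema).
REQUIRED_FIELDS = ("session_id", "input_length", "output_length", "hash_ids")


def summarize_sessions(lines):
    """Group mooncake records by session_id, keeping file order within a session."""
    sessions = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [k for k in REQUIRED_FIELDS if k not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing {missing}")
        sessions[record["session_id"]].append(record)
    return sessions


if __name__ == "__main__":
    example = [
        '{"session_id": "conv_0", "input_length": 9176, "output_length": 142, "hash_ids": [1, 2, 3]}',
        '{"session_id": "conv_0", "input_length": 9500, "output_length": 98, "hash_ids": [1, 2, 3, 4]}',
        '{"session_id": "conv_1", "input_length": 8234, "output_length": 156, "hash_ids": [5, 6]}',
    ]
    sessions = summarize_sessions(example)
    print({sid: len(turns) for sid, turns in sessions.items()})  # → {'conv_0': 2, 'conv_1': 1}
```

For a real file, pass an open handle, e.g. `with open("output_mooncake.jsonl") as f: sessions = summarize_sessions(f)`. Because turns within a session are serialized, the number of sessions is also an upper bound on how many requests can be in flight at once, regardless of `--concurrency`.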
The `session_id` field ensures:
- Turns within a conversation are serialized (executed in order)
- Different conversations run in parallel, up to the `--concurrency` limit
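To make these semantics concrete, here is a simplified asyncio model of session-serialized replay. It is not aiperf's implementation: `replay` and `send` are hypothetical names, and the model assumes `--concurrency` caps the number of active conversations, each holding one slot until its last turn completes.

```python
import asyncio
from collections import defaultdict


async def replay(records, concurrency, send):
    """Conceptual model of session-serialized replay (not aiperf's code).

    Turns that share a session_id run one after another; different sessions
    run concurrently, with at most `concurrency` sessions active at once.
    """
    sessions = defaultdict(list)
    for record in records:  # preserve file order within each session
        sessions[record["session_id"]].append(record)

    limit = asyncio.Semaphore(concurrency)

    async def run_session(turns):
        async with limit:  # one slot per active conversation
            for turn in turns:
                await send(turn)  # the next turn starts only after this one finishes

    await asyncio.gather(*(run_session(t) for t in sessions.values()))


if __name__ == "__main__":
    log = []

    async def fake_send(turn):
        await asyncio.sleep(0.01)  # stand-in for an HTTP request
        log.append((turn["session_id"], turn["input_length"]))

    records = [
        {"session_id": "conv_0", "input_length": 9176},
        {"session_id": "conv_0", "input_length": 9500},
        {"session_id": "conv_1", "input_length": 8234},
    ]
    asyncio.run(replay(records, concurrency=10, send=fake_send))
    print([n for s, n in log if s == "conv_0"])  # → [9176, 9500]
```
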