# NAT Trace Converter

Convert NeMo Agent Toolkit (NAT) profiler traces to mooncake format for aiperf benchmarking.

## Overview

This converter transforms `all_requests_profiler_traces.json` from NAT profiler into mooncake-style JSONL that can be used with aiperf for:

- Multi-turn conversation replay with `session_id` serialization
- Prefix cache benchmarking with `hash_ids`

## Obtaining Trace Data

Traces are generated by running evaluations with the [NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit) profiler. See the [NAT documentation](https://docs.nvidia.com/nemo/agent-toolkit/latest/) for details on running agent workflows with profiling enabled.

An example trace file will be open-sourced soon to make benchmarking more accessible.

## Input Format

NAT profiler trace JSON (`all_requests_profiler_traces.json`):

```json
[
  {
    "request_number": 0,
    "intermediate_steps": [
      {
        "payload": {
          "event_type": "LLM_START",
          "metadata": {"chat_inputs": [{"content": "..."}]},
          "name": "llama-3.3-70b",
          "UUID": "..."
        }
      },
      {
        "payload": {
          "event_type": "LLM_END",
          "usage_info": {"token_usage": {"prompt_tokens": 9176, "completion_tokens": 142}},
          "UUID": "..."
        }
      }
    ]
  }
]
```

## Output Format

Mooncake JSONL with session-based serialization:

```json
{"session_id": "conv_0", "input_length": 9176, "output_length": 142, "hash_ids": [1, 2, 3]}
{"session_id": "conv_0", "input_length": 9500, "output_length": 98, "hash_ids": [1, 2, 3, 4]}
{"session_id": "conv_1", "input_length": 8234, "output_length": 156, "hash_ids": [5, 6]}
```

## Usage

Basic conversion:

```bash
python convert.py --input-file /path/to/all_requests_profiler_traces.json
```

With custom tokenizer:

```bash
python convert.py \
    --input-file /path/to/all_requests_profiler_traces.json \
    --tokenizer meta-llama/Llama-3.3-70B-Instruct \
    --block-size 128
```

Limit number of requests:

```bash
python convert.py \
    --input-file /path/to/all_requests_profiler_traces.json \
    --num-requests 100 \
    --skip-requests 10
```

## Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--input-file` | Path to NAT profiler JSON | Required |
| `--output-file` | Output JSONL path | `_mooncake.jsonl` |
| `--tokenizer` | HuggingFace tokenizer name | Auto-inferred from trace |
| `--block-size` | Block size for hash generation | 128 |
| `--num-requests` | Max requests to process | All |
| `--skip-requests` | Skip first N requests | 0 |

## Running with aiperf

After conversion, use with aiperf:

```bash
aiperf profile \
    --model \
    --tokenizer \
    --endpoint-type chat \
    --streaming \
    --url http://localhost:8000 \
    --input-file output_mooncake.jsonl \
    --custom-dataset-type mooncake_trace \
    --concurrency 10
```

The `session_id` field ensures:

- Turns within a conversation are serialized (executed in order)
- Different conversations run in parallel up to `--concurrency`
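
Before replaying a converted file, it can help to check how its records group into conversations. The sketch below is not part of the converter: `summarize_sessions` and `shared_prefix_len` are hypothetical helper names, and it assumes only the four fields shown under Output Format. It groups records by `session_id` in file order and counts how many leading `hash_ids` each turn shares with the previous turn of the same session, which is a rough indicator of prefix reuse if the ids denote consecutive token blocks (an assumption, not something the trace format guarantees here).

```python
import json
from collections import defaultdict


def shared_prefix_len(a, b):
    """Count the leading elements that two lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def summarize_sessions(lines):
    """Group mooncake JSONL records by session_id, preserving file order.

    Hypothetical helper for inspection only; it assumes each record has
    session_id, input_length, output_length and hash_ids.
    """
    sessions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            sessions[record["session_id"]].append(record)

    summary = {}
    for session_id, turns in sessions.items():
        # For every turn after the first, count the hash_ids it shares
        # with the start of the previous turn in the same session.
        reused = [
            shared_prefix_len(prev["hash_ids"], cur["hash_ids"])
            for prev, cur in zip(turns, turns[1:])
        ]
        summary[session_id] = {
            "turns": len(turns),
            "input_tokens": sum(t["input_length"] for t in turns),
            "output_tokens": sum(t["output_length"] for t in turns),
            "reused_blocks_per_turn": reused,
        }
    return summary


if __name__ == "__main__":
    # Records copied from the Output Format example above.
    sample = [
        '{"session_id": "conv_0", "input_length": 9176, "output_length": 142, "hash_ids": [1, 2, 3]}',
        '{"session_id": "conv_0", "input_length": 9500, "output_length": 98, "hash_ids": [1, 2, 3, 4]}',
        '{"session_id": "conv_1", "input_length": 8234, "output_length": 156, "hash_ids": [5, 6]}',
    ]
    for session_id, stats in summarize_sessions(sample).items():
        print(session_id, stats)
```

To inspect a real file, pass an open file handle instead of the sample list, for example `summarize_sessions(open("output_mooncake.jsonl"))`. The number of distinct sessions is also a reasonable upper bound to keep in mind when choosing `--concurrency`, since turns within one session do not overlap.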