Unverified commit aaa8a567, authored by Yongming Ding, committed by GitHub

docs: remove unsupported --trace-file from mocker (#7585)


Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
parent 52382c91
@@ -5,13 +5,10 @@ title: Mocker Trace Replay
 subtitle: Replay Mooncake-style traces through the mocker in offline or online mode
 ---
-This guide covers the mocker's trace replay support for Mooncake-style JSONL traces. The replay
-surface is available in two forms:
-- `python -m dynamo.mocker --trace-file ...`, which writes a report file and prints a replay summary
-- `python -m dynamo.replay ...`, which prints an AIPerf-style summary table, writes the full
-  replay report JSON to disk, and exposes `offline|online`, `round_robin|kv_router`,
-  `arrival_speedup_ratio`, and synthetic replay inputs directly
+This guide covers trace replay support for Mooncake-style JSONL traces via `python -m dynamo.replay`,
+which prints an AIPerf-style summary table, writes the full replay report JSON to disk, and exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and synthetic replay inputs
+directly.
 Unlike normal `dynamo.mocker` usage, offline replay does not launch workers, register endpoints, or
 require NATS, etcd, or a frontend. Online replay does exercise the live mock-worker runtime path.
@@ -31,7 +28,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --num-workers 4 \
   --replay-mode offline \
   --router-mode round_robin \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --report-json /tmp/replay-report.json
 ```
@@ -46,27 +43,12 @@ python -m dynamo.replay \
   --num-workers 1 \
   --replay-mode offline \
   --replay-concurrency 100 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --report-json /tmp/replay-report.json
 ```
-You can also run replay through the mocker CLI by passing `--trace-file`:
-```bash
-python -m dynamo.mocker \
-  --trace-file /path/to/mooncake_trace.jsonl \
-  --model-path Qwen/Qwen3-0.6B
-```
-This writes a JSON report next to the trace file by default:
-```text
-/path/to/mooncake_trace.replay.json
-```
 `python -m dynamo.replay` prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The mocker CLI prints a `Replay Summary` table to stdout and writes the report
-JSON to disk.
+report JSON to disk.
 ## Input Format
@@ -115,7 +97,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --router-mode kv_router \
   --num-workers 4 \
   --arrival-speedup-ratio 10 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --router-config '{"router_queue_policy":"fcfs","router_temperature":0.0}' \
   --report-json /tmp/replay-report.json
 ```
@@ -127,7 +109,6 @@ SGLang replay uses the same CLI surface. A minimal extra-engine-args file can us
 {
   "engine_type": "sglang",
   "num_gpu_blocks": 512,
-  "speedup_ratio": 1000.0,
   "sglang": {
     "page_size": 2
   }
@@ -138,11 +119,6 @@ Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Un
 fall back to the same defaults used by `MockEngineArgs::default()` and
 `KvRouterConfig::default()`.
-### `python -m dynamo.mocker --trace-file`
-The mocker CLI supports offline replay and remains useful when you want the historical
-`Replay Summary` output and report-file workflow.
 ### Synthetic Replay
 Synthetic replay bypasses trace loading and generates in-memory requests with fixed input/output
@@ -156,7 +132,7 @@ python -m dynamo.replay \
   --arrival-interval-ms 0.5 \
   --replay-mode offline \
   --replay-concurrency 50 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+  --extra-engine-args '{"block_size":512}'
 ```
 This is useful for parameter sweeps where Mooncake-style prefix structure is not required.
@@ -172,7 +148,7 @@ those timestamps:
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --replay-mode offline \
   --num-workers 4 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+  --extra-engine-args '{"block_size":512}'
 ```
 This is the right mode when you want deterministic replay of the original arrival pattern.
@@ -203,7 +179,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --router-mode kv_router \
   --num-workers 4 \
   --arrival-speedup-ratio 10 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+  --extra-engine-args '{"block_size":512}'
 ```
 ### Arrival Speedup
@@ -216,7 +192,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --replay-mode offline \
   --num-workers 4 \
   --arrival-speedup-ratio 5 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+  --extra-engine-args '{"block_size":512}'
 ```
 ### Router Modes
@@ -243,14 +219,14 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --replay-mode offline \
   --router-mode kv_router \
   --num-workers 4 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --router-config '{"router_queue_policy":"fcfs"}'
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --replay-mode offline \
   --router-mode kv_router \
   --num-workers 4 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --router-config '{"router_queue_policy":"lcfs"}'
 ```
@@ -259,17 +235,6 @@ an expected production default.
 ## Output
-Use `--output-file` to override the default report location:
-```bash
-python -m dynamo.mocker \
-  --trace-file /path/to/mooncake_trace.jsonl \
-  --model-path Qwen/Qwen3-0.6B \
-  --output-file /tmp/replay-report.json
-```
-If `--output-file` is not set, the report path defaults to `TRACE_STEM.replay.json` in the same directory as the input trace.
 The report contains:
 - request counts
......
@@ -73,9 +73,6 @@ python -m dynamo.mocker \
 | `--model-path` | Required | HuggingFace model ID or local path for tokenizer |
 | `--endpoint` | Auto-derived | Dynamo endpoint string. Defaults are namespace-dependent, and prefill workers use a different default endpoint than aggregated/decode workers |
 | `--model-name` | Derived from model-path | Model name for API responses |
-| `--trace-file` | None | Run offline trace replay from a Mooncake-style JSONL trace file |
-| `--output-file` | `TRACE_STEM.replay.json` | Write replay metrics JSON to this path |
-| `--replay-concurrency` | None | Run offline replay in closed-loop concurrency mode with this many in-flight requests |
 | `--num-gpu-blocks-override` | 16384 | Number of KV cache blocks |
 | `--block-size` | 64 (`vllm`) / engine-specific | Tokens per KV cache block. For `sglang`, if omitted, the effective page/block size defaults to 1 or to `--sglang-page-size` when provided |
 | `--max-num-seqs` | 256 | Maximum concurrent sequences |
@@ -127,19 +124,9 @@ python -m dynamo.mocker \
 ## Trace Replay
-The mocker also supports replaying Mooncake-style traces through both the original mocker CLI and
-the dedicated replay harness.
-For the original mocker CLI flow:
-```bash
-python -m dynamo.mocker \
-  --trace-file /path/to/mooncake_trace.jsonl \
-  --model-path Qwen/Qwen3-0.6B
-```
-For the standalone replay CLI, which exposes `offline|online`, `round_robin|kv_router`,
-`arrival_speedup_ratio`, and the synthetic replay path directly:
+The mocker supports replaying Mooncake-style traces through the dedicated replay CLI, which exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and the synthetic replay path
+directly:
 ```bash
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
@@ -147,7 +134,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
   --replay-mode offline \
   --router-mode kv_router \
   --arrival-speedup-ratio 5 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --router-config '{"router_queue_policy":"fcfs"}' \
   --report-json /tmp/replay-report.json
 ```
@@ -163,13 +150,12 @@ python -m dynamo.replay \
   --num-workers 1 \
   --replay-mode offline \
   --replay-concurrency 100 \
-  --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+  --extra-engine-args '{"block_size":512}' \
   --report-json /tmp/replay-report.json
 ```
 The standalone replay CLI prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The `dynamo.mocker` trace-file flow still writes a report file and prints a
-`Replay Summary` table.
+report JSON to disk.
 For full usage, constraints, and benchmarking guidance, see [Mocker Trace Replay](../benchmarks/mocker-trace-replay.md).
......
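Note for scripts that previously shelled out to `python -m dynamo.mocker --trace-file`: the sketch below shows one way such a script might assemble the `python -m dynamo.replay` invocation documented in this change. The helper name `build_replay_command` and its parameters are hypothetical, not part of Dynamo. It uses only flags that appear in the examples above, serializes the partial JSON objects with `json.dumps`, and builds the argument list without running anything.

```python
import json
import shlex


def build_replay_command(trace_path, *, num_workers=4, replay_mode="offline",
                         router_mode="round_robin", engine_args=None,
                         router_config=None, report_json=None):
    """Return an argv list for `python -m dynamo.replay` (hypothetical helper)."""
    # The two mode flags take the values shown in the docs above.
    if replay_mode not in ("offline", "online"):
        raise ValueError(f"unknown replay mode: {replay_mode!r}")
    if router_mode not in ("round_robin", "kv_router"):
        raise ValueError(f"unknown router mode: {router_mode!r}")

    argv = [
        "python", "-m", "dynamo.replay", str(trace_path),
        "--num-workers", str(num_workers),
        "--replay-mode", replay_mode,
        "--router-mode", router_mode,
    ]
    # Both JSON flags accept partial objects, so pass only the keys you set.
    if engine_args:
        argv += ["--extra-engine-args",
                 json.dumps(engine_args, separators=(",", ":"))]
    if router_config:
        argv += ["--router-config",
                 json.dumps(router_config, separators=(",", ":"))]
    if report_json:
        argv += ["--report-json", str(report_json)]
    return argv


cmd = build_replay_command(
    "/path/to/mooncake_trace.jsonl",
    engine_args={"block_size": 512},
    report_json="/tmp/replay-report.json",
)
# Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building an argument list rather than a shell string leaves the quoting of the JSON arguments to `shlex.join` (or to `subprocess.run` when the list is passed directly).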