docs: remove unsupported --trace-file from mocker (#7585)

Signed-off-by: Yongming Ding <yongmingd@nvidia.com>

Commit aaa8a567 (unverified), authored Mar 23, 2026 by Yongming Ding, committed by GitHub Mar 23, 2026

Showing with 20 additions and 69 deletions

docs/benchmarks/mocker-trace-replay.md +14 -49

docs/mocker/mocker.md +6 -20

--- a/docs/benchmarks/mocker-trace-replay.md
+++ b/docs/benchmarks/mocker-trace-replay.md
@@ -5,13 +5,10 @@ title: Mocker Trace Replay
 subtitle: Replay Mooncake-style traces through the mocker in offline or online mode
 ---

-This guide covers the mocker's trace replay support for Mooncake-style JSONL traces. The replay
-surface is available in two forms:
-
- `python -m dynamo.mocker --trace-file ...`, which writes a report file and prints a replay summary
- `python -m dynamo.replay ...`, which prints an AIPerf-style summary table, writes the full
-  replay report JSON to disk, and exposes `offline|online`, `round_robin|kv_router`,
-  `arrival_speedup_ratio`, and synthetic replay inputs directly
+This guide covers trace replay support for Mooncake-style JSONL traces via `python -m dynamo.replay`,
+which prints an AIPerf-style summary table, writes the full replay report JSON to disk, and exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and synthetic replay inputs
+directly.

 Unlike normal `dynamo.mocker` usage, offline replay does not launch workers, register endpoints, or
 require NATS, etcd, or a frontend. Online replay does exercise the live mock-worker runtime path.
@@ -31,7 +28,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --num-workers 4 \
    --replay-mode offline \
    --router-mode round_robin \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --report-json /tmp/replay-report.json
 ```

@@ -46,27 +43,12 @@ python -m dynamo.replay \
    --num-workers 1 \
    --replay-mode offline \
    --replay-concurrency 100 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --report-json /tmp/replay-report.json
 ```

-You can also run replay through the mocker CLI by passing `--trace-file`:
-
-```bash
-python -m dynamo.mocker \
-    --trace-file /path/to/mooncake_trace.jsonl \
-    --model-path Qwen/Qwen3-0.6B
-```
-
-This writes a JSON report next to the trace file by default:
-
-```text
-/path/to/mooncake_trace.replay.json
-```
-
 `python -m dynamo.replay` prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The mocker CLI prints a `Replay Summary` table to stdout and writes the report
-JSON to disk.
+report JSON to disk.

 ## Input Format

@@ -115,7 +97,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --router-mode kv_router \
    --num-workers 4 \
    --arrival-speedup-ratio 10 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --router-config '{"router_queue_policy":"fcfs","router_temperature":0.0}' \
    --report-json /tmp/replay-report.json
 ```
@@ -127,7 +109,6 @@ SGLang replay uses the same CLI surface. A minimal extra-engine-args file can us
 {
  "engine_type": "sglang",
  "num_gpu_blocks": 512,
-  "speedup_ratio": 1000.0,
  "sglang": {
    "page_size": 2
  }
@@ -138,11 +119,6 @@ Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Un
 fall back to the same defaults used by `MockEngineArgs::default()` and
 `KvRouterConfig::default()`.

-### `python -m dynamo.mocker --trace-file`
-
-The mocker CLI supports offline replay and remains useful when you want the historical
-`Replay Summary` output and report-file workflow.
-
 ### Synthetic Replay

 Synthetic replay bypasses trace loading and generates in-memory requests with fixed input/output
@@ -156,7 +132,7 @@ python -m dynamo.replay \
    --arrival-interval-ms 0.5 \
    --replay-mode offline \
    --replay-concurrency 50 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+    --extra-engine-args '{"block_size":512}'
 ```

 This is useful for parameter sweeps where Mooncake-style prefix structure is not required.
@@ -172,7 +148,7 @@ those timestamps:
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --replay-mode offline \
    --num-workers 4 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+    --extra-engine-args '{"block_size":512}'
 ```

 This is the right mode when you want deterministic replay of the original arrival pattern.
@@ -203,7 +179,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --router-mode kv_router \
    --num-workers 4 \
    --arrival-speedup-ratio 10 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+    --extra-engine-args '{"block_size":512}'
 ```

 ### Arrival Speedup
@@ -216,7 +192,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --replay-mode offline \
    --num-workers 4 \
    --arrival-speedup-ratio 5 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+    --extra-engine-args '{"block_size":512}'
 ```

 ### Router Modes
@@ -243,14 +219,14 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --replay-mode offline \
    --router-mode kv_router \
    --num-workers 4 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --router-config '{"router_queue_policy":"fcfs"}'

 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --replay-mode offline \
    --router-mode kv_router \
    --num-workers 4 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --router-config '{"router_queue_policy":"lcfs"}'
 ```

@@ -259,17 +235,6 @@ an expected production default.

 ## Output

-Use `--output-file` to override the default report location:
-
-```bash
-python -m dynamo.mocker \
-    --trace-file /path/to/mooncake_trace.jsonl \
-    --model-path Qwen/Qwen3-0.6B \
-    --output-file /tmp/replay-report.json
-```
-
-If `--output-file` is not set, the report path defaults to `TRACE_STEM.replay.json` in the same directory as the input trace.
-
 The report contains:

 - request counts

--- a/docs/mocker/mocker.md
+++ b/docs/mocker/mocker.md
@@ -73,9 +73,6 @@ python -m dynamo.mocker \
 | `--model-path` | Required | HuggingFace model ID or local path for tokenizer |
 | `--endpoint` | Auto-derived | Dynamo endpoint string. Defaults are namespace-dependent, and prefill workers use a different default endpoint than aggregated/decode workers |
 | `--model-name` | Derived from model-path | Model name for API responses |
-| `--trace-file` | None | Run offline trace replay from a Mooncake-style JSONL trace file |
-| `--output-file` | `TRACE_STEM.replay.json` | Write replay metrics JSON to this path |
-| `--replay-concurrency` | None | Run offline replay in closed-loop concurrency mode with this many in-flight requests |
 | `--num-gpu-blocks-override` | 16384 | Number of KV cache blocks |
 | `--block-size` | 64 (`vllm`) / engine-specific | Tokens per KV cache block. For `sglang`, if omitted, the effective page/block size defaults to 1 or to `--sglang-page-size` when provided |
 | `--max-num-seqs` | 256 | Maximum concurrent sequences |
@@ -127,19 +124,9 @@ python -m dynamo.mocker \

 ## Trace Replay

-The mocker also supports replaying Mooncake-style traces through both the original mocker CLI and
-the dedicated replay harness.
-
-For the original mocker CLI flow:
-
-```bash
-python -m dynamo.mocker \
-    --trace-file /path/to/mooncake_trace.jsonl \
-    --model-path Qwen/Qwen3-0.6B
-```
-
-For the standalone replay CLI, which exposes `offline|online`, `round_robin|kv_router`,
-`arrival_speedup_ratio`, and the synthetic replay path directly:
+The mocker supports replaying Mooncake-style traces through the dedicated replay CLI, which exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and the synthetic replay path
+directly:

 ```bash
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
@@ -147,7 +134,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
    --replay-mode offline \
    --router-mode kv_router \
    --arrival-speedup-ratio 5 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --router-config '{"router_queue_policy":"fcfs"}' \
    --report-json /tmp/replay-report.json
 ```
@@ -163,13 +150,12 @@ python -m dynamo.replay \
    --num-workers 1 \
    --replay-mode offline \
    --replay-concurrency 100 \
-    --extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+    --extra-engine-args '{"block_size":512}' \
    --report-json /tmp/replay-report.json
 ```

 The standalone replay CLI prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The `dynamo.mocker` trace-file flow still writes a report file and prints a
-`Replay Summary` table.
+report JSON to disk.

 For full usage, constraints, and benchmarking guidance, see [Mocker Trace Replay](../benchmarks/mocker-trace-replay.md).