Unverified commit aaa8a567 authored by Yongming Ding, committed by GitHub

docs: remove unsupported --trace-file from mocker (#7585)


Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
parent 52382c91
@@ -5,13 +5,10 @@ title: Mocker Trace Replay
 subtitle: Replay Mooncake-style traces through the mocker in offline or online mode
 ---
-This guide covers the mocker's trace replay support for Mooncake-style JSONL traces. The replay
-surface is available in two forms:
-- `python -m dynamo.mocker --trace-file ...`, which writes a report file and prints a replay summary
-- `python -m dynamo.replay ...`, which prints an AIPerf-style summary table, writes the full
-replay report JSON to disk, and exposes `offline|online`, `round_robin|kv_router`,
-`arrival_speedup_ratio`, and synthetic replay inputs directly
+This guide covers trace replay support for Mooncake-style JSONL traces via `python -m dynamo.replay`,
+which prints an AIPerf-style summary table, writes the full replay report JSON to disk, and exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and synthetic replay inputs
+directly.
 Unlike normal `dynamo.mocker` usage, offline replay does not launch workers, register endpoints, or
 require NATS, etcd, or a frontend. Online replay does exercise the live mock-worker runtime path.
@@ -31,7 +28,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --num-workers 4 \
 --replay-mode offline \
 --router-mode round_robin \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --report-json /tmp/replay-report.json
 ```
@@ -46,27 +43,12 @@ python -m dynamo.replay \
 --num-workers 1 \
 --replay-mode offline \
 --replay-concurrency 100 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --report-json /tmp/replay-report.json
 ```
-You can also run replay through the mocker CLI by passing `--trace-file`:
-```bash
-python -m dynamo.mocker \
---trace-file /path/to/mooncake_trace.jsonl \
---model-path Qwen/Qwen3-0.6B
-```
-This writes a JSON report next to the trace file by default:
-```text
-/path/to/mooncake_trace.replay.json
-```
 `python -m dynamo.replay` prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The mocker CLI prints a `Replay Summary` table to stdout and writes the report
-JSON to disk.
+report JSON to disk.
 ## Input Format
@@ -115,7 +97,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --router-mode kv_router \
 --num-workers 4 \
 --arrival-speedup-ratio 10 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --router-config '{"router_queue_policy":"fcfs","router_temperature":0.0}' \
 --report-json /tmp/replay-report.json
 ```
@@ -127,7 +109,6 @@ SGLang replay uses the same CLI surface. A minimal extra-engine-args file can us
 {
 "engine_type": "sglang",
 "num_gpu_blocks": 512,
-"speedup_ratio": 1000.0,
 "sglang": {
 "page_size": 2
 }
@@ -138,11 +119,6 @@ Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Un
 fall back to the same defaults used by `MockEngineArgs::default()` and
 `KvRouterConfig::default()`.
-### `python -m dynamo.mocker --trace-file`
-The mocker CLI supports offline replay and remains useful when you want the historical
-`Replay Summary` output and report-file workflow.
 ### Synthetic Replay
 Synthetic replay bypasses trace loading and generates in-memory requests with fixed input/output
@@ -156,7 +132,7 @@ python -m dynamo.replay \
 --arrival-interval-ms 0.5 \
 --replay-mode offline \
 --replay-concurrency 50 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+--extra-engine-args '{"block_size":512}'
 ```
 This is useful for parameter sweeps where Mooncake-style prefix structure is not required.
@@ -172,7 +148,7 @@ those timestamps:
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --replay-mode offline \
 --num-workers 4 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+--extra-engine-args '{"block_size":512}'
 ```
 This is the right mode when you want deterministic replay of the original arrival pattern.
@@ -203,7 +179,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --router-mode kv_router \
 --num-workers 4 \
 --arrival-speedup-ratio 10 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+--extra-engine-args '{"block_size":512}'
 ```
 ### Arrival Speedup
@@ -216,7 +192,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --replay-mode offline \
 --num-workers 4 \
 --arrival-speedup-ratio 5 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}'
+--extra-engine-args '{"block_size":512}'
 ```
 ### Router Modes
@@ -243,14 +219,14 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --replay-mode offline \
 --router-mode kv_router \
 --num-workers 4 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --router-config '{"router_queue_policy":"fcfs"}'
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --replay-mode offline \
 --router-mode kv_router \
 --num-workers 4 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --router-config '{"router_queue_policy":"lcfs"}'
 ```
@@ -259,17 +235,6 @@ an expected production default.
 ## Output
-Use `--output-file` to override the default report location:
-```bash
-python -m dynamo.mocker \
---trace-file /path/to/mooncake_trace.jsonl \
---model-path Qwen/Qwen3-0.6B \
---output-file /tmp/replay-report.json
-```
-If `--output-file` is not set, the report path defaults to `TRACE_STEM.replay.json` in the same directory as the input trace.
 The report contains:
 - request counts
......
@@ -73,9 +73,6 @@ python -m dynamo.mocker \
 | `--model-path` | Required | HuggingFace model ID or local path for tokenizer |
 | `--endpoint` | Auto-derived | Dynamo endpoint string. Defaults are namespace-dependent, and prefill workers use a different default endpoint than aggregated/decode workers |
 | `--model-name` | Derived from model-path | Model name for API responses |
-| `--trace-file` | None | Run offline trace replay from a Mooncake-style JSONL trace file |
-| `--output-file` | `TRACE_STEM.replay.json` | Write replay metrics JSON to this path |
-| `--replay-concurrency` | None | Run offline replay in closed-loop concurrency mode with this many in-flight requests |
 | `--num-gpu-blocks-override` | 16384 | Number of KV cache blocks |
 | `--block-size` | 64 (`vllm`) / engine-specific | Tokens per KV cache block. For `sglang`, if omitted, the effective page/block size defaults to 1 or to `--sglang-page-size` when provided |
 | `--max-num-seqs` | 256 | Maximum concurrent sequences |
@@ -127,19 +124,9 @@ python -m dynamo.mocker \
 ## Trace Replay
-The mocker also supports replaying Mooncake-style traces through both the original mocker CLI and
-the dedicated replay harness.
-For the original mocker CLI flow:
-```bash
-python -m dynamo.mocker \
---trace-file /path/to/mooncake_trace.jsonl \
---model-path Qwen/Qwen3-0.6B
-```
-For the standalone replay CLI, which exposes `offline|online`, `round_robin|kv_router`,
-`arrival_speedup_ratio`, and the synthetic replay path directly:
+The mocker supports replaying Mooncake-style traces through the dedicated replay CLI, which exposes
+`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and the synthetic replay path
+directly:
 ```bash
 python -m dynamo.replay /path/to/mooncake_trace.jsonl \
@@ -147,7 +134,7 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
 --replay-mode offline \
 --router-mode kv_router \
 --arrival-speedup-ratio 5 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --router-config '{"router_queue_policy":"fcfs"}' \
 --report-json /tmp/replay-report.json
 ```
@@ -163,13 +150,12 @@ python -m dynamo.replay \
 --num-workers 1 \
 --replay-mode offline \
 --replay-concurrency 100 \
---extra-engine-args '{"block_size":512,"speedup_ratio":1000.0}' \
+--extra-engine-args '{"block_size":512}' \
 --report-json /tmp/replay-report.json
 ```
 The standalone replay CLI prints an AIPerf-style summary table to stdout and writes the full replay
-report JSON to disk. The `dynamo.mocker` trace-file flow still writes a report file and prints a
-`Replay Summary` table.
+report JSON to disk.
 For full usage, constraints, and benchmarking guidance, see [Mocker Trace Replay](../benchmarks/mocker-trace-replay.md).
......
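Callers that still invoke `python -m dynamo.mocker --trace-file` need a replacement command. The following is a minimal migration sketch. It assumes you want `dynamo.replay` to keep writing the report where the removed flow did by default. According to the deleted doc text, that default was `TRACE_STEM.replay.json` in the trace's directory. The sketch derives that path and passes it to `--report-json`. The function names `legacy_report_path` and `replay_like_legacy` are hypothetical and not part of Dynamo. Only flags that appear in the diff (`--replay-mode`, `--report-json`) are hard-coded. Any other flags your setup needs are passed through unchanged.

```bash
#!/usr/bin/env bash
# Migration sketch: hypothetical helpers, not part of Dynamo.

# Reproduce the default report location of the removed
# `python -m dynamo.mocker --trace-file` flow: TRACE_STEM.replay.json,
# in the same directory as the input trace.
legacy_report_path() {
  local trace="$1"
  local dir base
  dir=$(dirname "$trace")
  base=$(basename "$trace")
  # Drop the final extension (e.g. ".jsonl") to get TRACE_STEM.
  printf '%s/%s.replay.json\n' "$dir" "${base%.*}"
}

# Old (no longer supported):
#   python -m dynamo.mocker --trace-file "$trace" --model-path Qwen/Qwen3-0.6B
# Replacement: run offline replay through the dedicated CLI and point
# --report-json at the legacy location. Remaining arguments pass through.
replay_like_legacy() {
  local trace="$1"
  shift
  python -m dynamo.replay "$trace" \
    --replay-mode offline \
    --report-json "$(legacy_report_path "$trace")" \
    "$@"
}
```

For example, `replay_like_legacy /path/to/mooncake_trace.jsonl --num-workers 4` runs `python -m dynamo.replay /path/to/mooncake_trace.jsonl --replay-mode offline --report-json /path/to/mooncake_trace.replay.json --num-workers 4`. Computing the path in the wrapper keeps scripts that look for the report at the old location working. The report contents may still differ from the old mocker output.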