| `--model-path` | Required | HuggingFace model ID or local path for tokenizer |
| `--endpoint` | Auto-derived | Dynamo endpoint string. Defaults are namespace-dependent, and prefill workers use a different default endpoint than aggregated/decode workers |
| `--model-name` | Derived from model-path | Model name for API responses |
| `--trace-file` | None | Run offline trace replay from a Mooncake-style JSONL trace file |
| `--output-file` | `TRACE_STEM.replay.json` | Write replay metrics JSON to this path |
| `--replay-concurrency` | None | Run offline replay in closed-loop concurrency mode with this many in-flight requests |
| `--num-gpu-blocks-override` | 16384 | Number of KV cache blocks |
| `--block-size` | 64 (`vllm`) / engine-specific | Tokens per KV cache block. For `sglang`, if omitted, the effective page/block size defaults to 1 or to `--sglang-page-size` when provided |
| `--max-num-seqs` | 256 | Maximum concurrent sequences |
...
@@ -127,19 +124,9 @@ python -m dynamo.mocker \
## Trace Replay
The mocker also supports replaying Mooncake-style traces through both the original mocker CLI and
the dedicated replay harness.

For the original mocker CLI flow:
```bash
python -m dynamo.mocker \
    --trace-file /path/to/mooncake_trace.jsonl \
    --model-path Qwen/Qwen3-0.6B
```
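
To try the command above without a production trace, a small input file can be generated by hand. The following is a minimal sketch, not the mocker's own tooling: the field names (`timestamp`, `input_length`, `output_length`, `hash_ids`) are assumed from the publicly released Mooncake traces and may not match exactly what the replay path expects, and the helper names are hypothetical. The second helper only illustrates the `TRACE_STEM.replay.json` naming from the options table; which directory the file lands in is not specified here.

```python
import json
from pathlib import Path

# Hypothetical example records. Field names are an assumption based on the
# public Mooncake trace format; timestamps are assumed to be milliseconds.
EXAMPLE_REQUESTS = [
    {"timestamp": 0, "input_length": 128, "output_length": 16, "hash_ids": [0, 1]},
    {"timestamp": 250, "input_length": 192, "output_length": 32, "hash_ids": [0, 1, 2]},
]


def write_trace(path, requests):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for request in requests:
            f.write(json.dumps(request) + "\n")


def default_output_name(trace_path):
    """Illustrate the TRACE_STEM.replay.json naming pattern (file name only)."""
    return f"{Path(trace_path).stem}.replay.json"


if __name__ == "__main__":
    write_trace("example_trace.jsonl", EXAMPLE_REQUESTS)
    print(default_output_name("example_trace.jsonl"))
```

The resulting `example_trace.jsonl` can then be passed to `--trace-file` in place of `/path/to/mooncake_trace.jsonl`.
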
For the standalone replay CLI, which exposes `offline|online`, `round_robin|kv_router`,
`arrival_speedup_ratio`, and the synthetic replay path directly: