By default, the mocker uses hardcoded polynomial formulas to estimate prefill and decode timing. For more realistic simulations, you can load performance data from actual profiling results.
### Using profiled performance data
Add the `--planner-profile-data` flag to load an NPZ file containing interpolation grids from the planner profiler:
By default, the mocker uses hardcoded polynomial formulas to estimate prefill and decode timing. For more realistic simulations, you can load performance data from actual profiling results using `--planner-profile-data`:
- `prefill_isl`: 1D array of input sequence lengths
- `prefill_ttft_ms`: 1D array of time-to-first-token values (ms)
- `decode_active_kv_tokens`: 1D array of active KV token counts
- `decode_context_length`: 1D array of context lengths
- `decode_itl`: 2D array of inter-token latencies (ms)
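
To make the role of these five arrays concrete, here is a minimal sketch of how such grids could be queried. It is not the mocker's implementation: the helper names `interp_1d` and `interp_2d`, the sample values, the clamping at the grid edges, and the axis order of `decode_itl` (rows indexed by `decode_active_kv_tokens`, columns by `decode_context_length`) are all assumptions made for illustration. Plain lists stand in for the arrays that would be read from the NPZ file.

```python
from bisect import bisect_right


def interp_1d(x, xs, ys):
    """Piecewise-linear interpolation over a sorted grid, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # index of the grid point at or below x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def interp_2d(x, y, xs, ys, grid):
    """Bilinear interpolation; grid[i][j] is assumed to be the value at (xs[i], ys[j])."""
    # Interpolate along y within every row, then along x across the rows.
    column = [interp_1d(y, ys, row) for row in grid]
    return interp_1d(x, xs, column)


# Made-up sample data using the key names listed above.
prefill_isl = [128, 512, 2048]
prefill_ttft_ms = [10.0, 30.0, 120.0]
decode_active_kv_tokens = [1000, 10000]
decode_context_length = [256, 1024]
decode_itl = [[5.0, 6.0], [8.0, 10.0]]  # assumed shape: (kv tokens, context length)

ttft = interp_1d(320, prefill_isl, prefill_ttft_ms)
itl = interp_2d(5500, 640, decode_active_kv_tokens, decode_context_length, decode_itl)
print(f"estimated TTFT: {ttft} ms, estimated ITL: {itl} ms")
```

The 1D lookup covers the prefill estimate, and the 2D lookup covers the decode estimate, which depends on both the active KV token count and the context length.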
### Generating performance data from profiler results
#### Option 1: Use existing pre-swept results
The repository includes pre-swept profiling results for common models and hardware configurations. For example, to use Llama-3.1-8B-Instruct-FP8 on H200 SXM:
```bash
# Convert existing pre-swept results to mocker-compatible NPZ format
```
The profile results directory should contain `selected_prefill_interpolation/` and `selected_decode_interpolation/` subdirectories with `raw_data.npz` files. This works seamlessly in Kubernetes where profile data is mounted via ConfigMap or PersistentVolume.
To convert your own profiler results into the NPZ format suitable for the mocker, you'll need to run the profiler (see [SLA-driven profiling documentation](../../../../docs/benchmarks/sla_driven_profiling.md) for details). Note that this is generally run in a Kubernetes environment.
To generate profiling data for your own model/hardware configuration, run the profiler (see [SLA-driven profiling documentation](../../../../docs/benchmarks/sla_driven_profiling.md) for details):
```bash
# Run the profiler
python benchmarks/profiler/profile_sla.py \
  --profile-config your_profile_config.yaml
# Convert profiler results to mocker-compatible NPZ format
```
# Case 3: Invalid path - neither mocker-format NPZ nor profiler-style directory
raise FileNotFoundError(
f"Path '{planner_profile_data}' is neither a mocker-format NPZ file nor a valid profiler results directory.\n"
f"Expected either:\n"
f" - A .npz file with keys: prefill_isl, prefill_ttft_ms, decode_active_kv_tokens, decode_context_length, decode_itl\n"
f" - A directory containing selected_prefill_interpolation/raw_data.npz and selected_decode_interpolation/raw_data.npz\n"
f" - A directory containing prefill_raw_data.json and decode_raw_data.json"
)
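
The error message above names three accepted inputs. A minimal sketch of how a path could be sorted into those three cases follows, assuming only the file and directory names quoted in the message. The function `classify_profile_path` and its return labels are hypothetical and are not taken from this change. The sketch checks names only and does not open the NPZ file to verify its keys.

```python
from pathlib import Path


def classify_profile_path(path: Path) -> str:
    """Return a label for the layout found at `path` (hypothetical helper)."""
    # Case 1: a single NPZ file (its keys are not inspected in this sketch).
    if path.is_file() and path.suffix == ".npz":
        return "npz_file"

    if path.is_dir():
        # Case 2: profiler-style directory with per-phase raw_data.npz files.
        prefill_npz = path / "selected_prefill_interpolation" / "raw_data.npz"
        decode_npz = path / "selected_decode_interpolation" / "raw_data.npz"
        if prefill_npz.is_file() and decode_npz.is_file():
            return "npz_directory"

        # Alternative directory layout with JSON raw data files.
        prefill_json = path / "prefill_raw_data.json"
        decode_json = path / "decode_raw_data.json"
        if prefill_json.is_file() and decode_json.is_file():
            return "json_directory"

    # Nothing matched: mirror the error raised in the change above.
    raise FileNotFoundError(
        f"Path '{path}' is neither an NPZ file nor a valid profiler results directory."
    )
```

Checking for both files in each layout before returning keeps a half-populated directory from being accepted and failing later at load time.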
def create_temp_engine_args_file(args) -> Path:
"""
Create a temporary JSON file with MockEngineArgs from CLI arguments.
...
...
@@ -182,7 +252,8 @@ def parse_args():
"--planner-profile-data",
type=Path,
default=None,
help="Path to JSON configmap or NPZ file containing performance profiling data from planner_profiler_perf_data_converter.py (default: None, uses hardcoded polynomials)",
help="Path to profile results directory containing selected_prefill_interpolation/ and "