Commit 16bca7b6 authored by Yongming Ding, committed by GitHub

fix(replay): resolve planner_profile_data directory (#7586)


Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
parent a58d2241
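
In outline, the change to `dynamo.replay` rewrites the `--extra-engine-args` JSON before it reaches `MockEngineArgs.from_json`. The sketch below is a standalone illustration of that flow under stated assumptions, not the code from this commit: `ProfileDataResult`, `fake_resolve`, and `rewrite_extra_engine_args` are hypothetical names, and `fake_resolve` only stands in for `resolve_planner_profile_data`, whose real conversion logic is not shown in the diff.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ProfileDataResult:
    """Hypothetical stand-in for the value returned by the real resolver."""

    npz_path: Optional[Path]


def fake_resolve(path: Path) -> ProfileDataResult:
    # Assumed behaviour, for illustration only: an .npz file is used as-is,
    # a path named "unusable" yields no NPZ, and anything else "converts"
    # to <path>/converted.npz.
    if path.suffix == ".npz":
        return ProfileDataResult(npz_path=path)
    if path.name == "unusable":
        return ProfileDataResult(npz_path=None)
    return ProfileDataResult(npz_path=path / "converted.npz")


def rewrite_extra_engine_args(
    extra_engine_args: Optional[str],
    resolve: Callable[[Path], ProfileDataResult],
) -> Optional[str]:
    """Return the JSON string with planner_profile_data resolved to an NPZ path."""
    if extra_engine_args is None:
        return None
    raw = json.loads(extra_engine_args)
    if "planner_profile_data" not in raw:
        # Nothing to resolve: pass the original string through unchanged.
        return extra_engine_args
    result = resolve(Path(raw["planner_profile_data"]))
    if result.npz_path is not None:
        raw["planner_profile_data"] = str(result.npz_path)
    else:
        # No NPZ available: drop the key rather than forward a directory.
        del raw["planner_profile_data"]
    return json.dumps(raw)


if __name__ == "__main__":
    args = '{"planner_profile_data": "/tmp/profiles", "aic_tp_size": 1}'
    print(rewrite_extra_engine_args(args, fake_resolve))
```

Passing the resolver in as a parameter is a choice made for this sketch so the flow can be exercised without the `dynamo` package installed. The commit itself calls `resolve_planner_profile_data` directly.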
@@ -234,6 +234,15 @@ python -m dynamo.mocker \
 The AIC model automatically uses `--model-path` and `--engine-type` to select the appropriate performance data. Available systems include `h200_sxm`, `h100_sxm`, etc. (see AIC SDK documentation for the full list).
+When using `python -m dynamo.replay`, there are no dedicated AIC flags. Pass the equivalent fields directly via `--extra-engine-args`:
+```bash
+python -m dynamo.replay /path/to/trace.jsonl \
+--extra-engine-args '{"aic_backend":"vllm","aic_system":"h200_sxm","aic_model_path":"nvidia/Llama-3.1-8B-Instruct-FP8","aic_tp_size":1}'
+```
+The `aic_backend` field enables the AIC perf model and should match `engine_type` (`"vllm"` or `"sglang"`). The `aic_model_path` field is the equivalent of `--model-path` in `dynamo.mocker`.
 Example `--reasoning` configuration:
 ```bash
...
@@ -4,13 +4,16 @@
 from __future__ import annotations
 import argparse
+import json
 import os
 import sys
 from collections.abc import Sequence
+from pathlib import Path
 os.environ.setdefault("DYNAMO_SKIP_PYTHON_LOG_INIT", "1")
 from dynamo.llm import KvRouterConfig, MockEngineArgs
+from dynamo.mocker.args import resolve_planner_profile_data
 from dynamo.replay import run_synthetic_trace_replay, run_trace_replay
 from dynamo.replay.reporting import format_report_table, write_report_json
@@ -71,11 +74,24 @@ def main(argv: Sequence[str] | None = None) -> int:
                 "synthetic replay requires --input-tokens, --output-tokens, and --request-count"
             )
-    extra_engine_args = (
-        MockEngineArgs.from_json(args.extra_engine_args)
-        if args.extra_engine_args is not None
-        else None
-    )
+    # Resolve planner_profile_data directory -> NPZ before passing to Rust.
+    # Rust only accepts NPZ files; resolve_planner_profile_data handles conversion.
+    profile_data_result = None
+    if args.extra_engine_args is not None:
+        raw = json.loads(args.extra_engine_args)
+        if "planner_profile_data" in raw:
+            profile_data_result = resolve_planner_profile_data(
+                Path(raw["planner_profile_data"])
+            )
+            if profile_data_result.npz_path is not None:
+                raw["planner_profile_data"] = str(profile_data_result.npz_path)
+            else:
+                del raw["planner_profile_data"]
+            extra_engine_args = MockEngineArgs.from_json(json.dumps(raw))
+        else:
+            extra_engine_args = MockEngineArgs.from_json(args.extra_engine_args)
+    else:
+        extra_engine_args = None
     router_config = (
         KvRouterConfig.from_json(args.router_config)
         if args.router_config is not None
...