Commit 7860861f authored by Hongkuan Zhou, committed by GitHub

feat: Add TTFT and ITL Interpolation to Profiling Script (#1159)


Co-authored-by: root <root@kkranen-dt.nvidia.com>
parent 3bde1e45
@@ -31,6 +31,7 @@ protobuf==5.27.3
pydantic==2.7.1
pyright
PyYAML
scikit-learn
sentencepiece
tensorboard==2.19.0
tensorboardX==2.6.2.2
......
@@ -86,6 +86,8 @@ The following information will be printed out in the terminal:
2025-05-16 15:20:24 - __main__ - INFO - Suggested planner upper/lower bound for decode kv cache utilization: 0.20/0.10
```
After finding the best TP size for prefill and decode, the script interpolates TTFT as a function of ISL, and ITL as a function of active KV cache usage and decode context length. This gives a more accurate performance estimate when ISL and OSL change. The results are saved to `<output_dir>/<decode/prefill>_tp<best_tp>_interploation`.
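
As a rough illustration of the TTFT-versus-ISL interpolation described above, the sketch below estimates TTFT at an unmeasured ISL from a few measured points. It is not the profiling script's implementation: the `interpolate` helper, the sample values, and the choice of piecewise-linear interpolation are all assumptions made for this example, and the script may fit a different model.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation over sorted xs; clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # index of the first sample >= x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical profiling samples (made up for illustration):
# input sequence length in tokens -> time to first token in milliseconds.
isl_samples = [1000, 2000, 4000, 8000]
ttft_samples_ms = [50.0, 110.0, 260.0, 640.0]

# Estimate TTFT for an ISL that was not profiled directly.
estimated_ttft_ms = interpolate(isl_samples, ttft_samples_ms, 3000)
print(f"Estimated TTFT at ISL=3000: {estimated_ttft_ms} ms")
```

ITL depends on two inputs (active KV cache usage and decode context length), so the same idea would need a two-dimensional grid or a fitted surface rather than a single curve.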
## Usage
The planner is started automatically as part of Dynamo pipelines when running `dynamo serve`. You can configure the planner just as you would any other component in your pipeline either via YAML configuration or through CLI arguments.
......