feat: Add TTFT and ITL Interpolation to Profiling Script (#1159)

Co-authored-by: root <root@kkranen-dt.nvidia.com>

Commit 7860861f, authored May 22, 2025 by Hongkuan Zhou, committed by GitHub May 22, 2025
Showing with 372 additions and 66 deletions

container/deps/requirements.txt +1 -0
docs/guides/planner.md +2 -0
examples/llm/utils/profile_sla.py +369 -66

--- a/container/deps/requirements.txt
+++ b/container/deps/requirements.txt
@@ -31,6 +31,7 @@ protobuf==5.27.3
 pydantic==2.7.1
 pyright
 PyYAML
+scikit-learn
 sentencepiece
 tensorboard==2.19.0
 tensorboardX==2.6.2.2

--- a/docs/guides/planner.md
+++ b/docs/guides/planner.md
@@ -86,6 +86,8 @@ The following information will be printed out in the terminal:
 2025-05-16 15:20:24 - __main__ - INFO - Suggested planner upper/lower bound for decode kv cache utilization: 0.20/0.10
 ```
+After finding the best TP size for prefill and decode, the script interpolates TTFT against ISL, and ITL against active KV cache usage and decode context length. This gives a more accurate performance estimate when ISL and OSL change. The results are saved to `<output_dir>/<decode/prefill>_tp<best_tp>_interploation`.
 ## Usage
 The planner is started automatically as part of Dynamo pipelines when running `dynamo serve`. You can configure the planner just as you would any other component in your pipeline either via YAML configuration or through CLI arguments.
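
Note on the planner.md change above: the idea of interpolating TTFT against ISL can be sketched roughly as follows. This is a minimal illustration, not the code in `profile_sla.py`. The helper name `fit_ttft_curve`, the choice of a quadratic fit via `numpy.polyfit`, and the sample points are all assumptions made for the example; the numbers are synthetic, not measurements.

```python
import numpy as np


def fit_ttft_curve(isl_samples, ttft_samples_ms, degree=2):
    """Fit a polynomial TTFT(ISL) model to profiled points.

    Hypothetical helper for illustration; the quadratic default is an
    assumption, not necessarily what the profiling script uses.
    """
    coeffs = np.polyfit(isl_samples, ttft_samples_ms, degree)
    return np.poly1d(coeffs)


# Synthetic sample points generated from a known quadratic (illustration only).
isl = np.array([1000, 2000, 4000, 8000], dtype=float)
ttft_ms = 1e-6 * isl**2 + 0.02 * isl + 5.0

# Fit the curve, then estimate TTFT at an ISL that was not profiled.
ttft_model = fit_ttft_curve(isl, ttft_ms)
estimate = float(ttft_model(3000))
print(f"Estimated TTFT at ISL=3000: {estimate:.2f} ms")
```

The same pattern would extend to ITL with two inputs (active KV cache usage and decode context length), though that needs a 2D fit or grid interpolation rather than a single polynomial.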

--- a/examples/llm/utils/profile_sla.py
+++ b/examples/llm/utils/profile_sla.py