docs: update profiling-related docs (#2816)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>

Commit 7eef9ac3, authored Sep 02, 2025 by hhzhang16, committed by GitHub Sep 02, 2025

Showing 2 changed files with 4 additions and 1 deletion

benchmarks/profiler/README.md +1 -1

docs/benchmarks/pre_deployment_profiling.md +3 -0

--- a/benchmarks/profiler/README.md
+++ b/benchmarks/profiler/README.md
-../../docs/architecture/pre_deployment_profiling.md
+../../docs/benchmarks/pre_deployment_profiling.md
\ No newline at end of file
--- a/docs/benchmarks/pre_deployment_profiling.md
+++ b/docs/benchmarks/pre_deployment_profiling.md
@@ -4,6 +4,9 @@
 To ensure Dynamo deployments comply with the SLA, we provide a pre-deployment script to profile the model performance with different parallelization mappings and recommend the parallelization mapping for prefill and decode workers and planner configurations. To use this script, the user needs to provide the target ISL, OSL, TTFT SLA, and ITL SLA.
+> [!NOTE]
+> **Time Investment**: This profiling process is comprehensive and typically takes **a few hours** to complete. The script systematically tests multiple tensor parallelism configurations and load conditions to find optimal performance settings. This upfront investment ensures your deployment meets SLA requirements and operates efficiently.
 Support matrix:
 | Backends | Model Types | Supported |
 | --- | --- | --- |