Unverified commit 7eef9ac3, authored by hhzhang16, committed by GitHub

docs: update profiling-related docs (#2816)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
parent ad4821c5
../../docs/architecture/pre_deployment_profiling.md ../../docs/benchmarks/pre_deployment_profiling.md
\ No newline at end of file \ No newline at end of file
@@ -4,6 +4,9 @@
To ensure Dynamo deployments comply with the SLA, we provide a pre-deployment script to profile the model performance with different parallelization mappings and recommend the parallelization mapping for prefill and decode workers and planner configurations. To use this script, the user needs to provide the target ISL, OSL, TTFT SLA, and ITL SLA.
> [!NOTE]
> **Time Investment**: This profiling process is comprehensive and typically takes **a few hours** to complete. The script systematically tests multiple tensor parallelism configurations and load conditions to find optimal performance settings. This upfront investment ensures your deployment meets SLA requirements and operates efficiently.
Support matrix:
| Backends | Model Types | Supported |
| --- | --- | --- |
......