docs(profiler): require optimization_target=sla for throughput scaling [DYN-2751] (#8649)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Unverified Commit dee63d96 authored Apr 23, 2026 by Hongkuan Zhou Committed by GitHub Apr 23, 2026

Showing with 5 additions and 1 deletion

docs/components/profiler/profiler-guide.md +5 -1

--- a/docs/components/profiler/profiler-guide.md
+++ b/docs/components/profiler/profiler-guide.md
@@ -128,17 +128,20 @@ When the planner is enabled, the profiler generates engine interpolation data ne
 ```yaml
 features:
   planner:
+    optimization_target: sla              # required for throughput-based scaling and specific SLA targets
     pre_deployment_sweeping_mode: rapid   # rapid | thorough | none
     enable_throughput_scaling: true
 ```

+`optimization_target` must be set to `sla` for `enable_throughput_scaling` and the planner's `ttft`/`itl` SLA targets to take effect. The `PlannerConfig` default is `throughput`, which uses static queue/utilization thresholds: it silently flips `enable_throughput_scaling` to `false` (so pre-deployment profiling is skipped and `planner-profile-data-XXXX` is not emitted) and ignores any `features.planner.ttft`/`itl` values. `enable_load_scaling` is unaffected (easy-mode keeps load scaling enabled). See the [Planner Guide](../planner/planner-guide.md#optimization-target) for the full explanation of each `optimization_target` value.
+
 - **rapid**: Uses AIC simulation to generate interpolation curves (~30s, no GPUs)
 - **thorough**: Deploys the selected engine config on real GPUs and sweeps across ISL/concurrency ranges (2-4h)
 - **none**: Skips interpolation. Only valid when using load-based scaling without throughput-based scaling.

 The profiler saves two ConfigMaps into the generated DGD:
 - **planner-config-XXXX**: Serialized `PlannerConfig` JSON (with `profile_results_dir` pointing to the profiling data mount)
-- **planner-profile-data-XXXX**: Prefill and decode interpolation data (JSON)
+- **planner-profile-data-XXXX**: Prefill and decode interpolation data (JSON). Only emitted when `optimization_target: sla` is set alongside `enable_throughput_scaling: true` (or when mocker is enabled).

 See the [Planner Guide](../planner/planner-guide.md) for the full `PlannerConfig` reference.

@@ -155,6 +158,7 @@ The profiler enforces these rules at startup:
 | Condition | Behavior |
 |-----------|----------|
 | `searchStrategy: thorough` + `backend: auto` | Rejected. Specify a concrete backend. |
+| `enable_throughput_scaling: true` without `optimization_target: sla` | Silently coerced. `PlannerConfig` defaults `optimization_target` to `throughput`, which flips `enable_throughput_scaling` to `false` at validation time. Set `optimization_target: sla` explicitly to keep throughput-based scaling enabled. |
 | `enable_throughput_scaling: true` + `pre_deployment_sweeping_mode: none` (or unset) | Rejected. Throughput-based scaling requires pre-deployment sweeping. |
 | `enable_throughput_scaling: true` + `pre_deployment_sweeping_mode: rapid` + AIC unsupported | Rejected. AIC does not support this model/hardware/backend combination; switch `pre_deployment_sweeping_mode` to `thorough`. |
 | `e2eLatency` provided together with an explicitly-set `ttft` or `itl` | Rejected by SLA validator. Provide only `e2eLatency`; `ttft` and `itl` do not need to be explicitly nulled. |
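
The coercion and rejection rules this change documents can be sketched in a few lines of Python. This is an illustrative model only, not the real `PlannerConfig`: the class `PlannerFeatureSketch` and the helpers `normalize` and `emits_profile_data` are hypothetical names, and treating every non-`sla` target the same way as the `throughput` default is an assumption drawn from the wording of the guide.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PlannerFeatureSketch:
    """Hypothetical stand-in for the `features.planner` block (not the real PlannerConfig)."""
    optimization_target: str = "throughput"  # default stated in the guide
    enable_throughput_scaling: bool = False
    enable_load_scaling: bool = True
    pre_deployment_sweeping_mode: Optional[str] = None  # rapid | thorough | none
    ttft: Optional[float] = None
    itl: Optional[float] = None


def normalize(cfg: PlannerFeatureSketch) -> PlannerFeatureSketch:
    """Apply the documented rules; returns a new config or raises ValueError."""
    if cfg.optimization_target != "sla":
        # Assumption: any non-sla target behaves like the `throughput` default.
        # Throughput scaling is silently disabled and ttft/itl are ignored
        # (modelled here as None); load scaling is left untouched.
        return replace(cfg, enable_throughput_scaling=False, ttft=None, itl=None)
    if cfg.enable_throughput_scaling and cfg.pre_deployment_sweeping_mode in (None, "none"):
        # Documented rejection: throughput scaling needs pre-deployment sweeping.
        raise ValueError("throughput-based scaling requires pre-deployment sweeping")
    return cfg


def emits_profile_data(cfg: PlannerFeatureSketch, mocker_enabled: bool = False) -> bool:
    """True when the planner-profile-data ConfigMap would be emitted."""
    return mocker_enabled or normalize(cfg).enable_throughput_scaling


if __name__ == "__main__":
    implicit = PlannerFeatureSketch(
        enable_throughput_scaling=True, pre_deployment_sweeping_mode="rapid"
    )
    explicit = replace(implicit, optimization_target="sla")
    print(normalize(implicit).enable_throughput_scaling)
    print(normalize(explicit).enable_throughput_scaling)
```

Running the sketch shows the pitfall the new table row warns about: the same `enable_throughput_scaling: true` setting survives only when `optimization_target: sla` is set explicitly.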