feat: turn profiling k8s jobs into sample DGDR requests (#3864)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>

Commit 6a84ffd3 · authored Oct 27, 2025 by hhzhang16 · committed by GitHub Oct 27, 2025
4 changed files
--- a/docs/planner/sla_planner_quickstart.md
+++ b/docs/planner/sla_planner_quickstart.md
-# SLA Planner Quick Start Guide
+# SLA-Driven Profiling and Planner Deployment Quick Start Guide

-Complete workflow to deploy SLA-based autoscaling for Dynamo deployments. This guide consolidates all necessary steps into a clear, sequential process.
+Complete workflow to deploy SLA-optimized Dynamo models using DynamoGraphDeploymentRequests (DGDR). This guide shows how to automatically profile models and deploy them with optimal configurations that meet your Service Level Agreements (SLAs).

 > [!IMPORTANT]
 > **Prerequisites**: This guide assumes you have a Kubernetes cluster with GPU nodes and have completed the [Dynamo Platform installation](/docs/kubernetes/installation_guide.md).

 ## Overview

-The SLA Planner automatically scales prefill and decode workers to meet your TTFT (Time To First Token) and ITL (Inter-Token Latency) targets.
+The DGDR workflow automates the entire process from SLA specification to deployment:

-The deployment process consists of two mandatory phases:
-
-1. **Pre-Deployment Profiling** (2-4 hours) - Generates performance data
-2. **SLA Planner Deployment** (5-10 minutes) - Enables autoscaling
-
-> [!TIP]
-> **Fast Profiling with AI Configurator**: For TensorRT-LLM users, we provide AI Configurator (AIC) that can complete profiling in 20-30 seconds using performance simulation instead of real deployments. Support for vLLM and SGLang coming soon. See [AI Configurator section](/docs/benchmarks/pre_deployment_profiling.md#running-the-profiling-script-with-ai-configurator) in the Profiling Guide.
+1. **Define SLAs**: Specify performance requirements (TTFT, ITL) and model information in a DGDR Custom Resource
+2. **Automatic Profiling**: The Dynamo Operator automatically profiles your model to find optimal configurations
+3. **Auto-Deploy**: The system automatically deploys the optimal configuration that meets your SLAs

 ```mermaid
 flowchart TD
-    A[Start Setup] --> B{Profiling Done?}
-    B -->|No| C[Run Profiling<br/>2-4 hours]
-    C --> D[Verify Results]
-    D --> E[Deploy Planner<br/>5-10 minutes]
-    B -->|Yes| E
-    E --> F[Test System]
-    F --> G[Ready!]
+    A[Create DGDR] --> B[DGDR Controller]
+    B --> C{Profiling Method}
+    C -->|Online| D[Run Profiling Job<br/>2-4 hours]
+    C -->|Offline/AIC| E[AI Configurator<br/>20-30 seconds]
+    D --> F[Generate DGD Config]
+    E --> F
+    F --> G[Auto-Deploy DGD]
+    G --> H[Monitor & Scale]

    style A fill:#e1f5fe
-    style C fill:#fff3e0
+    style D fill:#fff3e0
    style E fill:#e8f5e8
    style G fill:#f3e5f5
-    style B fill:#fff8e1
+    style H fill:#fff8e1
 ```

-## Prerequisites
+## What is a DynamoGraphDeploymentRequest (DGDR)?

-Before deploying the SLA planner, ensure:
- **Dynamo platform installed** (see [Installation Guide](/docs/kubernetes/installation_guide.md))
- **[kube-prometheus-stack](/docs/kubernetes/observability/metrics.md) installed and running.** By default, the prometheus server is not deployed in the `monitoring` namespace. If it is deployed to a different namespace, set `dynamo-operator.dynamo.metrics.prometheusEndpoint="http://prometheus-kube-prometheus-prometheus.<namespace>.svc.cluster.local:9090"`.
- **Benchmarking resources setup** (see [Kubernetes utilities for Dynamo Benchmarking and Profiling](../../deploy/utils/README.md)) The script will create a `dynamo-pvc` with `ReadWriteMany` access, if your cluster's default storageClassName does not allow `ReadWriteMany`, you need to specify a different storageClassName in `deploy/utils/manifests/pvc.yaml` which does support `ReadWriteMany`.
+A **DynamoGraphDeploymentRequest (DGDR)** is a Kubernetes Custom Resource that serves as the primary interface for users to request model deployments with specific performance and resource constraints. Think of it as a "deployment order" where you specify:

+- **What** model you want to deploy (`model`)
+- **How** it should perform (SLA targets: `ttft`, `itl`)
+- **Where** it should run (optional GPU preferences)
+- **Which** backend to use (`backend`: vllm, sglang, or trtllm)
+- **Which** images to use (`profilingConfig.profilerImage`, `deploymentOverrides.workersImage`)

-## Pre-Deployment Profiling
+The Dynamo Operator watches for DGDRs and automatically:
+1. Discovers available GPU resources in your cluster
+2. Runs profiling (online or offline) to find optimal configurations
+3. Generates an optimized DynamoGraphDeployment (DGD) configuration
+4. Deploys the DGD to your cluster

-Deploying planner starts with running pre-deployment profiling.
+**Key Benefits:**
+- **Declarative**: Specify what you want, not how to achieve it
+- **Automated**: No manual profiling job setup or result processing
+- **SLA-Driven**: Ensures deployments meet your performance requirements
+- **Integrated**: Works seamlessly with the Dynamo Operator

-> [!WARNING]
-> **MANDATORY**: Pre-deployment profiling must be completed before deploying SLA planner. This process analyzes your model's performance characteristics to determine optimal tensor parallelism configurations and scaling parameters.
+## Prerequisites

-### Step 1.1: Set Up Profiling Environment
+Before creating a DGDR, ensure:
+- **Dynamo platform installed** with the operator running (see [Installation Guide](/docs/kubernetes/installation_guide.md))
+- **[kube-prometheus-stack](/docs/kubernetes/observability/metrics.md) installed and running** (required for SLA planner)
+- **Profiling PVC created** (see [Benchmarking Resource Setup](/deploy/utils/README.md#benchmarking-resource-setup))
+- **Image pull secrets configured** if using private registries (typically `nvcr-imagepullsecret` for NVIDIA images)
+- **Sufficient GPU resources** available in your cluster for profiling
+- **Runtime images available** that contain both profiler and runtime components

-Set up your Kubernetes namespace for profiling (one-time per namespace). If your namespace is already set up, skip this step.
+### Container Images

-```bash
-export NAMESPACE=your-namespace
-```
+Each DGDR requires you to specify container images for the profiling and deployment process:

-**Prerequisites**: Ensure all dependencies are installed:
-```bash
-pip install -r deploy/utils/requirements.txt
-```
+**profilingConfig.profilerImage** (Required):
+Specifies the container image used for the profiling job itself. This image must contain the profiler code and dependencies needed for SLA-based profiling.

-### Step 1.2: Inject Your Configuration
+**deploymentOverrides.workersImage** (Optional):
+Specifies the container image used for DynamoGraphDeployment worker components (frontend, workers, planner). This image is used for:
+- Temporary DGDs created during online profiling (for performance measurements)
+- The final DGD deployed after profiling completes

-Use the injector utility to place your DGD manifest into the PVC:
+If `workersImage` is omitted, the image from the base config file (e.g., `disagg.yaml`) is used. You may use our public images (0.6.1 and later) or build and push your own.

-```bash
-# Use default disagg.yaml config
-python3 -m deploy.utils.inject_manifest --namespace $NAMESPACE --src components/backends/vllm/deploy/disagg.yaml --dest /data/configs/disagg.yaml
-
-# Or use a custom disagg config file
-python3 -m deploy.utils.inject_manifest --namespace $NAMESPACE --src my-custom-disagg.yaml --dest /data/configs/disagg.yaml
+```yaml
+spec:
+  profilingConfig:
+    profilerImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"
+  deploymentOverrides:
+    workersImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"  # Optional
 ```

-> **Note**: All paths must start with `/data/` for security reasons.
+## Quick Start: Deploy with DGDR
+
+### Step 1: Create Your DGDR
+
+Dynamo provides sample DGDR configurations in `benchmarks/profiler/deploy/`. You can use these as starting points:

-### Step 1.3: Configure SLA Targets
+**Available Sample DGDRs:**
+- **`profile_sla_dgdr.yaml`**: Standard online profiling for dense models
+- **`profile_sla_aic_dgdr.yaml`**: Fast offline profiling using AI Configurator (TensorRT-LLM)
+- **`profile_sla_moe_dgdr.yaml`**: Online profiling for MoE models (SGLang)

-For dense models, edit `$DYNAMO_HOME/benchmarks/profiler/deploy/profile_sla_job.yaml`:
+Or create a DGDR tailored to your needs:

 ```yaml
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeploymentRequest
+metadata:
+  name: my-model-deployment  # Change the name
+  namespace: default         # Change the namespace
 spec:
-  template:
-    spec:
-      containers:
-        - name: profile-sla
-          args:
-            - --isl
-            - "3000" # average ISL is 3000 tokens
-            - --osl
-            - "150" # average OSL is 150 tokens
-            - --ttft
-            - "200" # target TTFT is 200ms
-            - --itl
-            - "20" # target ITL is 20ms
-            - --backend
-            - <vllm/sglang>
-```
+  model: "Qwen/Qwen3-0.6B"     # Update to your model
+  backend: vllm                # Backend: vllm, sglang, or trtllm

-For MoE models, edit `$DYNAMO_HOME/benchmarks/profiler/deploy/profile_sla_moe_job.yaml` instead.
+  profilingConfig:
+    profilerImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"  # Required
+    config:
+      sla:
+        isl: 3000    # Adjust to your workload
+        osl: 150     # Adjust to your workload
+        ttft: 200    # Your target (ms)
+        itl: 20      # Your target (ms)

-### Step 1.4: Run Profiling
+      sweep:
+        use_ai_configurator: false  # Set to true for fast profiling (TensorRT-LLM only)

-Set the container image and config path:
+  deploymentOverrides:
+    workersImage: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1"  # Optional

-```bash
-export DOCKER_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
-export DGD_CONFIG_FILE=/data/configs/disagg.yaml
+  autoApply: true  # Auto-deploy after profiling
 ```

-Run profiling:
+> [!TIP]
+> For detailed explanations of all configuration options (SLA, hardware, sweep, AIC, planner), see the [DGDR Configuration Reference](/docs/benchmarks/sla_driven_profiling.md#dgdr-configuration-reference).

-```bash
-# for dense models
-envsubst < benchmarks/profiler/deploy/profile_sla_job.yaml | kubectl apply -f -
+### Step 2: Apply the DGDR

-# for MoE models
-envsubst < benchmarks/profiler/deploy/profile_sla_moe_job.yaml | kubectl apply -f -
+The rest of this quickstart will use the DGDR sample that uses AIC profiling. If you use a different DGDR file and/or name, be sure to adjust the commands accordingly.

-# using aiconfigurator instead of real sweeping (see below for more details)
-envsubst < benchmarks/profiler/deploy/profile_sla_aic_job.yaml | kubectl apply -f -
+```bash
+export NAMESPACE=your-namespace
+kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
 ```

-### Step 1.5: Monitor Profiling Progress
+The Dynamo Operator will immediately begin processing your request.
+
+### Step 3: Monitor Progress
+
+Watch the DGDR status:

 ```bash
-kubectl get jobs -n $NAMESPACE
-kubectl logs job/profile-sla -n $NAMESPACE
+# View status
+kubectl get dgdr -n $NAMESPACE
+
+# Detailed status
+kubectl describe dgdr sla-aic -n $NAMESPACE
+
+# Watch profiling job logs
+kubectl logs -f job/profile-sla-aic -n $NAMESPACE
 ```

+**DGDR Status States:**
+- `Pending`: Initial state, preparing to profile
+- `Profiling`: Running profiling job (20-30 seconds for AIC, 2-4 hours for online)
+- `Deploying`: Generating and applying DGD configuration
+- `Ready`: DGD successfully deployed and running
+- `Failed`: Error occurred (check events for details)
+
 > [!NOTE]
-> **Time Investment**: This profiling process is comprehensive and typically takes **2-4 hours** to complete. The script systematically tests multiple tensor parallelism configurations and load conditions to find optimal performance settings.
+> With AI Configurator, profiling completes in **20-30 seconds**, compared with 2-4 hours for online profiling.

-### Step 1.6: Download Profiling Results
+### Step 4: Access Your Deployment

-If you want to view the profiling results and performance plots:
+Once the DGDR reaches `Ready` state, your model is deployed and ready to serve:

 ```bash
-# Download to directory
-python3 -m deploy.utils.download_pvc_results --namespace $NAMESPACE --output-dir ./results --folder /data/profiling_results
-```
+# Find the frontend service
+kubectl get svc -n $NAMESPACE | grep trtllm-disagg

-For detailed information about the output structure, performance plots, and how to analyze the results, see the [Viewing Profiling Results](/docs/benchmarks/pre_deployment_profiling.md#viewing-profiling-results) section in the Profiling Guide.
+# Port-forward to access locally
+kubectl port-forward svc/trtllm-disagg-frontend 8000:8000 -n $NAMESPACE

-**Verify Success**: Look for terminal output like:
-```
-Suggested prefill TP:4 (TTFT 48.37 ms, throughput 15505.23 tokens/s/GPU)
-Suggested decode TP:4 (ITL 4.83 ms, throughput 51.22 tokens/s/GPU)
-...
-Final DGD config with planner: {...}
-Deploying the optimized DGD with planner...
+# Test the endpoint
+curl http://localhost:8000/v1/models
 ```

-### Step 1.7: Deploy the DGD with Planner
+## DGDR Configuration Details

-```bash
-kubectl apply -f ./results/config_with_planner.yaml
-```
+### Required Fields

-### Step 1.8: Wait for Deployment to be Ready
+| Field | Type | Description |
+|-------|------|-------------|
+| `spec.model` | string | Model identifier (e.g., "meta-llama/Llama-3-70b") |
+| `spec.backend` | enum | Inference backend: `vllm`, `sglang`, or `trtllm` |
+| `spec.profilingConfig.profilerImage` | string | Container image for profiling job |
+| `spec.profilingConfig.config.sla` | object | SLA targets (isl, osl, ttft, itl) |

-```bash
-kubectl get pods -n $NAMESPACE
+### Optional Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `spec.deploymentOverrides.workersImage` | string | Container image for DGD worker components. If omitted, uses image from base config file. |
+| `spec.autoApply` | boolean | Automatically deploy DGD after profiling (default: false) |
+| `spec.deploymentOverrides` | object | Customize metadata (name, namespace, labels, annotations) and image for auto-created DGD |
+
+### SLA Configuration
+
+The `sla` section defines performance requirements and workload characteristics:
+
+```yaml
+sla:
+  isl: 3000      # Average input sequence length (tokens)
+  osl: 150       # Average output sequence length (tokens)
+  ttft: 200      # Target Time To First Token (milliseconds, float)
+  itl: 20        # Target Inter-Token Latency (milliseconds, float)
 ```

-**Expected pods** (all should be `1/1 Running`):
+**Choosing SLA Values:**
+- **ISL/OSL**: Based on your expected traffic patterns
+- **TTFT**: First token latency target (lower = more GPUs needed)
+- **ITL**: Token generation latency target (lower = more GPUs needed)
+- **Trade-offs**: Tighter SLAs require more GPU resources
+
+### Profiling Methods
+
+Choose between **online profiling** (real measurements, 2-4 hours) or **offline profiling** with AI Configurator (estimated, 20-30 seconds):
+
+```yaml
+# Online Profiling (Default)
+sweep:
+  use_ai_configurator: false
+
+# Offline Profiling (AI Configurator - TensorRT-LLM only)
+sweep:
+  use_ai_configurator: true
+aic:
+  system: h200_sxm
+  model_name: QWEN3_32B
+  backend_version: "0.20.0"
 ```
-vllm-disagg-planner-frontend-*            1/1 Running
-vllm-disagg-planner-planner-*             1/1 Running
-vllm-disagg-planner-backend-*             1/1 Running
-vllm-disagg-planner-prefill-*             1/1 Running
+
+> [!NOTE]
+> For detailed comparison, supported configurations, and limitations, see [SLA-Driven Profiling Documentation](/docs/benchmarks/sla_driven_profiling.md#profiling-methods).
+
+### GPU Discovery
+
+By default, the DGDR controller automatically discovers available GPU resources. Optionally specify preferences:
+
+```yaml
+spec:
+  gpu:
+    type: h200           # GPU type (e.g., h100, h200)
+    count: 8             # Number of GPUs to use
+    memoryGB: 141        # GPU memory in GB
 ```

-### Step 1.9: Test the System
+### Advanced Configuration
+
+#### Using Existing DGD Configs (Recommended for Custom Setups)
+
+If you have an existing DynamoGraphDeployment config (e.g., from `components/backends/*/deploy/disagg.yaml` or custom recipes), you can reference it via ConfigMap:
+
+**Step 1: Create ConfigMap from your DGD config file:**

 ```bash
-# Port forward to frontend
-kubectl port-forward -n $NAMESPACE deployment/vllm-disagg-planner-frontend 8000:8000
-
-# Send a request
-curl -N http://localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "Qwen/Qwen3-0.6B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "Hello, how are you?"
-    }
-    ],
-    "stream":true,
-    "max_tokens": 30
-  }'
+kubectl create configmap deepseek-r1-config \
+  --from-file=disagg.yaml=/path/to/your/disagg.yaml \
+  --namespace $NAMESPACE \
+  --dry-run=client -o yaml | kubectl apply -f -
 ```

-### Step 1.10: Monitor Scaling
+**Step 2: Reference the ConfigMap in your DGDR:**

-```bash
-# Check planner logs for scaling decisions
-kubectl logs -n $NAMESPACE deployment/vllm-disagg-planner-planner --tail=10
+```yaml
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeploymentRequest
+metadata:
+  name: deepseek-r1
+spec:
+  model: deepseek-ai/DeepSeek-R1
+  backend: sglang
+
+  profilingConfig:
+    profilerImage: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.1"
+    configMapRef:
+      name: deepseek-r1-config
+      key: disagg.yaml  # Must match the key used in --from-file
+    config:
+      sla:
+        isl: 4000
+        osl: 500
+        ttft: 300
+        itl: 10
+      sweep:
+        use_ai_configurator: true
+      aic:
+        system: h200_sxm
+        model_name: DEEPSEEK_V3
+        backend_version: "0.20.0"
+
+  deploymentOverrides:
+    workersImage: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.1"
+
+  autoApply: true
 ```

-**Expected successful output** (after streaming requests):
+> **What's happening**: The profiler uses the DGD config from the ConfigMap as a **base template**, then optimizes it based on your SLA targets. The controller automatically injects `spec.model` into `deployment.model` and `spec.backend` into `engine.backend` in the final configuration.
+
+#### Inline Configuration (Simple Use Cases)
+
+For simple use cases without a custom DGD config, provide profiler configuration directly. The profiler will auto-generate a basic DGD configuration from your `model` and `backend`:
+
+```yaml
+profilingConfig:
+  config:
+    # SLA targets (required for profiling)
+    sla:
+      isl: 8000   # Input sequence length
+      osl: 200    # Output sequence length
+      ttft: 200.0 # Time To First Token (ms)
+      itl: 10.0   # Inter-Token Latency (ms)
+
+    # Hardware constraints (optional)
+    hardware:
+      min_num_gpus_per_engine: 2
+      max_num_gpus_per_engine: 8
+      gpu_type: h200_sxm
+
+    # Profiling sweep settings (optional)
+    sweep:
+      skip_existing_results: false
+      force_rerun: false
 ```
-New adjustment interval started!
-Observed num_req: X.XXX isl: X.XXX osl: X.XXX
-Observed ttft: X.XXms itl: X.XXms
-Number of prefill workers: 1, number of decode workers: 1
+
+> **Note**: `engine.config` is a **file path** to a DGD YAML file, not inline configuration. Use ConfigMapRef (recommended) or leave it unset to auto-generate.
+
+#### Planner Configuration Passthrough
+Add planner-specific settings. Planner arguments use a `planner_` prefix:
+
+```yaml
+profilingConfig:
+  config:
+    planner:
+      planner_min_endpoint: 2
 ```

-## Production Readiness
+## Understanding Profiling Results

-### Monitoring Metrics
+For details about the profiling process, performance plots, and interpolation data, see [SLA-Driven Profiling Documentation](/docs/benchmarks/sla_driven_profiling.md#profiling-process-details).

- **Basic metrics** (request count): Available with any request type
- **Latency metrics** (TTFT/ITL): Available for both streaming and non-streaming requests
- **Scaling decisions**: Require sufficient request volume
+## Advanced Topics

-### Troubleshooting
+### DGDR Immutability

-**Connection Issues:**
-```bash
-# Verify Prometheus is accessible
-kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090
-curl "http://localhost:9090/api/v1/query?query=up"
+DGDRs are **immutable** - if you need to update SLAs or configuration:
+
+1. Delete the existing DGDR: `kubectl delete dgdr sla-aic -n $NAMESPACE`
+2. Create a new DGDR with updated specifications
+
+### Manual Deployment Control
+
+Disable auto-deployment to review configurations before deploying:
+
+```yaml
+spec:
+  autoApply: false
 ```

-**Missing Metrics:**
+Then manually apply the generated DGD:
+
 ```bash
-# Check frontend metrics
-kubectl port-forward -n $NAMESPACE deployment/vllm-disagg-planner-frontend 8000:8000
-curl http://localhost:8000/metrics | grep nv_llm_http_service
+# Extract generated config
+kubectl get dgdr sla-aic -n $NAMESPACE -o jsonpath='{.status.generatedConfig}' > my-dgd.yaml
+
+# Review and modify if needed
+vi my-dgd.yaml
+
+# Deploy manually
+kubectl apply -f my-dgd.yaml -n $NAMESPACE
 ```

-**Worker Issues:**
- Large models can take 10+ minutes to initialize
- Check worker logs: `kubectl logs -n $NAMESPACE deployment/vllm-disagg-planner-backend`
- Ensure GPU resources are available for workers
+### Relationship to DynamoGraphDeployment (DGD)

-**Unknown Field subComponentType:**
+- **DGDR**: High-level "intent" - what you want deployed
+- **DGD**: Low-level "implementation" - how it's deployed
+
+The DGDR controller generates a DGD that:
+- Uses optimal TP configurations from profiling
+- Includes SLA planner for autoscaling
+- Has deployment and engine settings tuned for your SLAs
+
+The generated DGD is tracked via labels:
+```yaml
+metadata:
+  labels:
+    dgdr.nvidia.com/name: sla-aic
+    dgdr.nvidia.com/namespace: your-namespace
+```
+
+## Troubleshooting
+
+### Quick Diagnostics

-If you encounter the following error when applying the deployment:
 ```bash
-Error from server (BadRequest): error when creating "components/backends/vllm/deploy/disagg.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.DecodeWorker.subComponentType", unknown field "spec.services.PrefillWorker.subComponentType"
+# Check DGDR status and events
+kubectl describe dgdr sla-aic -n $NAMESPACE
+
+# Check operator logs
+kubectl logs -n $NAMESPACE -l app.kubernetes.io/name=dynamo-operator --tail=100
+
+# Check profiling job logs
+kubectl logs -l job-name=profile-sla-aic -n $NAMESPACE
 ```
-This is because the `subComponentType` field has only been added in newer versions of the DynamoGraphDeployment CRD (> 0.5.0). You can upgrade the CRD version by following the instructions [here](/docs/kubernetes/installation_guide.md).

-## Next Steps
+### Common Issues

- **Architecture Details**: See [SLA-based Planner Architecture](/docs/planner/sla_planner.md) for technical details
- **Performance Tuning**: See [Pre-Deployment Profiling Guide](/docs/benchmarks/pre_deployment_profiling.md) for advanced profiling options
- **Load Testing**: See [SLA Planner Load Test](/tests/planner/README.md) for comprehensive testing tools
+| Issue | Quick Fix |
+|-------|-----------|
+| **DGDR stuck in Pending** | Check GPU availability: `kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'` |
+| **Image pull errors** | Verify secret exists: `kubectl get secret nvcr-imagepullsecret -n $NAMESPACE` |
+| **Profiling fails** | Check job logs: `kubectl logs -l job-name=profile-sla-aic -n $NAMESPACE` |
+| **SLA cannot be met** | Relax TTFT/ITL targets or add more GPUs |
+| **DGD not deployed** | Verify `autoApply: true` in DGDR spec |

-## Quick Reference
+> [!NOTE]
+> For comprehensive troubleshooting including AI Configurator constraints, performance debugging, and backend-specific issues, see [SLA-Driven Profiling Troubleshooting](/docs/benchmarks/sla_driven_profiling.md#troubleshooting).

-| Phase | Duration | Purpose | Status Check |
-|-------|----------|---------|--------------|
-| Profiling | 2-4 hours | Generate performance data | `kubectl logs job/profile-sla` |
-| Deployment | 5-10 minutes | Enable autoscaling | `kubectl get pods` |
-| Testing | 5 minutes | Verify functionality | `kubectl logs deployment/planner` |
+## Configuration Reference

---
+For comprehensive documentation of all DGDR configuration options, see the [DGDR Configuration Reference](/docs/benchmarks/sla_driven_profiling.md#dgdr-configuration-reference).

-> [!TIP]
-> **Need Help?** If you encounter issues, check the [troubleshooting section](#troubleshooting) or refer to the detailed guides linked in [Next Steps](#next-steps).
+This includes detailed explanations of:
+- **SLA Configuration**: ISL, OSL, TTFT, ITL with use cases and trade-offs
+- **Hardware Configuration**: GPU constraints and search space control
+- **Sweep Configuration**: Profiling behavior and interpolation settings
+- **AI Configurator Configuration**: System types, model mappings, backend versions
+- **Planner Configuration**: Autoscaling and adjustment parameters
+- **Complete Examples**: Full DGDRs for online, offline (AIC), and MoE profiling
+
+## Related Documentation
+
+- [DGDR API Reference](/docs/kubernetes/api_reference.md)
+- [Pre-Deployment Profiling Details](/docs/benchmarks/sla_driven_profiling.md)
+- [SLA Planner Architecture](/docs/planner/sla_planner.md)
+- [Dynamo Operator Guide](/docs/kubernetes/dynamo_operator.md)
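
Review note: the "Required Fields" table in the new quickstart can be checked locally before a DGDR is applied. The following is a minimal sketch, not part of this commit. `build_dgdr`, `validate_dgdr`, and `_lookup` are hypothetical helpers, and treating all four SLA keys as mandatory is an assumption drawn from the table's description. The manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json

# Required field paths, copied from the "Required Fields" table in the guide.
REQUIRED_PATHS = [
    "spec.model",
    "spec.backend",
    "spec.profilingConfig.profilerImage",
    "spec.profilingConfig.config.sla",
]
SLA_KEYS = ("isl", "osl", "ttft", "itl")  # assumption: all four are expected
BACKENDS = ("vllm", "sglang", "trtllm")


def build_dgdr(name, namespace, model, backend, profiler_image, sla, auto_apply=False):
    """Hypothetical helper: assemble a DGDR manifest as a plain dict."""
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "model": model,
            "backend": backend,
            "profilingConfig": {
                "profilerImage": profiler_image,
                "config": {"sla": dict(sla)},
            },
            "autoApply": auto_apply,
        },
    }


def _lookup(obj, dotted):
    """Walk a dotted path through nested dicts; return None if any key is absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def validate_dgdr(manifest):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    for path in REQUIRED_PATHS:
        if _lookup(manifest, path) in (None, "", {}):
            problems.append(f"missing required field: {path}")
    backend = _lookup(manifest, "spec.backend")
    if backend and backend not in BACKENDS:
        problems.append(f"unsupported backend: {backend}")
    sla = _lookup(manifest, "spec.profilingConfig.config.sla") or {}
    for key in SLA_KEYS:
        if sla and key not in sla:
            problems.append(f"sla is missing key: {key}")
    return problems


# Values below are the ones used in the guide's own example DGDR.
example = build_dgdr(
    name="my-model-deployment",
    namespace="default",
    model="Qwen/Qwen3-0.6B",
    backend="vllm",
    profiler_image="nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1",
    sla={"isl": 3000, "osl": 150, "ttft": 200, "itl": 20},
    auto_apply=True,
)
print(validate_dgdr(example))
print(json.dumps(example, indent=2))
```

These checks only cover what the quickstart documents. The operator's CRD schema remains the authority on which manifests are accepted.
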
--- a/tests/planner/README.md
+++ b/tests/planner/README.md
@@ -23,7 +23,7 @@ Use the pre-configured test deployment with sample profiling data, we provide th

 ### Option B: Use Your Own Profiling Results

-1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/benchmarks/pre_deployment_profiling.md) for detailed instructions.
+1. Run pre-deployment profiling for your specific setup. See the [pre-deployment profiling documentation](../../docs/benchmarks/sla_driven_profiling.md) for detailed instructions.

 ## Interpolator Testing


--- a/tests/profiler/test_profile_sla_aiconfigurator.py
+++ b/tests/profiler/test_profile_sla_aiconfigurator.py
@@ -27,6 +27,8 @@ class TestProfileSlaAiconfigurator:
    def trtllm_args(self):
        class Args:
            def __init__(self):
+                self.model = ""
+                self.dgd_image = ""
                self.backend = "trtllm"
                self.config = "components/backends/trtllm/deploy/disagg.yaml"
                self.output_dir = "/tmp/test_profiling_results"

--- a/tests/profiler/test_profile_sla_dryrun.py
+++ b/tests/profiler/test_profile_sla_dryrun.py
@@ -49,6 +49,8 @@ class TestProfileSLADryRun:
                self.config = "components/backends/vllm/deploy/disagg.yaml"
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
+                self.model = ""
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 1
                self.max_num_gpus_per_engine = 8
                self.skip_existing_results = False
@@ -83,6 +85,8 @@ class TestProfileSLADryRun:
                self.config = "components/backends/sglang/deploy/disagg.yaml"
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
+                self.model = ""
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 1
                self.max_num_gpus_per_engine = 8
                self.skip_existing_results = False
@@ -131,6 +135,8 @@ class TestProfileSLADryRun:
                self.config = "components/backends/trtllm/deploy/disagg.yaml"
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
+                self.model = ""
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 1
                self.max_num_gpus_per_engine = 8
                self.skip_existing_results = False
@@ -172,6 +178,8 @@ class TestProfileSLADryRun:
                self.config = "recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml"
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
+                self.model = ""
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 8
                self.max_num_gpus_per_engine = 32
                self.skip_existing_results = False
@@ -233,6 +241,7 @@ class TestProfileSLADryRun:
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
                self.model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # Specify model for autogen
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 0  # Will be auto-generated
                self.max_num_gpus_per_engine = 0  # Will be auto-generated
                self.skip_existing_results = False
@@ -294,6 +303,7 @@ class TestProfileSLADryRun:
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
                self.model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # Specify model for autogen
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 0  # Will be auto-generated
                self.max_num_gpus_per_engine = 0  # Will be auto-generated
                self.skip_existing_results = False
@@ -355,6 +365,7 @@ class TestProfileSLADryRun:
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
                self.model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # Specify model for autogen
+                self.dgd_image = ""
                self.min_num_gpus_per_engine = 0  # Will be auto-generated
                self.max_num_gpus_per_engine = 0  # Will be auto-generated
                self.skip_existing_results = False
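
Review note: the hunks above add the same `self.model` / `self.dgd_image` defaults to each `Args` stub in turn. One way to keep such stubs in sync is a shared base with keyword overrides. This is a sketch only: `BaseProfileArgs` is a hypothetical name, the defaults are limited to attributes visible in the hunks above, and the real `Args` classes carry further attributes that the diff context does not show.

```python
class BaseProfileArgs:
    """Hypothetical shared stub; defaults mirror the values visible in the diff."""

    def __init__(self, **overrides):
        # The two attributes this commit adds to the stubs.
        self.model = ""
        self.dgd_image = ""
        # Defaults repeated across the dry-run stubs.
        self.output_dir = "/tmp/test_profiling_results"
        self.namespace = "test-namespace"
        self.min_num_gpus_per_engine = 1
        self.max_num_gpus_per_engine = 8
        self.skip_existing_results = False
        # Per-test values (backend, config, model, ...) replace or extend the defaults.
        for name, value in overrides.items():
            setattr(self, name, value)


# Example: the vLLM and auto-generation variants differ only in their overrides.
vllm_args = BaseProfileArgs(
    backend="vllm",
    config="components/backends/vllm/deploy/disagg.yaml",
)
autogen_args = BaseProfileArgs(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    min_num_gpus_per_engine=0,
    max_num_gpus_per_engine=0,
)
```

With this shape, a new profiler argument would need one default in the base class instead of an edit per fixture.
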