docs: add docs for DGDR usage -- golden path (#6946)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Commit 5c7e66ec (unverified), authored Mar 12, 2026 by hhzhang16, committed by GitHub Mar 12, 2026
8 changed files
--- a/components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml
+++ b/components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# DynamoGraphDeploymentRequest for AI Configurator-based profiling
-apiVersion: nvidia.com/v1beta1
-kind: DynamoGraphDeploymentRequest
-metadata:
-  name: sla-aic
-spec:
-  model: Qwen/Qwen3-32B
-  backend: trtllm
-  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag"
--- a/components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml
+++ b/components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# DynamoGraphDeploymentRequest for online profiling (actual deployment testing)
-apiVersion: nvidia.com/v1beta1
-kind: DynamoGraphDeploymentRequest
-metadata:
-  name: sla-online
-spec:
-  model: Qwen/Qwen3-0.6B
-  backend: vllm
-  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag" # tag must be at least 1.0.0
-  searchStrategy: thorough
--- a/components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml
+++ b/components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# DynamoGraphDeploymentRequest for MoE model profiling
-apiVersion: nvidia.com/v1beta1
-kind: DynamoGraphDeploymentRequest
-metadata:
-  name: sla-moe
-spec:
-  model: deepseek-ai/DeepSeek-R1
-  backend: sglang
-  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:my-tag"
-  searchStrategy: rapid
-
-  modelCache:
-    pvcName: "model-cache"                      # Name of PVC containing model weights
-    pvcModelPath: "deepseek-r1"                  # Subpath within PVC where model is stored
-
-  hardware:
-    # for h200, sweep over 8-16 GPUs per engine
-    numGpusPerNode: 8  # Override auto-discovered value if different
--- a/docs/components/profiler/profiler-examples.md
+++ b/docs/components/profiler/profiler-examples.md
@@ -8,60 +8,45 @@ Complete examples for profiling with DGDRs.

 ## DGDR Examples

-### Dense Model: AIPerf on Real Engines
+### Dense Model: Rapid

-Standard online profiling with real GPU measurements:
+Fast profiling (~30 seconds):

 ```yaml
 apiVersion: nvidia.com/v1beta1
 kind: DynamoGraphDeploymentRequest
 metadata:
-  name: vllm-dense-online
+  name: qwen-0-6b
 spec:
   model: "Qwen/Qwen3-0.6B"
-  backend: vllm
-  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"
-
-  workload:
-    isl: 3000
-    osl: 150
-
-  sla:
-    ttft: 200.0
-    itl: 20.0
-
-  autoApply: true
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
 ```

-### Dense Model: AI Configurator Simulation
+### Dense Model: Thorough

-Fast offline profiling (~30 seconds, TensorRT-LLM only):
+Profiling with real GPU measurements:

 ```yaml
 apiVersion: nvidia.com/v1beta1
 kind: DynamoGraphDeploymentRequest
 metadata:
-  name: trtllm-aic-offline
+  name: vllm-dense-online
 spec:
-  model: "Qwen/Qwen3-32B"
-  backend: trtllm
-  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"
-
-  workload:
-    isl: 4000
-    osl: 500
-
-  sla:
-    ttft: 300.0
-    itl: 10.0
-
-  autoApply: true
+  model: "Qwen/Qwen3-0.6B"
+  backend: vllm
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
+  searchStrategy: thorough
 ```

 ### MoE Model

 Multi-node MoE profiling with SGLang:

+> [!IMPORTANT]
+> The PVC referenced by `modelCache.pvcName` must already exist in the same namespace and contain
+> the model weights at the specified `pvcModelPath`. The DGDR controller does not create or
+> populate the PVC — it only mounts it into the profiling job and deployed workers.
+
 ```yaml
 apiVersion: nvidia.com/v1beta1
 kind: DynamoGraphDeploymentRequest
@@ -70,53 +55,138 @@ metadata:
 spec:
   model: "deepseek-ai/DeepSeek-R1"
   backend: sglang
-  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"
-
-  workload:
-    isl: 2048
-    osl: 512
-
-  sla:
-    ttft: 300.0
-    itl: 25.0
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"

   hardware:
     numGpusPerNode: 8

-  autoApply: true
+  modelCache:
+    pvcName: "model-cache"
+    pvcModelPath: "deepseek-r1"      # path within the PVC
 ```

-### Using Existing DGD Config (ConfigMap)
+### Private Model

-Reference a custom DGD configuration via ConfigMap:
+For gated or private HuggingFace models, pass your token via an environment variable injected
+into the profiling job. Create the secret first:

 ```bash
-# Create ConfigMap from your DGD config file
-kubectl create configmap deepseek-r1-config \
-  --from-file=/path/to/your/disagg.yaml \
-  --namespace $NAMESPACE \
-  --dry-run=client -o yaml | kubectl apply -f -
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN="${HF_TOKEN}" \
+  -n ${NAMESPACE}
 ```

+Then reference it in your DGDR:
+
 ```yaml
 apiVersion: nvidia.com/v1beta1
 kind: DynamoGraphDeploymentRequest
 metadata:
-  name: deepseek-r1
+  name: llama-private
 spec:
-  model: deepseek-ai/DeepSeek-R1
-  backend: sglang
-  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"
+  model: "meta-llama/Llama-3.1-8B-Instruct"
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
+
+  overrides:
+    profilingJob:
+      template:
+        spec:
+          containers: []    # required placeholder; leave empty to inherit defaults
+          initContainers:
+            - name: profiler
+              env:
+                - name: HF_TOKEN
+                  valueFrom:
+                    secretKeyRef:
+                      name: hf-token-secret
+                      key: HF_TOKEN
+```
+
+### Custom SLA Targets
+
+Control how the profiler optimizes your deployment by specifying latency targets and workload
+characteristics.
+
+**Explicit TTFT + ITL targets** (default mode):
+
+```yaml
+apiVersion: nvidia.com/v1beta1
+kind: DynamoGraphDeploymentRequest
+metadata:
+  name: low-latency-dense
+spec:
+  model: "Qwen/Qwen3-0.6B"
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
+
+  sla:
+    ttft: 500      # Time To First Token target in milliseconds
+    itl: 20        # Inter-Token Latency target in milliseconds

   workload:
-    isl: 4000
-    osl: 500
+    isl: 2000      # expected input sequence length (tokens)
+    osl: 500       # expected output sequence length (tokens)
+```

+**End-to-end latency target** (alternative to ttft+itl):
+
+```yaml
+spec:
+  ...
+  sla:
+    e2eLatency: 10000    # total request latency budget in milliseconds
+```
+
+**Optimization objective without explicit targets** (maximize throughput or minimize latency):
+
+```yaml
+spec:
+  ...
   sla:
-    ttft: 300
-    itl: 10
+    optimizationType: throughput    # or: latency
+```
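The two SLA styles above can be related with simple arithmetic. The sketch below assumes the common approximation that end-to-end latency is roughly TTFT plus one ITL for each remaining output token. That formula is an assumption for illustration, not something this documentation states about the profiler, and `estimate_e2e_ms` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough end-to-end latency estimate in milliseconds.
# Assumes e2e ~= ttft + itl * (osl - 1); the profiler may model this differently.
estimate_e2e_ms() {
  local ttft_ms=$1 itl_ms=$2 osl_tokens=$3
  echo $(( ttft_ms + itl_ms * (osl_tokens - 1) ))
}

# Values from the explicit-target example: ttft=500, itl=20, osl=500
estimate_e2e_ms 500 20 500   # prints 10480
```

Under this assumption, the explicit targets work out to about 10480 ms, slightly above the `e2eLatency: 10000` budget in the alternative example, so the two snippets should not be read as equivalent constraints.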
+
+### Overrides
+
+Use `overrides` to customize the profiling job pod spec — for example to add tolerations for
+GPU node taints or inject environment variables.
+
+**GPU node toleration** (common on GKE and shared clusters):

-  autoApply: true
+```yaml
+apiVersion: nvidia.com/v1beta1
+kind: DynamoGraphDeploymentRequest
+metadata:
+  name: dense-with-tolerations
+spec:
+  model: "Qwen/Qwen3-0.6B"
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0"
+
+  overrides:
+    profilingJob:
+      template:
+        spec:
+          containers: []    # required placeholder; leave empty to inherit defaults
+          tolerations:
+            - key: nvidia.com/gpu
+              operator: Exists
+              effect: NoSchedule
+```
+
+**Override the generated DynamoGraphDeployment** (e.g., to use a custom worker image):
+
+```yaml
+spec:
+  ...
+  overrides:
+    dgd:
+      apiVersion: nvidia.com/v1alpha1
+      kind: DynamoGraphDeployment
+      spec:
+        services:
+          VllmWorker:
+            extraEnvs:
+              - name: CUSTOM_ENV
+                value: "my-value"
 ```

 ## SGLang Runtime Profiling

--- a/docs/index.yml
+++ b/docs/index.yml
@@ -45,6 +45,8 @@ navigation:
         contents:
           - page: Detailed Installation Guide
             path: kubernetes/installation-guide.md
+          - page: Deploying Your First Model
+            path: kubernetes/dgdr.md
           - page: Dynamo Operator
             path: kubernetes/dynamo-operator.md
           - page: Service Discovery

--- a/docs/kubernetes/README.md
+++ b/docs/kubernetes/README.md
@@ -82,26 +82,12 @@ Each backend has deployment examples and configuration options:

 ## 3. Deploy Your First Model

-```bash
-export NAMESPACE=dynamo-system
-kubectl create namespace ${NAMESPACE}
-
-# to pull model from HF
-export HF_TOKEN=<Token-Here>
-kubectl create secret generic hf-token-secret \
-  --from-literal=HF_TOKEN="$HF_TOKEN" \
-  -n ${NAMESPACE};
+Follow the **[Deploying Your First Model](dgdr.md)** guide for a complete end-to-end
+walkthrough using `DynamoGraphDeploymentRequest` (DGDR) — Dynamo's recommended path that
+handles profiling and configuration automatically.

-# Deploy any example (this uses vLLM with Qwen model using aggregated serving)
-kubectl apply -f examples/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
-
-# Check status
-kubectl get dynamoGraphDeployment -n ${NAMESPACE}
-
-# Test it
-kubectl port-forward svc/vllm-agg-frontend 8000:8000 -n ${NAMESPACE}
-curl http://localhost:8000/v1/models
-```
+The tutorial deploys `Qwen/Qwen3-0.6B` with vLLM and walks you through every step: creating
+the DGDR, watching the profiling lifecycle, and sending your first inference request.

 For SLA-based autoscaling, see [SLA Planner Guide](../components/planner/planner-guide.md).


--- a/docs/kubernetes/dgdr.md
+++ b/docs/kubernetes/dgdr.md
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+title: Deploying Your First Model
+---
+
+# Deploying Your First Model
+
+End-to-end tutorial for deploying `Qwen/Qwen3-0.6B` on Kubernetes using Dynamo's recommended
+`DynamoGraphDeploymentRequest` (DGDR) workflow — from zero to your first inference response.
+
+> [!NOTE]
+> This guide assumes you have already completed the
+> [platform installation](installation-guide.md) and that the Dynamo operator and CRDs are
+> running in your cluster.
+
+## What is a DynamoGraphDeploymentRequest?
+
+A `DynamoGraphDeploymentRequest` (DGDR) is Dynamo's **deploy-by-intent** API. You describe what
+you want to run and your performance targets; Dynamo's profiler determines the optimal
+configuration automatically, then creates the live deployment for you.
+
+| | DGDR (this guide) | DGD (manual) |
+|---|---|---|
+| **You provide** | Model + optional SLA targets | Full deployment spec |
+| **Profiling** | Automated | You bring your own config |
+| **Best for** | Getting started, SLA-driven deployments | Fine-grained control |
+
+For a deeper comparison, see [Understanding Dynamo's Custom Resources](README.md#understanding-dynamos-custom-resources).
+
+## Prerequisites
+
+Before starting, confirm:
+
+- Platform installed: `kubectl get pods -n ${NAMESPACE}` shows operator pods `Running`
+- CRDs present: `kubectl get crd | grep dynamo` shows `dynamographdeploymentrequests.nvidia.com`
+- `kubectl` and `helm` available in your shell
+
+Set these variables once — they are referenced throughout the guide:
+
+```bash
+export NAMESPACE=dynamo-system      # namespace where the platform is installed
+export RELEASE_VERSION=1.x.x       # match the installed platform version (e.g. 1.0.0)
+export HF_TOKEN="<your-hf-token>"  # HuggingFace token (replace the placeholder)
+```
+
+> [!TIP]
+> `Qwen/Qwen3-0.6B` is a public model. A HuggingFace token is not strictly required to download
+> it, but is recommended to avoid rate limiting.
+
+## Step 1: Configure Namespace and Secrets
+
+```bash
+# Create the namespace (idempotent — safe to run even if it already exists)
+kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -
+
+# Create the HuggingFace token secret for model download
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN="${HF_TOKEN}" \
+  -n ${NAMESPACE}
+```
+
+Verify the secret was created:
+
+```bash
+kubectl get secret hf-token-secret -n ${NAMESPACE}
+```
+
+## Step 2: Create the DynamoGraphDeploymentRequest
+
+Save the following as `qwen3-first-model.yaml`:
+
+```yaml
+apiVersion: nvidia.com/v1beta1
+kind: DynamoGraphDeploymentRequest
+metadata:
+  name: qwen3-first-model
+spec:
+  # Model to profile and deploy
+  model: Qwen/Qwen3-0.6B
+
+  # Container image for the profiling job — must match your installed platform version.
+  # This is the same dynamo-frontend image used by the deployed inference service.
+  image: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:${RELEASE_VERSION}"
+```
+
+Apply it (uses `envsubst` to substitute the `RELEASE_VERSION` shell variable into the YAML):
+
+```bash
+envsubst < qwen3-first-model.yaml | kubectl apply -f - -n ${NAMESPACE}
+```
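If `envsubst` is not available, the same substitution can be approximated in plain bash. This is a minimal sketch, assuming the manifest contains only the literal placeholder `${RELEASE_VERSION}`; `render_manifest` is a hypothetical helper and not part of Dynamo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a manifest on stdin and replace every literal
# occurrence of ${RELEASE_VERSION} with the value of the shell variable.
render_manifest() {
  local placeholder='${RELEASE_VERSION}'   # single quotes keep this literal
  local line
  # The "|| [ -n ... ]" clause also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line//"$placeholder"/$RELEASE_VERSION}"
  done
}

# Usage (requires a cluster, so it is shown as a comment only):
#   render_manifest < qwen3-first-model.yaml | kubectl apply -f - -n "${NAMESPACE}"
```

Unlike `envsubst` without arguments, this sketch touches only the one placeholder, so any other `$`-prefixed text in the manifest is left as written.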
+
+### Field reference
+
+| Field | Required | Default | Purpose |
+|---|---|---|---|
+| `model` | Yes | — | HuggingFace model ID (e.g. `Qwen/Qwen3-0.6B`) |
+| `image` | No | — | Container image for the profiling job (`dynamo-frontend`) |
+| `backend` | No | `auto` | Inference engine (`auto`, `vllm`, `sglang`, `trtllm`) |
+| `searchStrategy` | No | `rapid` | Profiling depth — `rapid` (~30s, AIC simulation) or `thorough` (2–4h, real GPUs) |
+| `autoApply` | No | `true` | Automatically create and start the deployment after profiling |
+| `sla` | No | — | Target latency (TTFT, ITL in ms) for profiler optimization |
+| `workload` | No | — | Expected traffic shape (ISL, OSL, request rate) |
+| `hardware` | No | auto-detected | GPU SKU and count override; required when GPU discovery is disabled. When not set, the auto-discovered GPU count is capped at 32 — set `hardware.totalGpus` explicitly to use more. |
+
+For the full spec reference, see the [DGDR API Reference](api-reference.md) and
+[Profiler Guide](../components/profiler/profiler-guide.md).
+
+> [!IMPORTANT]
+> If you are using a **namespace-scoped operator** with GPU discovery disabled, you must also
+> provide explicit hardware info or the DGDR will be rejected at admission:
+>
+> ```yaml
+> spec:
+>   ...
+>   hardware:
+>     numGpusPerNode: 1
+>     gpuSku: "H100-SXM5-80GB"
+>     vramMb: 81920
+> ```
+>
+> See the [installation guide](installation-guide.md#gpu-discovery-for-dynamographdeploymentrequests-with-namespace-scoped-operators)
+> for details.
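The `vramMb` value in the snippet above matches the 80 GB SKU expressed in binary megabytes (80 × 1024). A quick way to derive the figure for another card, assuming the field expects MiB as that example suggests (`gib_to_mib` is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a GPU memory size in GiB to MiB.
gib_to_mib() {
  echo $(( $1 * 1024 ))
}

gib_to_mib 80   # prints 81920, the vramMb value used for the H100 example
```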
+
+## Step 3: Monitor Profiling Progress
+
+Profiling is the automated step where Dynamo sweeps across candidate configurations (parallelism, batching, scheduling strategies) to find the one that best meets your SLA and hardware — so you don't have to tune it manually.
+
+Watch the DGDR status in real time:
+
+```bash
+kubectl get dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE} -w
+```
+
+The `PHASE` column progresses through:
+
+| Phase | What is happening |
+|---|---|
+| `Pending` (condition: `DiscoveringHardware`) | Spec validated; operator is discovering GPU hardware and preparing the profiling job |
+| `Profiling` | Profiling job is running (AIC simulation or real-GPU sweep) |
+| `Ready` | Profiling complete; optimal config stored in `.status`. Terminal state when `autoApply: false` |
+| `Deploying` | Creating the `DynamoGraphDeployment` (only when `autoApply: true`) |
+| `Deployed` | DGD is running and healthy |
+| `Failed` | Unrecoverable error — check events for details |
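For scripting around these phases, the table can be reduced to a small classifier. This is a minimal sketch, assuming the phase string has already been read from the DGDR (for example from the `PHASE` column); `dgdr_phase_outcome` is a hypothetical helper, and its second argument mirrors the `autoApply` field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a DGDR phase as success, failure, or in-progress.
# Arguments: <phase> [autoApply, defaults to true]
dgdr_phase_outcome() {
  local phase=$1 auto_apply=${2:-true}
  case "$phase" in
    Deployed) echo "success" ;;
    Failed)   echo "failure" ;;
    Ready)
      # Ready is terminal only when no deployment will be created automatically.
      if [ "$auto_apply" = "false" ]; then
        echo "success"
      else
        echo "in-progress"
      fi
      ;;
    Pending|Profiling|Deploying) echo "in-progress" ;;
    *) echo "unknown" ;;
  esac
}
```

A polling loop could call this helper on each observed phase and stop as soon as the result is anything other than `in-progress`.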
+
+> [!TIP]
+> `Deployed` is the success terminal state when `autoApply: true` (the default).
+> If you set `autoApply: false`, the phase stops at `Ready` — profiling is complete and the
+> generated DGD spec is stored in `.status`, but no deployment is created automatically.
+> To inspect and deploy it manually:
+>
+> ```bash
+> # View the generated DGD spec
+> kubectl get dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE} \
+>   -o jsonpath='{.status.profilingResults.selectedConfig}' | python3 -m json.tool
+>
+> # Save it and apply
+> kubectl get dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE} \
+>   -o jsonpath='{.status.profilingResults.selectedConfig}' > generated-dgd.yaml
+> kubectl apply -f generated-dgd.yaml -n ${NAMESPACE}
+> ```
+
+For a full status summary and events:
+
+```bash
+kubectl describe dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE}
+```
+
+To follow the profiling job logs:
+
+```bash
+# Find the profiling pod
+kubectl get pods -n ${NAMESPACE} -l nvidia.com/dgdr-name=qwen3-first-model
+
+# Stream its logs
+kubectl logs -f <profiling-pod-name> -n ${NAMESPACE}
+```
+
+> [!TIP]
+> With `searchStrategy: rapid`, profiling typically completes in under 15 minutes on a single GPU.
+
+## Step 4: Verify the Deployment
+
+Once the DGDR reaches `Deployed`, the `DynamoGraphDeployment` has been created automatically.
+Check that everything is running:
+
+```bash
+# See the auto-created DGD
+kubectl get dynamographdeployment -n ${NAMESPACE}
+
+# Confirm all pods are Running
+kubectl get pods -n ${NAMESPACE}
+```
+
+Wait until pods are ready:
+
+```bash
+kubectl wait --for=condition=ready pod \
+  -l nvidia.com/dynamo-deployment=qwen3-first-model \
+  -n ${NAMESPACE} \
+  --timeout=600s
+```
+
+Find the frontend service name:
+
+```bash
+kubectl get svc -n ${NAMESPACE} | grep frontend
+```
+
+## Step 5: Send Your First Request
+
+Port-forward to the frontend and send an inference request:
+
+```bash
+# Start port-forward (replace <frontend-service-name> with the name from Step 4)
+kubectl port-forward svc/<frontend-service-name> 8000:8000 -n ${NAMESPACE} &
+sleep 2   # give the background port-forward a moment to start listening
+
+# Confirm the model is available
+curl http://localhost:8000/v1/models
+
+# Send a chat completion request
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-0.6B",
+    "messages": [{"role": "user", "content": "What is NVIDIA Dynamo?"}],
+    "max_tokens": 200
+  }'
+```
+
+A successful response looks like:
+
+```json
+{
+  "id": "chatcmpl-...",
+  "object": "chat.completion",
+  "model": "Qwen/Qwen3-0.6B",
+  "choices": [{
+    "message": {
+      "role": "assistant",
+      "content": "NVIDIA Dynamo is a high-performance inference framework..."
+    }
+  }]
+}
+```
+
+Your first model is now live.
+
+## Cleanup
+
+To remove the DGDR and its profiling artifacts:
+
+```bash
+kubectl delete dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE}
+```
+
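+> [!NOTE]
+> Deleting a DGDR does **not** delete the `DynamoGraphDeployment` it created. The DGD persists
+> independently so it can continue serving traffic. To remove it as well, delete it separately
+> with `kubectl delete dynamographdeployment <dgd-name> -n ${NAMESPACE}`.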
+
+## Troubleshooting
+
+**DGDR stuck in `Pending`**
+
+```bash
+kubectl describe dynamographdeploymentrequest qwen3-first-model -n ${NAMESPACE}
+# Check the Events section at the bottom
+```
+
+Common causes: no available GPU nodes, image pull failure (check image tag; NGC credentials are
+optional but may be needed if you hit rate limits pulling from public NGC), missing `hardware`
+config for a namespace-scoped operator.
+
+> [!TIP]
+> **GPU node taints** are a frequent cause of pods staying `Pending`. Many clusters (including
+> GKE by default and most shared/HPC environments) taint GPU nodes with
+> `nvidia.com/gpu:NoSchedule` so that only GPU-aware workloads land on them. If the profiling
+> job pod is stuck with a `0/N nodes are available: … node(s) had untolerated taint` event,
+> add a toleration to your DGDR via `overrides.profilingJob`. The operator and profiler
+> automatically forward it to every candidate and deployed pod:
+>
+> ```yaml
+> spec:
+>   ...
+>   overrides:
+>     profilingJob:
+>       template:
+>         spec:
+>           containers: []    # required placeholder; leave empty to inherit defaults
+>           tolerations:
+>             - key: nvidia.com/gpu
+>               operator: Exists
+>               effect: NoSchedule
+> ```
+
+**Profiling job fails**
+
+```bash
+kubectl get pods -n ${NAMESPACE} -l nvidia.com/dgdr-name=qwen3-first-model
+kubectl logs <profiling-pod-name> -n ${NAMESPACE}
+# If the container has restarted, view the logs of its previous instance:
+kubectl logs <profiling-pod-name> -n ${NAMESPACE} --previous
+```
+
+**Pods not starting after profiling**
+
+```bash
+kubectl describe pod <pod-name> -n ${NAMESPACE}
+# Look for ImagePullBackOff, OOMKilled, or Insufficient resources
+```
+
+**Model not responding after port-forward**
+
+```bash
+# Check frontend is ready
+kubectl get pods -n ${NAMESPACE} | grep frontend
+
+# Check frontend logs
+kubectl logs <frontend-pod-name> -n ${NAMESPACE}
+```
+
+## Next Steps
+
+- **Tune for production SLAs**: Add `sla` (TTFT, ITL) and `workload` (ISL, OSL) targets to
+  your DGDR so the profiler optimizes for your specific traffic. See the
+  [Profiler Guide](../components/profiler/profiler-guide.md) for the full configuration
+  reference and picking modes. For ready-to-use YAML — including SLA targets, private models,
+  MoE, and overrides — see [DGDR Examples](../components/profiler/profiler-examples.md).
+- **Scale the deployment**: [Autoscaling guide](autoscaling.md)
+- **SLA-aware autoscaling**: Enable the Planner via `features.planner` in the DGDR —
+  see the [Planner Guide](../components/planner/planner-guide.md).
+- **Inspect the generated config**: Set `autoApply: false` and extract the DGD spec with
+  `kubectl get dgdr <name> -o jsonpath='{.status.profilingResults.selectedConfig}'`
+  before deploying.
+- **Direct control**: [Creating Deployments](deployment/create-deployment.md) — write your own
+  `DynamoGraphDeployment` spec for full customization.
+- **Monitor performance**: [Observability](observability/metrics.md)
+- **Try specific backends**: [vLLM](../backends/vllm/README.md),
+  [SGLang](../backends/sglang/README.md), [TensorRT-LLM](../backends/trtllm/README.md)
--- a/fern/components/profiler/profiler_guide.md
+++ b/fern/components/profiler/profiler_guide.md
@@ -578,6 +578,6 @@ kubectl create secret docker-registry nvcr-imagepullsecret \

 ## See Also

-- [DGDR Examples](../../../components/src/dynamo/profiler/deploy/) - Complete DGDR YAML examples
+- [DGDR Examples](../../../docs/components/profiler/profiler-examples.md) - Complete DGDR YAML examples
 - [DGDR API Reference](/docs/kubernetes/api-reference.md) - DGDR specification
 - [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/profiler/utils/dgdr_v1beta1_types.py) - Full Configuration Reference