docs: restructure docs directory and move fern config to fern/ (#6700)

Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Unverified Commit ece08dc9 authored Mar 01, 2026 by Neal Vaidya Committed by GitHub Mar 01, 2026
20 changed files
--- a/docs/pages/features/speculative-decoding/speculative-decoding-vllm.md
+++ b/docs/pages/features/speculative-decoding/speculative-decoding-vllm.md
--- a/docs/pages/getting-started/examples.md
+++ b/docs/pages/getting-started/examples.md
--- a/docs/pages/getting-started/quickstart.md
+++ b/docs/pages/getting-started/quickstart.md
--- a/docs/versions/dev.yml
+++ b/docs/versions/dev.yml
@@ -21,222 +21,222 @@ navigation:
  - section: Getting Started
    contents:
      - page: Quickstart
-        path: ../pages/getting-started/quickstart.md
+        path: getting-started/quickstart.md
      - page: Support Matrix
-        path: ../pages/reference/support-matrix.md
+        path: reference/support-matrix.md
      - page: Feature Matrix
-        path: ../pages/reference/feature-matrix.md
+        path: reference/feature-matrix.md
      - page: Release Artifacts
-        path: ../pages/reference/release-artifacts.md
+        path: reference/release-artifacts.md
      - page: Examples
-        path: ../pages/getting-started/examples.md
+        path: getting-started/examples.md

  # ==================== Kubernetes Deployment ====================
  - section: Kubernetes Deployment
    contents:
      - section: Deployment Guide
-        path: ../pages/kubernetes/README.md
+        path: kubernetes/README.md
        contents:
          - page: Detailed Installation Guide
-            path: ../pages/kubernetes/installation-guide.md
+            path: kubernetes/installation-guide.md
          - page: Dynamo Operator
-            path: ../pages/kubernetes/dynamo-operator.md
+            path: kubernetes/dynamo-operator.md
          - page: Service Discovery
-            path: ../pages/kubernetes/service-discovery.md
+            path: kubernetes/service-discovery.md
          - page: Webhooks
-            path: ../pages/kubernetes/webhooks.md
+            path: kubernetes/webhooks.md
          - page: Minikube Setup
-            path: ../pages/kubernetes/deployment/minikube.md
+            path: kubernetes/deployment/minikube.md
          - page: Managing Models with DynamoModel
-            path: ../pages/kubernetes/deployment/dynamomodel-guide.md
+            path: kubernetes/deployment/dynamomodel-guide.md
          - page: Autoscaling
-            path: ../pages/kubernetes/autoscaling.md
+            path: kubernetes/autoscaling.md
          - page: Inference Gateway (GAIE)
-            path: ../pages/kubernetes/inference-gateway.md
+            path: kubernetes/inference-gateway.md
          - section: Checkpointing
-            path: ../pages/kubernetes/chrek/README.md
+            path: kubernetes/chrek/README.md
            contents:
              - page: Integration with Dynamo
-                path: ../pages/kubernetes/chrek/dynamo.md
+                path: kubernetes/chrek/dynamo.md
      - section: Observability (K8s)
        contents:
          - page: Metrics
-            path: ../pages/kubernetes/observability/metrics.md
+            path: kubernetes/observability/metrics.md
          - page: Logging
-            path: ../pages/kubernetes/observability/logging.md
+            path: kubernetes/observability/logging.md
          - page: Operator Metrics
-            path: ../pages/kubernetes/observability/operator-metrics.md
+            path: kubernetes/observability/operator-metrics.md
      - section: Multinode
        contents:
          - page: Multinode Deployments
-            path: ../pages/kubernetes/deployment/multinode-deployment.md
+            path: kubernetes/deployment/multinode-deployment.md
          - page: Grove
-            path: ../pages/kubernetes/grove.md
+            path: kubernetes/grove.md

  # ==================== User Guides ====================
  - section: User Guides
    contents:
      - page: KV Cache Aware Routing
-        path: ../pages/components/router/router-guide.md
+        path: components/router/router-guide.md
      - page: Disaggregated Serving
-        path: ../pages/features/disaggregated-serving/README.md
+        path: features/disaggregated-serving/README.md
      - page: KV Cache Offloading
-        path: ../pages/components/kvbm/kvbm-guide.md
+        path: components/kvbm/kvbm-guide.md
      - page: Dynamo Benchmarking
-        path: ../pages/benchmarks/benchmarking.md
+        path: benchmarks/benchmarking.md
      - section: Multimodality Support
-        path: ../pages/features/multimodal/README.md
+        path: features/multimodal/README.md
        contents:
          - page: vLLM Multimodal
-            path: ../pages/features/multimodal/multimodal-vllm.md
+            path: features/multimodal/multimodal-vllm.md
          - page: TensorRT-LLM Multimodal
-            path: ../pages/features/multimodal/multimodal-trtllm.md
+            path: features/multimodal/multimodal-trtllm.md
          - page: SGLang Multimodal
-            path: ../pages/features/multimodal/multimodal-sglang.md
+            path: features/multimodal/multimodal-sglang.md
      - page: Tool Calling
-        path: ../pages/agents/tool-calling.md
+        path: agents/tool-calling.md
      - page: SGLang for Agentic Workloads
-        path: ../pages/backends/sglang/agents.md
+        path: backends/sglang/agents.md
      - page: LoRA Adapters
-        path: ../pages/features/lora/README.md
+        path: features/lora/README.md
      - section: Observability (Local)
-        path: ../pages/observability/README.md
+        path: observability/README.md
        contents:
          - page: Prometheus + Grafana Setup
-            path: ../pages/observability/prometheus-grafana.md
+            path: observability/prometheus-grafana.md
          - page: Metrics
-            path: ../pages/observability/metrics.md
+            path: observability/metrics.md
          - page: Metrics Developer Guide
-            path: ../pages/observability/metrics-developer-guide.md
+            path: observability/metrics-developer-guide.md
          - page: Health Checks
-            path: ../pages/observability/health-checks.md
+            path: observability/health-checks.md
          - page: Tracing
-            path: ../pages/observability/tracing.md
+            path: observability/tracing.md
          - page: Logging
-            path: ../pages/observability/logging.md
+            path: observability/logging.md
      - section: Fault Tolerance
-        path: ../pages/fault-tolerance/README.md
+        path: fault-tolerance/README.md
        contents:
          - page: Request Migration
-            path: ../pages/fault-tolerance/request-migration.md
+            path: fault-tolerance/request-migration.md
          - page: Request Cancellation
-            path: ../pages/fault-tolerance/request-cancellation.md
+            path: fault-tolerance/request-cancellation.md
          - page: Graceful Shutdown
-            path: ../pages/fault-tolerance/graceful-shutdown.md
+            path: fault-tolerance/graceful-shutdown.md
          - page: Request Rejection
-            path: ../pages/fault-tolerance/request-rejection.md
+            path: fault-tolerance/request-rejection.md
          - page: Testing
-            path: ../pages/fault-tolerance/testing.md
+            path: fault-tolerance/testing.md
      - page: Writing Python Workers in Dynamo
-        path: ../pages/development/backend-guide.md
+        path: development/backend-guide.md

  # ==================== Backends ====================
  - section: Backends
    contents:
      - section: SGLang
-        path: ../pages/backends/sglang/README.md
+        path: backends/sglang/README.md
        contents:
          - page: Reference Guide
-            path: ../pages/backends/sglang/sglang-reference-guide.md
+            path: backends/sglang/sglang-reference-guide.md
          - page: Examples
-            path: ../pages/backends/sglang/sglang-examples.md
+            path: backends/sglang/sglang-examples.md
          - page: Disaggregation
-            path: ../pages/backends/sglang/sglang-disaggregation.md
+            path: backends/sglang/sglang-disaggregation.md
          - page: Diffusion
-            path: ../pages/backends/sglang/sglang-diffusion.md
+            path: backends/sglang/sglang-diffusion.md
          - page: Observability
-            path: ../pages/backends/sglang/sglang-observability.md
+            path: backends/sglang/sglang-observability.md
      - page: TensorRT-LLM
-        path: ../pages/backends/trtllm/README.md
+        path: backends/trtllm/README.md
      - page: vLLM
-        path: ../pages/backends/vllm/README.md
+        path: backends/vllm/README.md

  # ==================== Components ====================
  - section: Components
    contents:
      - section: Frontend
-        path: ../pages/components/frontend/README.md
+        path: components/frontend/README.md
        contents:
          - page: Frontend Guide
-            path: ../pages/components/frontend/frontend-guide.md
+            path: components/frontend/frontend-guide.md
      - section: Router
-        path: ../pages/components/router/README.md
+        path: components/router/README.md
        contents:
          - page: Router Guide
-            path: ../pages/components/router/router-guide.md
+            path: components/router/router-guide.md
          - page: Router Examples
-            path: ../pages/components/router/router-examples.md
+            path: components/router/router-examples.md
      - section: Planner
-        path: ../pages/components/planner/README.md
+        path: components/planner/README.md
        contents:
          - page: Planner Guide
-            path: ../pages/components/planner/planner-guide.md
+            path: components/planner/planner-guide.md
          - page: Planner Examples
-            path: ../pages/components/planner/planner-examples.md
+            path: components/planner/planner-examples.md
      - section: Profiler
-        path: ../pages/components/profiler/README.md
+        path: components/profiler/README.md
        contents:
          - page: Profiler Guide
-            path: ../pages/components/profiler/profiler-guide.md
+            path: components/profiler/profiler-guide.md
          - page: Profiler Examples
-            path: ../pages/components/profiler/profiler-examples.md
+            path: components/profiler/profiler-examples.md
      - section: KVBM
-        path: ../pages/components/kvbm/README.md
+        path: components/kvbm/README.md
        contents:
          - page: KVBM Guide
-            path: ../pages/components/kvbm/kvbm-guide.md
+            path: components/kvbm/kvbm-guide.md

  # ==================== Integrations ====================
  - section: Integrations
    contents:
      - page: LMCache
-        path: ../pages/integrations/lmcache-integration.md
+        path: integrations/lmcache-integration.md
      - page: SGLang HiCache
-        path: ../pages/integrations/sglang-hicache.md
+        path: integrations/sglang-hicache.md
      - page: FlexKV
-        path: ../pages/integrations/flexkv-integration.md
+        path: integrations/flexkv-integration.md
      - page: KV Events for Custom Engines
-        path: ../pages/integrations/kv-events-custom-engines.md
+        path: integrations/kv-events-custom-engines.md

  # ==================== Design Docs ====================
  - section: Design Docs
    contents:
      - page: Overall Architecture
-        path: ../pages/design-docs/architecture.md
+        path: design-docs/architecture.md
      - page: Architecture Flow
-        path: ../pages/design-docs/dynamo-flow.md
+        path: design-docs/dynamo-flow.md
      - page: Disaggregated Serving
-        path: ../pages/design-docs/disagg-serving.md
+        path: design-docs/disagg-serving.md
      - page: Distributed Runtime
-        path: ../pages/design-docs/distributed-runtime.md
+        path: design-docs/distributed-runtime.md
      - page: Discovery Plane
-        path: ../pages/design-docs/discovery-plane.md
+        path: design-docs/discovery-plane.md
      - page: Request Plane
-        path: ../pages/design-docs/request-plane.md
+        path: design-docs/request-plane.md
      - page: Event Plane
-        path: ../pages/design-docs/event-plane.md
+        path: design-docs/event-plane.md
      - page: Router Design
-        path: ../pages/design-docs/router-design.md
+        path: design-docs/router-design.md
      - page: KVBM Design
-        path: ../pages/design-docs/kvbm-design.md
+        path: design-docs/kvbm-design.md
      - page: Planner Design
-        path: ../pages/design-docs/planner-design.md
+        path: design-docs/planner-design.md

  # ==================== Blog ====================
  - section: Blog
    hidden: true
-    path: ../blogs/index.mdx
+    path: blogs/index.mdx
    slug: blog
    contents:
      - page: "Flash Indexer: Inter-Galactic KV Routing"
-        path: ../blogs/flash-indexer/flash-indexer.md
+        path: blogs/flash-indexer/flash-indexer.md
        slug: flash-indexer

  # ==================== Documentation ====================
  - section: Documentation
    contents:
      - page: Dynamo Docs Guide
-        path: ../README.md
+        path: README.md

  # ==================== Hidden Pages ====================
  # Pages accessible via direct URL but not shown in main navigation.
@@ -247,111 +247,111 @@ navigation:
    contents:
      # -- Development --
      - page: Runtime Guide
-        path: ../pages/development/runtime-guide.md
+        path: development/runtime-guide.md
      - page: Jail Stream
-        path: ../pages/development/jail-stream.md
+        path: development/jail-stream.md
      # -- API Reference --
      - section: NIXL Connect API
-        path: ../pages/api/nixl-connect/README.md
+        path: api/nixl-connect/README.md
        contents:
          - page: Connector
-            path: ../pages/api/nixl-connect/connector.md
+            path: api/nixl-connect/connector.md
          - page: Device
-            path: ../pages/api/nixl-connect/device.md
+            path: api/nixl-connect/device.md
          - page: Device Kind
-            path: ../pages/api/nixl-connect/device-kind.md
+            path: api/nixl-connect/device-kind.md
          - page: Descriptor
-            path: ../pages/api/nixl-connect/descriptor.md
+            path: api/nixl-connect/descriptor.md
          - page: Read Operation
-            path: ../pages/api/nixl-connect/read-operation.md
+            path: api/nixl-connect/read-operation.md
          - page: Write Operation
-            path: ../pages/api/nixl-connect/write-operation.md
+            path: api/nixl-connect/write-operation.md
          - page: Readable Operation
-            path: ../pages/api/nixl-connect/readable-operation.md
+            path: api/nixl-connect/readable-operation.md
          - page: Writable Operation
-            path: ../pages/api/nixl-connect/writable-operation.md
+            path: api/nixl-connect/writable-operation.md
          - page: Operation Status
-            path: ../pages/api/nixl-connect/operation-status.md
+            path: api/nixl-connect/operation-status.md
          - page: RDMA Metadata
-            path: ../pages/api/nixl-connect/rdma-metadata.md
+            path: api/nixl-connect/rdma-metadata.md
      # -- Kubernetes (hidden sub-pages) --
      - page: API Reference (K8s)
-        path: ../pages/kubernetes/api-reference.md
+        path: kubernetes/api-reference.md
      - page: Creating Deployments
-        path: ../pages/kubernetes/deployment/create-deployment.md
+        path: kubernetes/deployment/create-deployment.md
      - page: FluxCD
-        path: ../pages/kubernetes/fluxcd.md
+        path: kubernetes/fluxcd.md
      - page: Model Caching with Fluid
-        path: ../pages/kubernetes/model-caching-with-fluid.md
+        path: kubernetes/model-caching-with-fluid.md
      # -- Reference --
      - page: Glossary
-        path: ../pages/reference/glossary.md
+        path: reference/glossary.md
      - page: Tuning Disaggregated Performance
-        path: ../pages/performance/tuning.md
+        path: performance/tuning.md
      # -- Backend detail pages --
      - section: vLLM Details
        contents:
          - page: DeepSeek-R1
-            path: ../pages/backends/vllm/deepseek-r1.md
+            path: backends/vllm/deepseek-r1.md
          - page: GPT-OSS
-            path: ../pages/backends/vllm/gpt-oss.md
+            path: backends/vllm/gpt-oss.md
          - page: Multi-Node
-            path: ../pages/backends/vllm/multi-node.md
+            path: backends/vllm/multi-node.md
          - page: Prometheus
-            path: ../pages/backends/vllm/prometheus.md
+            path: backends/vllm/prometheus.md
          - page: Prompt Embeddings
-            path: ../pages/backends/vllm/prompt-embeddings.md
+            path: backends/vllm/prompt-embeddings.md
          - page: vLLM-Omni
-            path: ../pages/backends/vllm/vllm-omni.md
+            path: backends/vllm/vllm-omni.md
      - section: TensorRT-LLM Details
        contents:
          - page: Multinode Examples
-            path: ../pages/backends/trtllm/multinode/multinode-examples.md
+            path: backends/trtllm/multinode/multinode-examples.md
          - page: Llama4 + Eagle
-            path: ../pages/backends/trtllm/llama4-plus-eagle.md
+            path: backends/trtllm/llama4-plus-eagle.md
          - page: KV Cache Transfer
-            path: ../pages/backends/trtllm/kv-cache-transfer.md
+            path: backends/trtllm/kv-cache-transfer.md
          - page: Gemma3 Sliding Window
-            path: ../pages/backends/trtllm/gemma3-sliding-window-attention.md
+            path: backends/trtllm/gemma3-sliding-window-attention.md
          - page: GPT-OSS
-            path: ../pages/backends/trtllm/gpt-oss.md
+            path: backends/trtllm/gpt-oss.md
          - page: Prometheus
-            path: ../pages/backends/trtllm/prometheus.md
+            path: backends/trtllm/prometheus.md
      # -- Features (hidden sub-pages) --
      - section: Speculative Decoding
-        path: ../pages/features/speculative-decoding/README.md
+        path: features/speculative-decoding/README.md
        contents:
          - page: Speculative Decoding with vLLM
-            path: ../pages/features/speculative-decoding/speculative-decoding-vllm.md
+            path: features/speculative-decoding/speculative-decoding-vllm.md
      # -- Benchmarks --
      - page: KV Router A/B Testing
-        path: ../pages/benchmarks/kv-router-ab-testing.md
+        path: benchmarks/kv-router-ab-testing.md
      # -- Mocker --
      - page: Mocker
-        path: ../pages/mocker/mocker.md
+        path: mocker/mocker.md
      # -- Templates --
      - section: Templates
-        path: ../pages/templates/README.md
+        path: templates/README.md
        contents:
          - page: Backend Guide
-            path: ../pages/templates/backend-guide.md
+            path: templates/backend-guide.md
          - page: Backend README
-            path: ../pages/templates/backend-readme.md
+            path: templates/backend-readme.md
          - page: Component Design
-            path: ../pages/templates/component-design.md
+            path: templates/component-design.md
          - page: Component Examples
-            path: ../pages/templates/component-examples.md
+            path: templates/component-examples.md
          - page: Component Guide
-            path: ../pages/templates/component-guide.md
+            path: templates/component-guide.md
          - page: Component README
-            path: ../pages/templates/component-readme.md
+            path: templates/component-readme.md
          - page: Feature Backend
-            path: ../pages/templates/feature-backend.md
+            path: templates/feature-backend.md
          - page: Feature README
-            path: ../pages/templates/feature-readme.md
+            path: templates/feature-readme.md
          - page: In-Code README
-            path: ../pages/templates/incode-readme.md
+            path: templates/incode-readme.md
          - page: Infrastructure README
-            path: ../pages/templates/infrastructure-readme.md
+            path: templates/infrastructure-readme.md
          - page: Integration README
-            path: ../pages/templates/integration-readme.md
+            path: templates/integration-readme.md
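
Every `path:` change in the hunks above follows one mechanical pattern: the leading `../` is dropped, along with the `pages/` segment where it is present. The following is a minimal sketch of that rewrite, not the tooling actually used for this commit. The function name `rewrite_nav_paths` and the regular expression are hypothetical, and the sketch assumes unquoted, single-line `path:` values like those shown in the diff.

```python
import re

# Hypothetical pattern: a `path:` entry that starts with "../" and may then
# descend into "pages/", as in the removed lines of the hunks above.
_OLD_PATH = re.compile(r"^(\s*path:\s*)\.\./(?:pages/)?(\S.*)$")


def rewrite_nav_paths(yaml_text: str) -> str:
    """Rewrite `path: ../pages/x.md` or `path: ../x.md` to `path: x.md`.

    Lines that do not match the old pattern are returned unchanged.
    """
    rewritten = [_OLD_PATH.sub(r"\1\2", line) for line in yaml_text.splitlines()]
    return "\n".join(rewritten)


if __name__ == "__main__":
    sample = "\n".join(
        [
            "  - section: Getting Started",
            "    contents:",
            "      - page: Quickstart",
            "        path: ../pages/getting-started/quickstart.md",
            "      - page: Dynamo Docs Guide",
            "        path: ../README.md",
        ]
    )
    print(rewrite_nav_paths(sample))
```

A line-based regular expression is used here instead of a YAML parser so that comments and key order in the navigation file are left untouched. That choice is an assumption of this sketch.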
--- a/docs/pages/integrations/flexkv-integration.md
+++ b/docs/pages/integrations/flexkv-integration.md
--- a/docs/pages/integrations/kv-events-custom-engines.md
+++ b/docs/pages/integrations/kv-events-custom-engines.md
--- a/docs/pages/integrations/lmcache-integration.md
+++ b/docs/pages/integrations/lmcache-integration.md
--- a/docs/pages/integrations/sglang-hicache.md
+++ b/docs/pages/integrations/sglang-hicache.md
--- a/docs/pages/kubernetes/README.md
+++ b/docs/pages/kubernetes/README.md
--- a/docs/pages/kubernetes/api-reference.md
+++ b/docs/pages/kubernetes/api-reference.md
@@ -38,7 +38,7 @@ Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API


 Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter
-with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md
+with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md
 for migration guidance. This field will be removed in a future API version.


@@ -381,7 +381,7 @@ _Appears in:_
 | `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
 | `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
 | `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
-| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
+| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
 | `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
 | `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
 | `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
@@ -421,7 +421,7 @@ _Appears in:_
 | `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
 | `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
 | `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
-| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/pages/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
+| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
 | `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
 | `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
 | `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
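
The hunks above replace `docs/pages/kubernetes/autoscaling.md` with `docs/kubernetes/autoscaling.md` inside generated description text. The sketch below shows one way to check for similar leftover references. The helper names `find_stale_refs` and `fix_stale_refs` are hypothetical and not part of the repository, and the assumption that every `docs/pages/` mention should become `docs/` is taken only from the pattern visible in this diff.

```python
from typing import List, Tuple

OLD_PREFIX = "docs/pages/"  # prefix removed in the hunks above
NEW_PREFIX = "docs/"        # prefix that replaces it in the hunks above


def find_stale_refs(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that still mention the old prefix."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_PREFIX in line
    ]


def fix_stale_refs(text: str) -> str:
    """Replace every occurrence of the old prefix with the new one."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


if __name__ == "__main__":
    doc = (
        "with HPA, KEDA, or Planner for autoscaling instead. "
        "See docs/pages/kubernetes/autoscaling.md\n"
        "for migration guidance."
    )
    for number, line in find_stale_refs(doc):
        print(f"line {number}: {line}")
    print(fix_stale_refs(doc))
```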

--- a/docs/kubernetes/api_reference.md
+++ b/docs/kubernetes/api_reference.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-->
-
-> **⚠️ Important**: This documentation is automatically generated from source code.
-> Do not edit this file directly.
-
-# API Reference
-
-## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
-
-
-## nvidia.com/v1alpha1
-
-Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
-
-This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
-a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
-
-Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
-
-### Resource Types
- [DynamoCheckpoint](#dynamocheckpoint)
- [DynamoComponentDeployment](#dynamocomponentdeployment)
- [DynamoGraphDeployment](#dynamographdeployment)
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)
- [DynamoModel](#dynamomodel)
-
-
-
-#### Autoscaling
-
-
-
-Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter
-with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md
-for migration guidance. This field will be removed in a future API version.
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Deprecated: This field is ignored. |  |  |
-| `minReplicas` _integer_ | Deprecated: This field is ignored. |  |  |
-| `maxReplicas` _integer_ | Deprecated: This field is ignored. |  |  |
-| `behavior` _[HorizontalPodAutoscalerBehavior](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#horizontalpodautoscalerbehavior-v2-autoscaling)_ | Deprecated: This field is ignored. |  |  |
-| `metrics` _[MetricSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#metricspec-v2-autoscaling) array_ | Deprecated: This field is ignored. |  |  |
-
-
-
-
-#### CheckpointMode
-
-_Underlying type:_ _string_
-
-CheckpointMode defines how checkpoint creation is handled
-
-_Validation:_
- Enum: [Auto Manual]
-
-_Appears in:_
- [ServiceCheckpointConfig](#servicecheckpointconfig)
-
-| Field | Description |
-| --- | --- |
-| `Auto` | CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR<br /> |
-| `Manual` | CheckpointModeManual means the user must create the Checkpoint CR themselves<br /> |
-
-
-#### ComponentKind
-
-_Underlying type:_ _string_
-
-ComponentKind represents the type of underlying Kubernetes resource.
-
-_Validation:_
- Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
-
-_Appears in:_
- [ServiceReplicaStatus](#servicereplicastatus)
-
-| Field | Description |
-| --- | --- |
-| `PodClique` | ComponentKindPodClique represents a PodClique resource.<br /> |
-| `PodCliqueScalingGroup` | ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource.<br /> |
-| `Deployment` | ComponentKindDeployment represents a Deployment resource.<br /> |
-| `LeaderWorkerSet` | ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource.<br /> |
-
-
-#### ConfigMapKeySelector
-
-
-
-ConfigMapKeySelector selects a specific key from a ConfigMap.
-Used to reference external configuration data stored in ConfigMaps.
-
-
-
-_Appears in:_
- [ProfilingConfigSpec](#profilingconfigspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name of the ConfigMap containing the desired data. |  | Required: \{\} <br /> |
-| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml |  |
-
-
-#### DeploymentOverridesSpec
-
-
-
-DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments.
-When autoApply is enabled, these overrides are applied to the generated DGD resource.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. |  | Optional: \{\} <br /> |
-| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. |  | Optional: \{\} <br /> |
-| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. |  | Optional: \{\} <br /> |
-| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. |  | Optional: \{\} <br /> |
-| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components.<br />This image is used for both temporary DGDs created during online profiling and the final DGD.<br />If omitted, the image from the base config file (e.g., disagg.yaml) is used.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1" |  | Optional: \{\} <br /> |
-
-
-#### DeploymentStatus
-
-
-
-DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
-This status is populated when autoApply is enabled and a DGD is created.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name is the name of the created DynamoGraphDeployment. |  |  |
-| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. |  |  |
-| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This value is mirrored from the DGD's status.state field. |  |  |
-| `created` _boolean_ | Created indicates whether the DGD has been successfully created.<br />Used to prevent recreation if the DGD is manually deleted by users. |  |  |
-
-
-
-
-#### DynamoCheckpoint
-
-
-
-DynamoCheckpoint is the Schema for the dynamocheckpoints API.
-It represents a container checkpoint that can be used to restore pods to a warm state.
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
-| `kind` _string_ | `DynamoCheckpoint` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoCheckpointSpec](#dynamocheckpointspec)_ |  |  |  |
-| `status` _[DynamoCheckpointStatus](#dynamocheckpointstatus)_ |  |  |  |
-
-
-
-
-#### DynamoCheckpointIdentity
-
-
-
-DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence.
-Two checkpoints with the same identity hash are considered equivalent.
-
-
-
-_Appears in:_
- [DynamoCheckpointSpec](#dynamocheckpointspec)
- [ServiceCheckpointConfig](#servicecheckpointconfig)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `model` _string_ | Model is the model identifier (e.g., "meta-llama/Llama-3-70B") |  | Required: \{\} <br /> |
-| `backendFramework` _string_ | BackendFramework is the runtime framework (vllm, sglang, trtllm) |  | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
-| `dynamoVersion` _string_ | DynamoVersion is the Dynamo platform version (optional)<br />If not specified, version is not included in identity hash<br />This ensures checkpoint compatibility across Dynamo releases |  | Optional: \{\} <br /> |
-| `tensorParallelSize` _integer_ | TensorParallelSize is the tensor parallel configuration | 1 | Minimum: 1 <br />Optional: \{\} <br /> |
-| `pipelineParallelSize` _integer_ | PipelineParallelSize is the pipeline parallel configuration | 1 | Minimum: 1 <br />Optional: \{\} <br /> |
-| `dtype` _string_ | Dtype is the data type (fp16, bf16, fp8, etc.) |  | Optional: \{\} <br /> |
-| `maxModelLen` _integer_ | MaxModelLen is the maximum sequence length |  | Minimum: 1 <br />Optional: \{\} <br /> |
-| `extraParameters` _object (keys:string, values:string)_ | ExtraParameters are additional parameters that affect the checkpoint hash<br />Use for any framework-specific or custom parameters not covered above |  | Optional: \{\} <br /> |
-
-
-#### DynamoCheckpointJobConfig
-
-
-
-DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job
-
-
-
-_Appears in:_
- [DynamoCheckpointSpec](#dynamocheckpointspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `podTemplateSpec` _[PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#podtemplatespec-v1-core)_ | PodTemplateSpec allows customizing the checkpoint Job pod<br />This should include the container that runs the workload to be checkpointed |  | Required: \{\} <br /> |
-| `activeDeadlineSeconds` _integer_ | ActiveDeadlineSeconds specifies the maximum time the Job can run | 3600 | Optional: \{\} <br /> |
-| `backoffLimit` _integer_ | BackoffLimit specifies the number of retries before marking the Job failed | 3 | Optional: \{\} <br /> |
-| `ttlSecondsAfterFinished` _integer_ | TTLSecondsAfterFinished specifies how long to keep the Job after completion | 300 | Optional: \{\} <br /> |
-
-
-#### DynamoCheckpointPhase
-
-_Underlying type:_ _string_
-
-DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle
-
-_Validation:_
- Enum: [Pending Creating Ready Failed]
-
-_Appears in:_
- [DynamoCheckpointStatus](#dynamocheckpointstatus)
-
-| Field | Description |
-| --- | --- |
-| `Pending` | DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started<br /> |
-| `Creating` | DynamoCheckpointPhaseCreating indicates the checkpoint Job is running<br /> |
-| `Ready` | DynamoCheckpointPhaseReady indicates the checkpoint tar file is available on the PVC<br /> |
-| `Failed` | DynamoCheckpointPhaseFailed indicates the checkpoint creation failed<br /> |
-
-
-#### DynamoCheckpointSpec
-
-
-
-DynamoCheckpointSpec defines the desired state of DynamoCheckpoint
-
-
-
-_Appears in:_
- [DynamoCheckpoint](#dynamocheckpoint)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `identity` _[DynamoCheckpointIdentity](#dynamocheckpointidentity)_ | Identity defines the inputs that determine checkpoint equivalence |  | Required: \{\} <br /> |
-| `job` _[DynamoCheckpointJobConfig](#dynamocheckpointjobconfig)_ | Job defines the configuration for the checkpoint creation Job |  | Required: \{\} <br /> |
-
-
-#### DynamoCheckpointStatus
-
-
-
-DynamoCheckpointStatus defines the observed state of DynamoCheckpoint
-
-
-
-_Appears in:_
- [DynamoCheckpoint](#dynamocheckpoint)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `phase` _[DynamoCheckpointPhase](#dynamocheckpointphase)_ | Phase represents the current phase of the checkpoint lifecycle |  | Enum: [Pending Creating Ready Failed] <br />Optional: \{\} <br /> |
-| `identityHash` _string_ | IdentityHash is the computed hash of the checkpoint identity<br />This hash is used to identify equivalent checkpoints |  | Optional: \{\} <br /> |
-| `location` _string_ | Location is the full URI/path to the checkpoint in the storage backend<br />For PVC: same as TarPath (e.g., /checkpoints/\{hash\}.tar)<br />For S3: s3://bucket/prefix/\{hash\}.tar<br />For OCI: oci://registry/repo:\{hash\} |  | Optional: \{\} <br /> |
-| `storageType` _[DynamoCheckpointStorageType](#dynamocheckpointstoragetype)_ | StorageType indicates the storage backend type used for this checkpoint |  | Enum: [pvc s3 oci] <br />Optional: \{\} <br /> |
-| `jobName` _string_ | JobName is the name of the checkpoint creation Job |  | Optional: \{\} <br /> |
-| `createdAt` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | CreatedAt is the timestamp when the checkpoint tar was created |  | Optional: \{\} <br /> |
-| `message` _string_ | Message provides additional information about the current state |  | Optional: \{\} <br /> |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions represent the latest available observations of the checkpoint's state |  | Optional: \{\} <br /> |
-
-
-#### DynamoCheckpointStorageType
-
-_Underlying type:_ _string_
-
-DynamoCheckpointStorageType defines the supported storage backends for checkpoints
-
-_Validation:_
- Enum: [pvc s3 oci]
-
-_Appears in:_
- [DynamoCheckpointStatus](#dynamocheckpointstatus)
-
-
-
-#### DynamoComponentDeployment
-
-
-
-DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
-| `kind` _string_ | `DynamoComponentDeployment` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)_ | Spec defines the desired state for this Dynamo component deployment. |  |  |
-
-
-#### DynamoComponentDeploymentSharedSpec
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component<br />(such as Pod, Service, and Ingress when applicable). |  |  |
-| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
-| `serviceName` _string_ | The name of the component |  |  |
-| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
-| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
-| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
-| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
-| `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
-| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
-| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
-| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
-| `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
-| `ingress` _[IngressSpec](#ingressspec)_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
-| `modelRef` _[ModelReference](#modelreference)_ | ModelRef references a model that this component serves<br />When specified, a headless service will be created for endpoint discovery |  | Optional: \{\} <br /> |
-| `sharedMemory` _[SharedMemorySpec](#sharedmemoryspec)_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
-| `extraPodMetadata` _[ExtraPodMetadata](#extrapodmetadata)_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  | Optional: \{\} <br /> |
-| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br />It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br />that allows overriding the main container configuration. |  | Optional: \{\} <br /> |
-| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
-| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
-| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. |  | Minimum: 0 <br /> |
-| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. |  |  |
-| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. |  | Optional: \{\} <br /> |
-| `eppConfig` _[EPPConfig](#eppconfig)_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.<br />Only applicable when ComponentType is "epp". |  | Optional: \{\} <br /> |
-| `checkpoint` _[ServiceCheckpointConfig](#servicecheckpointconfig)_ | Checkpoint configures container checkpointing for this service.<br />When enabled, pods can be restored from checkpoint files for a faster cold start. |  | Optional: \{\} <br /> |
-
-
-#### DynamoComponentDeploymentSpec
-
-
-
-DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
-
-
-
-_Appears in:_
- [DynamoComponentDeployment](#dynamocomponentdeployment)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm") |  | Enum: [sglang vllm trtllm] <br /> |
-| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component<br />(such as Pod, Service, and Ingress when applicable). |  |  |
-| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
-| `serviceName` _string_ | The name of the component |  |  |
-| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
-| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
-| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
-| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
-| `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
-| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
-| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
-| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
-| `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
-| `ingress` _[IngressSpec](#ingressspec)_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
-| `modelRef` _[ModelReference](#modelreference)_ | ModelRef references a model that this component serves<br />When specified, a headless service will be created for endpoint discovery |  | Optional: \{\} <br /> |
-| `sharedMemory` _[SharedMemorySpec](#sharedmemoryspec)_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
-| `extraPodMetadata` _[ExtraPodMetadata](#extrapodmetadata)_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  | Optional: \{\} <br /> |
-| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br />It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br />that allows overriding the main container configuration. |  | Optional: \{\} <br /> |
-| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
-| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
-| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. |  | Minimum: 0 <br /> |
-| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. |  |  |
-| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. |  | Optional: \{\} <br /> |
-| `eppConfig` _[EPPConfig](#eppconfig)_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.<br />Only applicable when ComponentType is "epp". |  | Optional: \{\} <br /> |
-| `checkpoint` _[ServiceCheckpointConfig](#servicecheckpointconfig)_ | Checkpoint configures container checkpointing for this service.<br />When enabled, pods can be restored from checkpoint files for a faster cold start. |  | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeployment
-
-
-
-DynamoGraphDeployment is the Schema for the dynamographdeployments API.
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
-| `kind` _string_ | `DynamoGraphDeployment` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoGraphDeploymentSpec](#dynamographdeploymentspec)_ | Spec defines the desired state for this graph deployment. |  |  |
-| `status` _[DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)_ | Status reflects the current observed state of this graph deployment. |  |  |
-
-
-#### DynamoGraphDeploymentRequest
-
-
-
-DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
-It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
-Users specify a model and optional performance targets; the controller handles profiling,
-configuration selection, and deployment.
-
-Lifecycle:
- 1. Pending: Spec validated, preparing for profiling
- 2. Profiling: Profiling job is running to discover optimal configurations
- 3. Ready: Profiling complete, generated DGD spec available in status
- 4. Deploying: DGD is being created and rolled out (when autoApply=true)
- 5. Deployed: DGD is running and healthy
- 6. Failed: An unrecoverable error occurred
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
-| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
-| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |
-
-
-#### DynamoGraphDeploymentRequestSpec
-
-
-
-DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
-Only the Model field is required; all other fields are optional and have sensible defaults.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />Can be a HuggingFace ID or a private model name. |  | Required: \{\} <br />MinLength: 1 <br /> |
-| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br /> |
-| `image` _string_ | Image is the container image reference for the profiling job. |  | Optional: \{\} <br /> |
-| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache provides optional PVC configuration for pre-downloaded model weights. |  | Optional: \{\} <br /> |
-| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment. |  | Optional: \{\} <br /> |
-| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. |  | Optional: \{\} <br /> |
-| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. |  | Optional: \{\} <br /> |
-| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. |  | Optional: \{\} <br /> |
-| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. |  | Optional: \{\} <br /> |
-| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth. | rapid | Enum: [rapid thorough] <br /> |
-| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true |  |
-
-
-#### DynamoGraphDeploymentRequestStatus
-
-
-
-DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. |  | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br /> |
-| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". |  | Optional: \{\} <br /> |
-| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. |  | Optional: \{\} <br /> |
-| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. |  | Optional: \{\} <br /> |
-| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  |  |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Succeeded, Validation, Profiling, SpecGenerated, DeploymentReady. |  |  |
-| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. |  | Optional: \{\} <br /> |
-| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment. |  | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentScalingAdapter
-
-
-
-DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services
-within a DynamoGraphDeployment. It implements the Kubernetes scale
-subresource, enabling integration with HPA, KEDA, and custom autoscalers.
-
-The adapter acts as an intermediary between autoscalers and the DGD,
-ensuring that only the adapter controller modifies the DGD's service replicas.
-This prevents conflicts when multiple autoscaling mechanisms are in play.
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
-| `kind` _string_ | `DynamoGraphDeploymentScalingAdapter` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoGraphDeploymentScalingAdapterSpec](#dynamographdeploymentscalingadapterspec)_ |  |  |  |
-| `status` _[DynamoGraphDeploymentScalingAdapterStatus](#dynamographdeploymentscalingadapterstatus)_ |  |  |  |
-
-
-#### DynamoGraphDeploymentScalingAdapterSpec
-
-
-
-DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `replicas` _integer_ | Replicas is the desired number of replicas for the target service.<br />This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users. |  | Minimum: 0 <br />Required: \{\} <br /> |
-| `dgdRef` _[DynamoGraphDeploymentServiceRef](#dynamographdeploymentserviceref)_ | DGDRef references the DynamoGraphDeployment and the specific service to scale. |  | Required: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentScalingAdapterStatus
-
-
-
-DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `replicas` _integer_ | Replicas is the current number of replicas for the target service.<br />This is synced from the DGD's service replicas and is required for the scale subresource. |  | Optional: \{\} <br /> |
-| `selector` _string_ | Selector is a label selector string for the pods managed by this adapter.<br />Required for HPA compatibility via the scale subresource. |  | Optional: \{\} <br /> |
-| `lastScaleTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | LastScaleTime is the last time the adapter scaled the target service. |  | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentServiceRef
-
-
-
-DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentScalingAdapterSpec](#dynamographdeploymentscalingadapterspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name of the DynamoGraphDeployment |  | MinLength: 1 <br />Required: \{\} <br /> |
-| `serviceName` _string_ | ServiceName is the key name of the service within the DGD's spec.services map to scale |  | MinLength: 1 <br />Required: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentSpec
-
-
-
-DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeployment](#dynamographdeployment)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `pvcs` _[PVC](#pvc) array_ | PVCs defines a list of persistent volume claims that can be referenced by components.<br />Each PVC must have a unique name that can be referenced in component specifications. |  | MaxItems: 100 <br />Optional: \{\} <br /> |
-| `services` _object (keys:string, values:[DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec))_ | Services are the services to deploy as part of this deployment. |  | MaxProperties: 25 <br />Optional: \{\} <br /> |
-| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs are environment variables applied to all services in the deployment unless<br />overridden by service-specific configuration. |  | Optional: \{\} <br /> |
-| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). |  | Enum: [sglang vllm trtllm] <br /> |
-| `restart` _[Restart](#restart)_ | Restart specifies the restart policy for the graph deployment. |  | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentStatus
-
-
-
-DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeployment](#dynamographdeployment)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `state` _string_ | State is a high-level textual status of the graph deployment lifecycle. |  |  |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. |  |  |
-| `services` _object (keys:string, values:[ServiceReplicaStatus](#servicereplicastatus))_ | Services contains per-service replica status information.<br />The map key is the service name from spec.services. |  | Optional: \{\} <br /> |
-| `restart` _[RestartStatus](#restartstatus)_ | Restart contains the status of the restart of the graph deployment. |  | Optional: \{\} <br /> |
-| `checkpoints` _object (keys:string, values:[ServiceCheckpointStatus](#servicecheckpointstatus))_ | Checkpoints contains per-service checkpoint status information.<br />The map key is the service name from spec.services. |  | Optional: \{\} <br /> |
-
-
-#### DynamoModel
-
-
-
-DynamoModel is the Schema for the dynamo models API
-
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
-| `kind` _string_ | `DynamoModel` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoModelSpec](#dynamomodelspec)_ |  |  |  |
-| `status` _[DynamoModelStatus](#dynamomodelstatus)_ |  |  |  |
-
-
-#### DynamoModelSpec
-
-
-
-DynamoModelSpec defines the desired state of DynamoModel
-
-
-
-_Appears in:_
- [DynamoModel](#dynamomodel)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `modelName` _string_ | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora") |  | Required: \{\} <br /> |
-| `baseModelName` _string_ | BaseModelName is the base model identifier that matches the service label<br />This is used to discover endpoints via headless services |  | Required: \{\} <br /> |
-| `modelType` _string_ | ModelType specifies the type of model (e.g., "base", "lora", "adapter") | base | Enum: [base lora adapter] <br />Optional: \{\} <br /> |
-| `source` _[ModelSource](#modelsource)_ | Source specifies the model source location (only applicable for lora model type) |  | Optional: \{\} <br /> |
-
-
-#### DynamoModelStatus
-
-
-
-DynamoModelStatus defines the observed state of DynamoModel
-
-
-
-_Appears in:_
- [DynamoModel](#dynamomodel)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `endpoints` _[EndpointInfo](#endpointinfo) array_ | Endpoints is the current list of all endpoints for this model |  | Optional: \{\} <br /> |
-| `readyEndpoints` _integer_ | ReadyEndpoints is the count of endpoints that are ready |  |  |
-| `totalEndpoints` _integer_ | TotalEndpoints is the total count of endpoints |  |  |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions represents the latest available observations of the model's state |  | Optional: \{\} <br /> |
-
-
-#### EPPConfig
-
-
-
-EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components.
-EPP is responsible for intelligent endpoint selection and KV-aware routing.
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `configMapRef` _[ConfigMapKeySelector](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#configmapkeyselector-v1-core)_ | ConfigMapRef references a user-provided ConfigMap containing EPP configuration.<br />The ConfigMap should contain EndpointPickerConfig YAML.<br />Mutually exclusive with Config. |  | Optional: \{\} <br /> |
-| `config` _[EndpointPickerConfig](#endpointpickerconfig)_ | Config allows specifying EPP EndpointPickerConfig directly as a structured object.<br />The operator will marshal this to YAML and create a ConfigMap automatically.<br />Mutually exclusive with ConfigMapRef.<br />One of ConfigMapRef or Config must be specified (no default configuration).<br />Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension |  | Type: object <br />Optional: \{\} <br /> |
-
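The `configMapRef` / `config` rule above (mutually exclusive, and one of the two must be set) can be sketched as a small check. This is an illustrative sketch only, not the operator's validation code: the function name `validate_epp_config` and the plain-dict representation of the EPPConfig block are assumptions.

```python
def validate_epp_config(epp: dict) -> None:
    """Raise ValueError unless exactly one of configMapRef / config is set.

    Hypothetical helper: `epp` is an EPPConfig block parsed into a dict.
    """
    has_ref = epp.get("configMapRef") is not None
    has_inline = epp.get("config") is not None
    if has_ref and has_inline:
        raise ValueError("configMapRef and config are mutually exclusive")
    if not has_ref and not has_inline:
        # The table states there is no default configuration.
        raise ValueError("one of configMapRef or config must be specified")
```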
-
-#### EndpointInfo
-
-
-
-EndpointInfo represents a single endpoint (pod) serving the model
-
-
-
-_Appears in:_
- [DynamoModelStatus](#dynamomodelstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `address` _string_ | Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090") |  |  |
-| `podName` _string_ | PodName is the name of the pod serving this endpoint |  | Optional: \{\} <br /> |
-| `ready` _boolean_ | Ready indicates whether the endpoint is ready to serve traffic<br />For LoRA models: true if the POST /loras request succeeded with a 2xx status code<br />For base models: always false (no probing performed) |  |  |
-
-
-#### ExtraPodMetadata
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `annotations` _object (keys:string, values:string)_ |  |  |  |
-| `labels` _object (keys:string, values:string)_ |  |  |  |
-
-
-#### ExtraPodSpec
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `mainContainer` _[Container](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#container-v1-core)_ |  |  |  |
-
-
-#### IngressSpec
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Enabled exposes the component through an ingress or virtual service when true. |  |  |
-| `host` _string_ | Host is the base host name to route external traffic to this component. |  |  |
-| `useVirtualService` _boolean_ | UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress. |  |  |
-| `virtualServiceGateway` _string_ | VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to. |  |  |
-| `hostPrefix` _string_ | HostPrefix is an optional prefix added before the host. |  |  |
-| `annotations` _object (keys:string, values:string)_ | Annotations to set on the generated Ingress/VirtualService resources. |  |  |
-| `labels` _object (keys:string, values:string)_ | Labels to set on the generated Ingress/VirtualService resources. |  |  |
-| `tls` _[IngressTLSSpec](#ingresstlsspec)_ | TLS holds the TLS configuration used by the Ingress/VirtualService. |  |  |
-| `hostSuffix` _string_ | HostSuffix is an optional suffix appended after the host. |  |  |
-| `ingressControllerClassName` _string_ | IngressControllerClassName selects the ingress controller class (e.g., "nginx"). |  |  |
-
-
-#### IngressTLSSpec
-
-
-
-
-
-
-
-_Appears in:_
- [IngressSpec](#ingressspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `secretName` _string_ | SecretName is the name of a Kubernetes Secret containing the TLS certificate and key. |  |  |
-
-
-
-
-#### ModelReference
-
-
-
-ModelReference identifies a model served by this component
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name is the base model identifier (e.g., "llama-3-70b-instruct-v1") |  | Required: \{\} <br /> |
-| `revision` _string_ | Revision is the model revision/version (optional) |  | Optional: \{\} <br /> |
-
-
-#### ModelSource
-
-
-
-ModelSource defines the source location of a model
-
-
-
-_Appears in:_
- [DynamoModelSpec](#dynamomodelspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `uri` _string_ | URI is the model source URI<br />Supported formats:<br />- S3: s3://bucket/path/to/model<br />- HuggingFace: hf://org/model@revision_sha |  | Required: \{\} <br /> |
-
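The two URI formats listed for `uri` can be split into their parts as follows. This is a minimal sketch under stated assumptions: `parse_model_uri` and the returned dict keys are hypothetical, and treating a missing `@revision_sha` as `None` is an assumption rather than documented behavior.

```python
def parse_model_uri(uri: str) -> dict:
    """Split an s3:// or hf:// model URI into its components (illustrative)."""
    scheme, sep, rest = uri.partition("://")
    if not sep or not rest:
        raise ValueError(f"not a model source URI: {uri!r}")
    if scheme == "s3":
        # s3://bucket/path/to/model
        bucket, _, path = rest.partition("/")
        return {"kind": "s3", "bucket": bucket, "path": path}
    if scheme == "hf":
        # hf://org/model@revision_sha
        repo, _, revision = rest.partition("@")
        return {"kind": "huggingface", "repo": repo, "revision": revision or None}
    raise ValueError(f"unsupported scheme: {scheme!r}")
```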
-
-#### MultinodeSpec
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `nodeCount` _integer_ | Indicates the number of nodes to deploy for multinode components.<br />Total number of GPUs is nodeCount * GPU limit.<br />Must be greater than 1. | 2 | Minimum: 2 <br /> |
-
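The GPU arithmetic in the `nodeCount` description is a simple product. A sketch, with `total_gpus` as a hypothetical helper name and the per-node GPU limit passed in as a plain integer:

```python
def total_gpus(node_count: int, gpu_limit_per_node: int) -> int:
    """Total GPUs for a multinode component: node count times per-node GPU limit."""
    if node_count < 2:
        # Mirrors the documented validation (Minimum: 2).
        raise ValueError("nodeCount must be at least 2 for multinode components")
    return node_count * gpu_limit_per_node
```

For example, two nodes with a GPU limit of 8 each give 16 GPUs in total.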
-
-#### PVC
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `create` _boolean_ | Create indicates whether to create a new PVC |  |  |
-| `name` _string_ | Name is the name of the PVC |  | Required: \{\} <br /> |
-| `storageClass` _string_ | StorageClass to be used for PVC creation. Required when create is true. |  |  |
-| `size` _[Quantity](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api)_ | Size of the volume in Gi, used during PVC creation. Required when create is true. |  |  |
-| `volumeAccessMode` _[PersistentVolumeAccessMode](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#persistentvolumeaccessmode-v1-core)_ | VolumeAccessMode is the volume access mode of the PVC. Required when create is true. |  |  |
-
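The conditional requirements in the PVC table (`name` always required; `storageClass`, `size`, and `volumeAccessMode` required when `create` is true) can be checked like this. The helper `missing_pvc_fields` and the dict input are assumptions for illustration, not part of the operator.

```python
# Fields the table marks as required only when create is true.
REQUIRED_WHEN_CREATING = ("storageClass", "size", "volumeAccessMode")


def missing_pvc_fields(pvc: dict) -> list:
    """Return the names of required PVC fields that are absent or empty."""
    missing = [] if pvc.get("name") else ["name"]
    if pvc.get("create"):
        missing += [f for f in REQUIRED_WHEN_CREATING if not pvc.get(f)]
    return missing
```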
-
-#### ProfilingConfigSpec
-
-
-
-ProfilingConfigSpec defines configuration for the profiling process.
-This structure maps directly to the profile_sla.py config format.
-See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `config` _[JSON](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#json-v1-apiextensions-k8s-io)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. |  | Optional: \{\} <br />Type: object <br /> |
-| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. |  | Optional: \{\} <br /> |
-| `profilerImage` _string_ | ProfilerImage specifies the container image to use for profiling jobs.<br />This image contains the profiler code and dependencies needed for SLA-based profiling.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.1" |  | Required: \{\} <br /> |
-| `outputPVC` _string_ | OutputPVC is an optional PersistentVolumeClaim name for storing profiling output.<br />If specified, all profiling artifacts (logs, plots, configs, raw data) will be written<br />to this PVC instead of an ephemeral emptyDir volume. This allows users to access<br />complete profiling results after the job completes by mounting the PVC.<br />The PVC must exist in the same namespace as the DGDR.<br />If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps.<br />Note: ConfigMaps are still created regardless of this setting for planner integration. |  | Optional: \{\} <br /> |
-| `resources` _[ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core)_ | Resources specifies the compute resource requirements for the profiling job container.<br />If not specified, no resource requests or limits are set. |  | Optional: \{\} <br /> |
-| `tolerations` _[Toleration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#toleration-v1-core) array_ | Tolerations allows the profiling job to be scheduled on nodes with matching taints.<br />For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint. |  | Optional: \{\} <br /> |
-
-
-#### ResourceItem
-
-
-
-
-
-
-
-_Appears in:_
- [Resources](#resources)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `cpu` _string_ | CPU specifies the CPU resource request/limit (e.g., "1000m", "2") |  |  |
-| `memory` _string_ | Memory specifies the memory resource request/limit (e.g., "4Gi", "8Gi") |  |  |
-| `gpu` _string_ | GPU indicates the number of GPUs to request.<br />Total number of GPUs is nodeCount * GPU in case of multinode deployment. |  |  |
-| `gpuType` _string_ | GPUType can specify a custom GPU type, e.g. "gpu.intel.com/xe"<br />By default if not specified, the GPU type is "nvidia.com/gpu" |  |  |
-| `custom` _object (keys:string, values:string)_ | Custom specifies additional custom resource requests/limits |  |  |
-
-
-#### Resources
-
-
-
-Resources defines requests and limits for a component, including CPU, memory,
-GPUs/devices, and any runtime-specific resources.
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `requests` _[ResourceItem](#resourceitem)_ | Requests specifies the minimum resources required by the component |  |  |
-| `limits` _[ResourceItem](#resourceitem)_ | Limits specifies the maximum resources allowed for the component |  |  |
-| `claims` _[ResourceClaim](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourceclaim-v1-core) array_ | Claims specifies resource claims for dynamic resource allocation |  |  |
-
-
-#### Restart
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `id` _string_ | ID is an arbitrary string that triggers a restart when changed.<br />Any modification to this value will initiate a restart of the graph deployment according to the strategy. |  | MinLength: 1 <br />Required: \{\} <br /> |
-| `strategy` _[RestartStrategy](#restartstrategy)_ | Strategy specifies the restart strategy for the graph deployment. |  | Optional: \{\} <br /> |
-
-
-#### RestartPhase
-
-_Underlying type:_ _string_
-
-
-
-
-
-_Appears in:_
- [RestartStatus](#restartstatus)
-
-| Field | Description |
-| --- | --- |
-| `Pending` |  |
-| `Restarting` |  |
-| `Completed` |  |
-| `Failed` |  |
-
-
-#### RestartStatus
-
-
-
-RestartStatus contains the status of the restart of the graph deployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `observedID` _string_ | ObservedID is the restart ID that has been observed and is being processed.<br />Matches the Restart.ID field in the spec. |  |  |
-| `phase` _[RestartPhase](#restartphase)_ | Phase is the phase of the restart. |  |  |
-| `inProgress` _string array_ | InProgress contains the names of the services that are currently being restarted. |  | Optional: \{\} <br /> |
-
-
-#### RestartStrategy
-
-
-
-
-
-
-
-_Appears in:_
- [Restart](#restart)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `type` _[RestartStrategyType](#restartstrategytype)_ | Type specifies the restart strategy type. | Sequential | Enum: [Sequential Parallel] <br /> |
-| `order` _string array_ | Order specifies the order in which the services should be restarted. |  | Optional: \{\} <br /> |
-
-
-#### RestartStrategyType
-
-_Underlying type:_ _string_
-
-
-
-
-
-_Appears in:_
- [RestartStrategy](#restartstrategy)
-
-| Field | Description |
-| --- | --- |
-| `Sequential` |  |
-| `Parallel` |  |
-
-
-#### ScalingAdapter
-
-
-
-ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter
-for replica management. When enabled, the DGDSA owns the replicas field and
-external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Enabled indicates whether the ScalingAdapter should be enabled for this service.<br />When true, a DGDSA is created and owns the replicas field.<br />When false (default), no DGDSA is created and replicas can be modified directly in the DGD. | false | Optional: \{\} <br /> |
-
-
-#### ServiceCheckpointConfig
-
-
-
-ServiceCheckpointConfig configures checkpointing for a DGD service
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Enabled indicates whether checkpointing is enabled for this service | false | Optional: \{\} <br /> |
-| `mode` _[CheckpointMode](#checkpointmode)_ | Mode defines how checkpoint creation is handled<br />- Auto: DGD controller creates Checkpoint CR automatically<br />- Manual: User must create Checkpoint CR | Auto | Enum: [Auto Manual] <br />Optional: \{\} <br /> |
-| `checkpointRef` _string_ | CheckpointRef references an existing Checkpoint CR to use<br />If specified, Identity is ignored and this checkpoint is used directly |  | Optional: \{\} <br /> |
-| `identity` _[DynamoCheckpointIdentity](#dynamocheckpointidentity)_ | Identity defines the checkpoint identity for hash computation<br />Used when Mode is Auto or when looking up existing checkpoints<br />Required when checkpointRef is not specified |  | Optional: \{\} <br /> |
-
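The precedence between `checkpointRef` and `identity` described in the table can be sketched as below. This is an illustrative reading of the field descriptions, not the controller's logic; `resolve_checkpoint_source` and its return shape are hypothetical.

```python
def resolve_checkpoint_source(cfg: dict) -> tuple:
    """Decide which field selects the checkpoint for a service (illustrative)."""
    if not cfg.get("enabled", False):  # enabled defaults to false
        return ("disabled", None)
    if cfg.get("checkpointRef"):
        # When checkpointRef is set, identity is ignored.
        return ("checkpointRef", cfg["checkpointRef"])
    if cfg.get("identity") is None:
        raise ValueError("identity is required when checkpointRef is not specified")
    return ("identity", cfg["identity"])
```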
-
-#### ServiceCheckpointStatus
-
-
-
-ServiceCheckpointStatus contains checkpoint information for a single service.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `checkpointName` _string_ | CheckpointName is the name of the associated Checkpoint CR |  | Optional: \{\} <br /> |
-| `identityHash` _string_ | IdentityHash is the computed hash of the checkpoint identity |  | Optional: \{\} <br /> |
-| `ready` _boolean_ | Ready indicates if the checkpoint is ready for use |  | Optional: \{\} <br /> |
-
-
-#### ServiceReplicaStatus
-
-
-
-ServiceReplicaStatus contains replica information for a single service.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `componentKind` _[ComponentKind](#componentkind)_ | ComponentKind is the underlying resource kind (e.g., "PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"). |  | Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet] <br /> |
-| `componentName` _string_ | ComponentName is the name of the underlying resource. |  |  |
-| `replicas` _integer_ | Replicas is the total number of non-terminated replicas.<br />Required for all component kinds. |  | Minimum: 0 <br /> |
-| `updatedReplicas` _integer_ | UpdatedReplicas is the number of replicas at the current/desired revision.<br />Required for all component kinds. |  | Minimum: 0 <br /> |
-| `readyReplicas` _integer_ | ReadyReplicas is the number of ready replicas.<br />Populated for PodClique, Deployment, and LeaderWorkerSet.<br />Not available for PodCliqueScalingGroup.<br />When nil, the field is omitted from the API response. |  | Minimum: 0 <br />Optional: \{\} <br /> |
-| `availableReplicas` _integer_ | AvailableReplicas is the number of available replicas.<br />For Deployment: replicas ready for >= minReadySeconds.<br />For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods.<br />Not available for PodClique or LeaderWorkerSet.<br />When nil, the field is omitted from the API response. |  | Minimum: 0 <br />Optional: \{\} <br /> |
-
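Which replica counters are populated depends on `componentKind`. The lookup below restates the field descriptions above as data; the names `OPTIONAL_FIELDS_BY_KIND` and `expected_replica_fields` are hypothetical.

```python
# Optional counters per component kind, as described in the table above.
OPTIONAL_FIELDS_BY_KIND = {
    "PodClique": {"readyReplicas"},
    "PodCliqueScalingGroup": {"availableReplicas"},
    "Deployment": {"readyReplicas", "availableReplicas"},
    "LeaderWorkerSet": {"readyReplicas"},
}

# Counters the table marks as required for all component kinds.
ALWAYS_PRESENT = {"replicas", "updatedReplicas"}


def expected_replica_fields(kind: str) -> set:
    """Return the replica counters a status entry of this kind should carry."""
    return ALWAYS_PRESENT | OPTIONAL_FIELDS_BY_KIND[kind]
```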
-
-#### SharedMemorySpec
-
-
-
-
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `disabled` _boolean_ |  |  |  |
-| `size` _[Quantity](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api)_ |  |  |  |
-
-
-#### VolumeMount
-
-
-
-VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
-
-
-
-_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `name` _string_ | Name references a PVC name defined in the top-level PVCs map |  | Required: \{\} <br /> |
-| `mountPoint` _string_ | MountPoint specifies where to mount the volume.<br />If useAsCompilationCache is true and mountPoint is not specified,<br />a backend-specific default will be used. |  |  |
-| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false |  |
-
-
-
-## nvidia.com/v1beta1
-
-Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
-
-### Resource Types
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
-
-
-
-#### BackendSpec
-
-
-
-BackendSpec defines the inference backend and container image configuration.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
-| `dynamoImage` _string_ | DynamoImage is the full K8s dynamo image reference<br />(e.g. "nvcr.io/nvidia/dynamo-runtime:latest"). |  | Optional: \{\} <br /> |
-
-
-#### BackendType
-
-_Underlying type:_ _string_
-
-BackendType specifies the inference backend.
-
-_Validation:_
- Enum: [auto sglang trtllm vllm]
-
-_Appears in:_
- [BackendSpec](#backendspec)
-
-| Field | Description |
-| --- | --- |
-| `auto` |  |
-| `sglang` |  |
-| `trtllm` |  |
-| `vllm` |  |
-
-
-#### DGDRPhase
-
-_Underlying type:_ _string_
-
-DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
-
-_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
-
-_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
-
-| Field | Description |
-| --- | --- |
-| `Pending` |  |
-| `Profiling` |  |
-| `Ready` |  |
-| `Deploying` |  |
-| `Deployed` |  |
-| `Failed` |  |
-
-
-#### DeploymentInfoStatus
-
-
-
-DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `replicas` _integer_ | Replicas is the desired number of replicas. |  | Optional: \{\} <br /> |
-| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. |  | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentRequest
-
-
-
-DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
-It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
-Users specify a model and optional performance targets; the controller handles profiling,
-configuration selection, and deployment.
-
-Lifecycle:
- 1. Pending: Spec validated, preparing for profiling
- 2. Profiling: Profiling job is running to discover optimal configurations
- 3. Ready: Profiling complete, generated DGD spec available in status
- 4. Deploying: DGD is being created and rolled out (when autoApply=true)
- 5. Deployed: DGD is running and healthy
- 6. Failed: An unrecoverable error occurred
-
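The lifecycle above can be read as a linear happy path with `Failed` as a terminal state. The sketch below models only that reading. It assumes a request rests at `Ready` when `autoApply` is false, and it does not model how the controller reacts to a manually applied DGD; `next_phase` is a hypothetical helper.

```python
HAPPY_PATH = ["Pending", "Profiling", "Ready", "Deploying", "Deployed"]


def next_phase(phase: str, auto_apply: bool = True):
    """Return the next happy-path phase, or None if the request rests here."""
    if phase in ("Deployed", "Failed"):
        return None  # terminal in this sketch
    if phase == "Ready" and not auto_apply:
        return None  # generated spec waits in status for manual review
    return HAPPY_PATH[HAPPY_PATH.index(phase) + 1]
```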
-
-
-
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
-| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
-| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
-| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
-| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |
-
-
-#### DynamoGraphDeploymentRequestSpec
-
-
-
-DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
-Only the Model field is required; all other fields are optional and have sensible defaults.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `model` _[ModelSpec](#modelspec)_ | Model specifies the model to deploy including optional PVC cache configuration. |  | Required: \{\} <br /> |
-| `backend` _[BackendSpec](#backendspec)_ | Backend specifies the inference backend and container image configuration. |  | Optional: \{\} <br /> |
-| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. |  | Optional: \{\} <br /> |
-| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. |  | Optional: \{\} <br /> |
-| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. |  | Optional: \{\} <br /> |
-| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. |  | Optional: \{\} <br /> |
-| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. |  | Optional: \{\} <br /> |
-| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
-| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
-
-
-#### DynamoGraphDeploymentRequestStatus
-
-
-
-DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. |  | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
-| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. |  | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
-| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. |  | Optional: \{\} <br /> |
-| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. |  | Optional: \{\} <br /> |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. |  | Optional: \{\} <br /> |
-| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. |  | Optional: \{\} <br /> |
-| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). |  | Optional: \{\} <br /> |
-| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  | Optional: \{\} <br /> |
-
-
-#### FeaturesSpec
-
-
-
-FeaturesSpec controls optional Dynamo platform features in the generated deployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. |  | Optional: \{\} <br /> |
-| `kvRouter` _boolean_ | KVRouter enables KV-cache-aware routing in the generated DGD. |  | Optional: \{\} <br /> |
-| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. |  | Optional: \{\} <br /> |
-
-
-#### HardwareSpec
-
-
-
-HardwareSpec describes the hardware resources available for profiling and deployment.
-These fields are typically auto-filled by the operator from cluster discovery.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). |  | Optional: \{\} <br /> |
-| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. |  | Optional: \{\} <br /> |
-| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. |  | Optional: \{\} <br /> |
-| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. |  | Optional: \{\} <br /> |
-
-
-#### MockerSpec
-
-
-
-MockerSpec configures the simulated (mocker) backend.
-
-
-
-_Appears in:_
- [FeaturesSpec](#featuresspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. |  | Optional: \{\} <br /> |
-
-
-#### ModelCacheSpec
-
-
-
-ModelCacheSpec references a PVC containing pre-downloaded model weights.
-
-
-
-_Appears in:_
- [ModelSpec](#modelspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. |  | Optional: \{\} <br /> |
-| `modelPathInPvc` _string_ | ModelPathInPVC is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). |  | Optional: \{\} <br /> |
-| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
-
-
-#### ModelSpec
-
-
-
-ModelSpec defines the model to deploy.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `modelName` _string_ | ModelName is the model name or identifier (e.g. "meta-llama/Llama-3.1-405B").<br />Can be a HuggingFace ID or a private model name. Always required. |  | MinLength: 1 <br />Required: \{\} <br /> |
-| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache is the optional PVC model cache configuration.<br />When provided, weights are loaded from the PVC instead of downloading from HF. |  | Optional: \{\} <br /> |
-
-
-#### OptimizationType
-
-_Underlying type:_ _string_
-
-OptimizationType specifies the profiling optimization strategy.
-
-_Validation:_
- Enum: [latency throughput]
-
-_Appears in:_
- [SLASpec](#slaspec)
-
-| Field | Description |
-| --- | --- |
-| `latency` |  |
-| `throughput` |  |
-
-
-#### OverridesSpec
-
-
-
-OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. |  | Optional: \{\} <br /> |
-| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial DynamoGraphDeployment to use as the base<br />for the generated deployment. Fields from profiling results are merged on top. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
-
-
-#### ParetoConfig
-
-
-
-ParetoConfig represents a single Pareto-optimal deployment configuration
-discovered during profiling.
-
-
-
-_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. |  | Type: object <br /> |
-
-
-#### PlannerPreDeploymentSweepMode
-
-_Underlying type:_ _string_
-
-PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
-
-_Validation:_
- Enum: [none rapid thorough]
-
-_Appears in:_
- [PlannerSpec](#plannerspec)
-
-| Field | Description |
-| --- | --- |
-| `none` |  |
-| `rapid` |  |
-| `thorough` |  |
-
-
-#### PlannerSpec
-
-
-
-PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
-
-
-
-_Appears in:_
- [FeaturesSpec](#featuresspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. |  | Optional: \{\} <br /> |
-| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls pre-deployment sweeping mode for planner in-depth profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). |  | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
-| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. |  | Optional: \{\} <br /> |
-
-
-#### ProfilingPhase
-
-_Underlying type:_ _string_
-
-ProfilingPhase represents a sub-phase within the profiling pipeline.
-When the DGDR Phase is "Profiling", this value indicates which step
-of the profiling pipeline is currently executing.
-
-_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
-
-_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
-
-| Field | Description |
-| --- | --- |
-| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
-| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
-| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
-| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
-| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
-| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
-| `Done` | Profiling pipeline finished successfully.<br /> |
-
-
-#### ProfilingResultsStatus
-
-
-
-ProfilingResultsStatus contains the output of the profiling process.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. |  | Optional: \{\} <br /> |
-| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. |  | Type: object <br />Optional: \{\} <br /> |
-
-
-#### SLASpec
-
-
-
-SLASpec defines the service-level agreement targets.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. |  | Enum: [latency throughput] <br />Optional: \{\} <br /> |
-| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. |  | Optional: \{\} <br /> |
-| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. |  | Optional: \{\} <br /> |
-| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. |  | Optional: \{\} <br /> |
-
-
-#### SearchStrategy
-
-_Underlying type:_ _string_
-
-SearchStrategy controls the profiling search depth.
-
-_Validation:_
- Enum: [rapid thorough]
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description |
-| --- | --- |
-| `rapid` |  |
-| `thorough` |  |
-
-
-#### WorkloadSpec
-
-
-
-WorkloadSpec defines the workload characteristics for SLA-based profiling.
-
-
-
-_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). |  | Optional: \{\} <br /> |
-| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). |  | Optional: \{\} <br /> |
-| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Either Concurrency or RequestRate is required when the planner is disabled. |  | Optional: \{\} <br /> |
-| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Either RequestRate or Concurrency is required when the planner is disabled. |  | Optional: \{\} <br /> |
-
-
-# Operator Default Values Injection
-
-The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
-
- **Health Probes**: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
-
- **Security Context**: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
-
- **Shared Memory**: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
-
- **Environment Variables**: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
-
- **Pod Configuration**: Default `terminationGracePeriodSeconds` of 60 seconds and `restartPolicy: Always`.
-
- **Autoscaling**: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
-
- **Backend-Specific Behavior**: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).
-
-## Pod Specification Defaults
-
-All components receive the following pod-level defaults unless overridden:
-
- **`terminationGracePeriodSeconds`**: `60` seconds
- **`restartPolicy`**: `Always`
-
-## Security Context
-
-The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:
-
- **`fsGroup`**: `1000` - Sets the group ownership of mounted volumes and any files created in those volumes
-
-This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The `fsGroup` setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments
-
-### Overriding Security Context
-
-To override the default security context, specify your own `securityContext` in the `extraPodSpec` of your component:
-
-```yaml
-services:
-  YourWorker:
-    extraPodSpec:
-      securityContext:
-        fsGroup: 2000  # Custom group ID
-        runAsUser: 1000
-        runAsGroup: 1000
-        runAsNonRoot: true
-```
-
-**Important**: When you provide *any* `securityContext` object in `extraPodSpec`, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting `runAsNonRoot` or setting it to `false`).
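
The all-or-nothing rule above can be illustrated with a minimal sketch. It is not the operator's code; the function and constant names are hypothetical, and it assumes that only a completely absent `securityContext` triggers defaulting:

```python
from typing import Optional

# Default described in this document: only fsGroup is injected.
DEFAULT_SECURITY_CONTEXT = {"fsGroup": 1000}


def effective_security_context(user_ctx: Optional[dict]) -> dict:
    """Return the pod-level security context after defaulting (hypothetical helper)."""
    if user_ctx is not None:
        # Any user-provided object, even an empty one, is used as-is:
        # no default fields are merged into it.
        return dict(user_ctx)
    return dict(DEFAULT_SECURITY_CONTEXT)


print(effective_security_context(None))
print(effective_security_context({"runAsUser": 1000}))
```

Note that the second call yields a context without `fsGroup`: providing `runAsUser` alone drops the default rather than merging with it.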
-
-### OpenShift and Security Context Constraints
-
-In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift's admission controllers to assign them dynamically:
-
-```yaml
-services:
-  YourWorker:
-    extraPodSpec:
-      # An empty object (rather than a bare key, which YAML parses as null) counts as a
-      # user-provided securityContext, so no fsGroup is injected and OpenShift assigns IDs from the SCC.
-      securityContext: {}
-```
-
-Alternatively, if you want to keep the default `fsGroup: 1000` behavior and are certain your cluster allows it, you don't need to specify anything - the operator defaults will work.
-
-## Shared Memory Configuration
-
-Shared memory is enabled by default for all components:
-
- **Enabled**: `true` (unless explicitly disabled via `sharedMemory.disabled`)
- **Size**: `8Gi`
- **Mount Path**: `/dev/shm`
- **Volume Type**: `emptyDir` with `memory` medium
-
-To disable shared memory or customize the size, use the `sharedMemory` field in your component specification.
-
-## Health Probes by Component Type
-
-The operator applies different default health probes based on the component type.
-
-### Frontend Components
-
-Frontend components receive the following probe configurations:
-
-**Liveness Probe:**
- **Type**: HTTP GET
- **Path**: `/health`
- **Port**: `http` (8000)
- **Initial Delay**: 60 seconds
- **Period**: 60 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 10
-
-**Readiness Probe:**
- **Type**: Exec command
- **Command**: `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`
- **Initial Delay**: 60 seconds
- **Period**: 60 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 10
-
-### Worker Components
-
-Worker components receive the following probe configurations:
-
-**Liveness Probe:**
- **Type**: HTTP GET
- **Path**: `/live`
- **Port**: `system` (9090)
- **Period**: 5 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 1
-
-**Readiness Probe:**
- **Type**: HTTP GET
- **Path**: `/health`
- **Port**: `system` (9090)
- **Period**: 10 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 60
-
-**Startup Probe:**
- **Type**: HTTP GET
- **Path**: `/live`
- **Port**: `system` (9090)
- **Period**: 10 seconds
- **Timeout**: 5 seconds
- **Failure Threshold**: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)
-
-:::{note}
-For larger models (typically >70B parameters) or slower storage systems, you may need to increase the `failureThreshold` to allow more time for model loading. Calculate the required threshold from your expected startup time, rounding up: `failureThreshold = ceil(expected_startup_seconds / periodSeconds)`. Override the startup probe in your component specification if the default 2-hour window is insufficient.
-:::
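
As a worked example of the threshold arithmetic in the note above, here is a small sketch. The helper name is hypothetical, and rounding up is an added assumption so that the window is never shorter than the expected startup time:

```python
import math


def startup_failure_threshold(expected_startup_seconds: float, period_seconds: int = 10) -> int:
    """Smallest failureThreshold whose window covers the expected startup time."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Window length is period * failureThreshold, so round up the quotient.
    return max(1, math.ceil(expected_startup_seconds / period_seconds))


print(startup_failure_threshold(7200))      # 720, the default 2-hour window
print(startup_failure_threshold(3 * 3600))  # 1080, a 3-hour window at a 10s period
```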
-
-### Multinode Deployment Probe Modifications
-
-For multinode deployments, the operator modifies probes based on the backend framework and node role:
-
-#### vLLM Backend
-
-The operator automatically selects between two deployment modes based on parallelism configuration:
-
-**Tensor/Pipeline Parallel Mode** (when `world_size > GPUs_per_node`):
- Uses Ray for distributed execution (`--distributed-executor-backend ray`)
- **Leader nodes**: Starts Ray head and runs vLLM; all probes remain active
- **Worker nodes**: Run Ray agents only; all probes (liveness, readiness, startup) are removed
-
-**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
- **Leader nodes**: All probes remain active
-
-#### SGLang Backend
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
-
-#### TensorRT-LLM Backend
- **Leader nodes**: All probes remain unchanged
- **Worker nodes**:
-  - Liveness and startup probes are removed
-  - Readiness probe is replaced with a TCP socket check on SSH port (2222):
-    - **Initial Delay**: 20 seconds
-    - **Period**: 20 seconds
-    - **Timeout**: 5 seconds
-    - **Failure Threshold**: 10
-
-## Environment Variables
-
-The operator automatically injects environment variables based on component type and configuration:
-
-### All Components
-
- **`DYN_NAMESPACE`**: The Dynamo namespace for the component
- **`DYN_PARENT_DGD_K8S_NAME`**: The parent DynamoGraphDeployment Kubernetes resource name
- **`DYN_PARENT_DGD_K8S_NAMESPACE`**: The parent DynamoGraphDeployment Kubernetes namespace
-
-### Frontend Components
-
- **`DYNAMO_PORT`**: `8000`
- **`DYN_HTTP_PORT`**: `8000`
-
-### Worker Components
-
- **`DYN_SYSTEM_PORT`**: `9090` (automatically enables the system metrics server)
- **`DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS`**: `["generate"]`
- **`DYN_SYSTEM_ENABLED`**: `true` (needed for runtime images 0.6.1 and older)
-
-### Planner Components
-
- **`PLANNER_PROMETHEUS_PORT`**: `9085`
-
-### vLLM Backend (with compilation cache)
-
-When a volume mount is configured with `useAsCompilationCache: true`:
- **`VLLM_CACHE_ROOT`**: Set to the mount point of the cache volume
-
-## Service Account
-
-Planner components automatically receive the following service account:
-
- **`serviceAccountName`**: `planner-serviceaccount`
-
-## Image Pull Secrets
-
-The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:
-
-1. Scans all Kubernetes secrets of type `kubernetes.io/dockerconfigjson` in the component's namespace
-2. Extracts the docker registry server URLs from each secret's authentication configuration
-3. Matches the container image's registry host against the discovered registry URLs
-4. Automatically injects matching secrets as `imagePullSecrets` in the pod specification
-
-This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
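
The four discovery steps above can be sketched roughly as follows. This is an illustration only, not the operator's implementation (which lives in the Go sources referenced later in this document). The function names are hypothetical, secrets are modeled as plain dictionaries in the shape returned by the Kubernetes API, and Docker Hub aliases such as `index.docker.io` are not normalized:

```python
import base64
import json


def registry_host(value: str) -> str:
    """Strip scheme and path from a registry URL: 'https://nvcr.io/v1/' -> 'nvcr.io'."""
    without_scheme = value.split("://", 1)[-1]
    return without_scheme.split("/", 1)[0]


def image_registry(image: str) -> str:
    """Registry host of an image reference; 'docker.io' when none is given."""
    first = image.split("/", 1)[0]
    if "/" in image and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def matching_pull_secrets(image: str, secrets: list) -> list:
    """Names of dockerconfigjson secrets whose registries match the image."""
    wanted = image_registry(image)
    matches = []
    for secret in secrets:
        # Step 1: only consider docker config secrets.
        if secret.get("type") != "kubernetes.io/dockerconfigjson":
            continue
        # Step 2: decode the config and collect its registry hosts.
        config = json.loads(base64.b64decode(secret["data"][".dockerconfigjson"]))
        hosts = {registry_host(server) for server in config.get("auths", {})}
        # Steps 3 and 4: match the image's registry and record the secret.
        if wanted in hosts:
            matches.append(secret["metadata"]["name"])
    return matches
```

The returned names correspond to what would be listed under `imagePullSecrets` in the pod specification.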
-
-**To disable automatic image pull secret discovery** for a specific component, add the following annotation:
-
-```yaml
-annotations:
-  nvidia.com/disable-image-pull-secret-discovery: "true"
-```
-
-## Autoscaling Defaults
-
-When autoscaling is enabled but no metrics are specified, the operator applies:
-
- **Default Metric**: CPU utilization
- **Target Average Utilization**: `80%`
-
-## Port Configurations
-
-Default container ports are configured based on component type:
-
-### Frontend Components
- **Port**: 8000
- **Protocol**: TCP
- **Name**: `http`
-
-### Worker Components
- **Port**: 9090
- **Protocol**: TCP
- **Name**: `system`
-
-### Planner Components
- **Port**: 9085
- **Protocol**: TCP
- **Name**: `metrics`
-
-## Backend-Specific Configurations
-
-### vLLM
- **Ray Head Port**: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
- **Data Parallel RPC Port**: 13445 (for data parallel multinode deployments)
-
-### SGLang
- **Distribution Init Port**: 29500 (for multinode deployments)
-
-### TensorRT-LLM
- **SSH Port**: 2222 (for multinode MPI communication)
- **OpenMPI Environment**: `OMPI_MCA_orte_keep_fqdn_hostnames=1`
-
-## Implementation Reference
-
-For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
-
- **Health Probes, Security Context & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- **Component-Specific Defaults**:
-  - [`internal/dynamo/component_frontend.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_frontend.go)
-  - [`internal/dynamo/component_worker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_worker.go)
-  - [`internal/dynamo/component_planner.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_planner.go)
- **Image Pull Secrets**: [`internal/secrets/docker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/secrets/docker.go) - Implements the docker secret indexer and automatic discovery
- **Backend-Specific Behavior**:
-  - [`internal/dynamo/backend_vllm.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_vllm.go)
-  - [`internal/dynamo/backend_sglang.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_sglang.go)
-  - [`internal/dynamo/backend_trtllm.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_trtllm.go)
- **Constants & Annotations**: [`internal/consts/consts.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/consts/consts.go) - Defines annotation keys and other constants
-
-## Notes
-
- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide *any* `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator
--- a/docs/pages/kubernetes/autoscaling.md
+++ b/docs/pages/kubernetes/autoscaling.md
--- a/docs/pages/kubernetes/chrek/README.md
+++ b/docs/pages/kubernetes/chrek/README.md
--- a/docs/pages/kubernetes/chrek/dynamo.md
+++ b/docs/pages/kubernetes/chrek/dynamo.md
--- a/docs/pages/kubernetes/deployment/create-deployment.md
+++ b/docs/pages/kubernetes/deployment/create-deployment.md
--- a/docs/pages/kubernetes/deployment/dynamomodel-guide.md
+++ b/docs/pages/kubernetes/deployment/dynamomodel-guide.md
--- a/docs/pages/kubernetes/deployment/minikube.md
+++ b/docs/pages/kubernetes/deployment/minikube.md
--- a/docs/pages/kubernetes/deployment/multinode-deployment.md
+++ b/docs/pages/kubernetes/deployment/multinode-deployment.md
--- a/docs/pages/kubernetes/dynamo-operator.md
+++ b/docs/pages/kubernetes/dynamo-operator.md
--- a/docs/pages/kubernetes/fluxcd.md
+++ b/docs/pages/kubernetes/fluxcd.md