feat: Add v1beta1 DGDR API with conversion framework (#6352)

Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Hongkuan Zhou <hongkuanz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Jont828 <jt572@cornell.edu>

Commit ebc61637 (unverified), authored Feb 19, 2026 by hhzhang16, committed by GitHub Feb 19, 2026

Showing with 856 additions and 0 deletions

docs/kubernetes/api_reference.md +441 -0

docs/pages/kubernetes/api-reference.md +415 -0

--- a/docs/kubernetes/api_reference.md
+++ b/docs/kubernetes/api_reference.md
@@ -22,6 +22,7 @@ limitations under the License.

 ## Packages
 - [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
+- [nvidia.com/v1beta1](#nvidiacomv1beta1)


 ## nvidia.com/v1alpha1
@@ -1129,6 +1130,446 @@ _Appears in:_
 | `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false |  |


+
+## nvidia.com/v1beta1
+
+Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
+
+### Resource Types
+- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
+
+
+
+#### BackendSpec
+
+
+
+BackendSpec defines the inference backend and container image configuration.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
+| `dynamoImage` _string_ | DynamoImage is the full K8s dynamo image reference<br />(e.g. "nvcr.io/nvidia/dynamo-runtime:latest"). |  | Optional: \{\} <br /> |
+
+
+#### BackendType
+
+_Underlying type:_ _string_
+
+BackendType specifies the inference backend.
+
+_Validation:_
+- Enum: [auto sglang trtllm vllm]
+
+_Appears in:_
+- [BackendSpec](#backendspec)
+
+| Field | Description |
+| --- | --- |
+| `auto` |  |
+| `sglang` |  |
+| `trtllm` |  |
+| `vllm` |  |
+
+
+#### DGDRPhase
+
+_Underlying type:_ _string_
+
+DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
+
+_Validation:_
+- Enum: [Pending Profiling Ready Deploying Deployed Failed]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
+
+| Field | Description |
+| --- | --- |
+| `Pending` |  |
+| `Profiling` |  |
+| `Ready` |  |
+| `Deploying` |  |
+| `Deployed` |  |
+| `Failed` |  |
+
+
+#### DeploymentInfoStatus
+
+
+
+DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `replicas` _integer_ | Replicas is the desired number of replicas. |  | Optional: \{\} <br /> |
+| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. |  | Optional: \{\} <br /> |
+
+
+#### DynamoGraphDeploymentRequest
+
+
+
+DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
+It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
+Users specify a model and optional performance targets; the controller handles profiling,
+configuration selection, and deployment.
+
+Lifecycle:
+ 1. Pending: Spec validated, preparing for profiling
+ 2. Profiling: Profiling job is running to discover optimal configurations
+ 3. Ready: Profiling complete, generated DGD spec available in status
+ 4. Deploying: DGD is being created and rolled out (when autoApply=true)
+ 5. Deployed: DGD is running and healthy
+ 6. Failed: An unrecoverable error occurred
+
+
+
+
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
+| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
+| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
+| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
+| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |
+
+
+#### DynamoGraphDeploymentRequestSpec
+
+
+
+DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
+Only the Model field is required; all other fields are optional and have sensible defaults.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `model` _[ModelSpec](#modelspec)_ | Model specifies the model to deploy including optional PVC cache configuration. |  | Required: \{\} <br /> |
+| `backend` _[BackendSpec](#backendspec)_ | Backend specifies the inference backend and container image configuration. |  | Optional: \{\} <br /> |
+| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. |  | Optional: \{\} <br /> |
+| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. |  | Optional: \{\} <br /> |
+| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. |  | Optional: \{\} <br /> |
+| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. |  | Optional: \{\} <br /> |
+| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. |  | Optional: \{\} <br /> |
+| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
+| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
+
+
+#### DynamoGraphDeploymentRequestStatus
+
+
+
+DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. |  | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
+| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. |  | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
+| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. |  | Optional: \{\} <br /> |
+| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. |  | Optional: \{\} <br /> |
+| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. |  | Optional: \{\} <br /> |
+| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. |  | Optional: \{\} <br /> |
+| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). |  | Optional: \{\} <br /> |
+| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  | Optional: \{\} <br /> |
+
+
+#### FeaturesSpec
+
+
+
+FeaturesSpec controls optional Dynamo platform features in the generated deployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. |  | Optional: \{\} <br /> |
+| `kvRouter` _boolean_ | KVRouter enables KV-cache-aware routing in the generated DGD. |  | Optional: \{\} <br /> |
+| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. |  | Optional: \{\} <br /> |
+
+
+#### HardwareSpec
+
+
+
+HardwareSpec describes the hardware resources available for profiling and deployment.
+These fields are typically auto-filled by the operator from cluster discovery.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). |  | Optional: \{\} <br /> |
+| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. |  | Optional: \{\} <br /> |
+| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. |  | Optional: \{\} <br /> |
+| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. |  | Optional: \{\} <br /> |
+
+
+#### MockerSpec
+
+
+
+MockerSpec configures the simulated (mocker) backend.
+
+
+
+_Appears in:_
+- [FeaturesSpec](#featuresspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. |  | Optional: \{\} <br /> |
+
+
+#### ModelCacheSpec
+
+
+
+ModelCacheSpec references a PVC containing pre-downloaded model weights.
+
+
+
+_Appears in:_
+- [ModelSpec](#modelspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. |  | Optional: \{\} <br /> |
+| `modelPathInPvc` _string_ | ModelPathInPVC is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). |  | Optional: \{\} <br /> |
+| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
+
+
+#### ModelSpec
+
+
+
+ModelSpec defines the model to deploy.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `modelName` _string_ | ModelName is the model name or identifier (e.g. "meta-llama/Llama-3.1-405B").<br />Can be a HuggingFace ID or a private model name. Always required. |  | MinLength: 1 <br />Required: \{\} <br /> |
+| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache is the optional PVC model cache configuration.<br />When provided, weights are loaded from the PVC instead of downloading from HF. |  | Optional: \{\} <br /> |
+
+
+#### OptimizationType
+
+_Underlying type:_ _string_
+
+OptimizationType specifies the profiling optimization strategy.
+
+_Validation:_
+- Enum: [latency throughput]
+
+_Appears in:_
+- [SLASpec](#slaspec)
+
+| Field | Description |
+| --- | --- |
+| `latency` |  |
+| `throughput` |  |
+
+
+#### OverridesSpec
+
+
+
+OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. |  | Optional: \{\} <br /> |
+| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial DynamoGraphDeployment to use as the base<br />for the generated deployment. Fields from profiling results are merged on top. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
+
+
+#### ParetoConfig
+
+
+
+ParetoConfig represents a single Pareto-optimal deployment configuration
+discovered during profiling.
+
+
+
+_Appears in:_
+- [ProfilingResultsStatus](#profilingresultsstatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. |  | Type: object <br /> |
+
+
+#### PlannerPreDeploymentSweepMode
+
+_Underlying type:_ _string_
+
+PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
+
+_Validation:_
+- Enum: [none rapid thorough]
+
+_Appears in:_
+- [PlannerSpec](#plannerspec)
+
+| Field | Description |
+| --- | --- |
+| `none` |  |
+| `rapid` |  |
+| `thorough` |  |
+
+
+#### PlannerSpec
+
+
+
+PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
+
+
+
+_Appears in:_
+- [FeaturesSpec](#featuresspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. |  | Optional: \{\} <br /> |
+| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls pre-deployment sweeping mode for planner in-depth profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). |  | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
+| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. |  | Optional: \{\} <br /> |
+
+
+#### ProfilingPhase
+
+_Underlying type:_ _string_
+
+ProfilingPhase represents a sub-phase within the profiling pipeline.
+When the DGDR Phase is "Profiling", this value indicates which step
+of the profiling pipeline is currently executing.
+
+_Validation:_
+- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
+
+| Field | Description |
+| --- | --- |
+| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
+| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
+| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
+| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
+| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
+| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
+| `Done` | Profiling pipeline finished successfully.<br /> |
+
+
+#### ProfilingResultsStatus
+
+
+
+ProfilingResultsStatus contains the output of the profiling process.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. |  | Optional: \{\} <br /> |
+| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. |  | Type: object <br />Optional: \{\} <br /> |
+
+
+#### SLASpec
+
+
+
+SLASpec defines the service-level agreement targets.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. |  | Enum: [latency throughput] <br />Optional: \{\} <br /> |
+| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. |  | Optional: \{\} <br /> |
+| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. |  | Optional: \{\} <br /> |
+| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. |  | Optional: \{\} <br /> |
+
+
+#### SearchStrategy
+
+_Underlying type:_ _string_
+
+SearchStrategy controls the profiling search depth.
+
+_Validation:_
+- Enum: [rapid thorough]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description |
+| --- | --- |
+| `rapid` |  |
+| `thorough` |  |
+
+
+#### WorkloadSpec
+
+
+
+WorkloadSpec defines the workload characteristics for SLA-based profiling.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). |  | Optional: \{\} <br /> |
+| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). |  | Optional: \{\} <br /> |
+| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Required (or RequestRate) when the planner is disabled. |  | Optional: \{\} <br /> |
+| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Required (or Concurrency) when the planner is disabled. |  | Optional: \{\} <br /> |
+
+
 # Operator Default Values Injection

 The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

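Reviewer note: to make the v1beta1 schema in the diff above concrete, here is a minimal sketch that assembles a `DynamoGraphDeploymentRequest` manifest as a plain Python dict and checks it against the enums listed in the reference tables. It follows the field layout shown for `docs/kubernetes/api_reference.md`, where `spec.model` is a `ModelSpec` with a `modelName` and `spec.backend` is a `BackendSpec`; the second file in this commit documents `model` as a plain string, so the shape may need adjusting for that variant. The helper name `build_dgdr`, the resource name, and the SLA and workload numbers are hypothetical and chosen only for illustration; none of this is code from the commit.

```python
# Enum values copied from the v1beta1 reference tables in the diff above.
BACKENDS = {"auto", "sglang", "trtllm", "vllm"}
SEARCH_STRATEGIES = {"rapid", "thorough"}
OPTIMIZATION_TYPES = {"latency", "throughput"}


def build_dgdr(name, model_name, backend="auto", search_strategy="rapid",
               auto_apply=True, sla=None, workload=None):
    """Return a DGDR manifest as a dict (hypothetical helper, not part of the PR).

    Defaults mirror the documented ones: backend=auto, searchStrategy=rapid,
    autoApply=true.
    """
    if not model_name:
        raise ValueError("spec.model.modelName is required (MinLength: 1)")
    if backend not in BACKENDS:
        raise ValueError(f"backend must be one of {sorted(BACKENDS)}")
    if search_strategy not in SEARCH_STRATEGIES:
        raise ValueError(
            f"searchStrategy must be one of {sorted(SEARCH_STRATEGIES)}")

    sla = dict(sla or {})
    opt = sla.get("optimizationType")
    if opt is not None and opt not in OPTIMIZATION_TYPES:
        raise ValueError(
            f"sla.optimizationType must be one of {sorted(OPTIMIZATION_TYPES)}")

    spec = {
        "model": {"modelName": model_name},
        "backend": {"backend": backend},
        "searchStrategy": search_strategy,
        "autoApply": auto_apply,
    }
    # Optional sections are only emitted when the caller supplies them.
    if sla:
        spec["sla"] = sla
    if workload:
        spec["workload"] = dict(workload)

    return {
        "apiVersion": "nvidia.com/v1beta1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name},
        "spec": spec,
    }


# Illustrative values only: TTFT and ITL are in milliseconds, ISL and OSL in tokens.
manifest = build_dgdr(
    "llama-demo",
    "meta-llama/Llama-3.1-405B",
    backend="vllm",
    sla={"ttft": 200.0, "itl": 20.0},
    workload={"isl": 3000, "osl": 150},
)
```

The resulting dict can be serialized to YAML or JSON with any standard serializer before being applied to a cluster. The checks here only mirror the enum and required-field validation listed in the tables; the API server and controller remain the source of truth.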
--- a/docs/pages/kubernetes/api-reference.md
+++ b/docs/pages/kubernetes/api-reference.md
@@ -10,6 +10,7 @@

 ## Packages
 - [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
+- [nvidia.com/v1beta1](#nvidiacomv1beta1)


 ## nvidia.com/v1alpha1
@@ -475,6 +476,10 @@ Lifecycle:
 The spec becomes immutable once profiling starts. Users must delete and recreate
 the DGDR to modify configuration after this point.

+DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated.
+Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest.
+v1alpha1 will be removed in a future release.
+



@@ -1206,6 +1211,416 @@ _Appears in:_
 | `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false |  |


+
+## nvidia.com/v1beta1
+
+Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
+
+### Resource Types
+- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
+
+
+
+#### BackendType
+
+_Underlying type:_ _string_
+
+BackendType specifies the inference backend.
+
+_Validation:_
+- Enum: [auto sglang trtllm vllm]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description |
+| --- | --- |
+| `auto` |  |
+| `sglang` |  |
+| `trtllm` |  |
+| `vllm` |  |
+
+
+#### DGDRPhase
+
+_Underlying type:_ _string_
+
+DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
+
+_Validation:_
+- Enum: [Pending Profiling Ready Deploying Deployed Failed]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
+
+| Field | Description |
+| --- | --- |
+| `Pending` |  |
+| `Profiling` |  |
+| `Ready` |  |
+| `Deploying` |  |
+| `Deployed` |  |
+| `Failed` |  |
+
+
+#### DeploymentInfoStatus
+
+
+
+DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `replicas` _integer_ | Replicas is the desired number of replicas. |  | Optional: \{\} <br /> |
+| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. |  | Optional: \{\} <br /> |
+
+
+#### v1beta1 DynamoGraphDeploymentRequest
+
+
+
+DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
+It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
+Users specify a model and optional performance targets; the controller handles profiling,
+configuration selection, and deployment.
+
+Lifecycle:
+ 1. Pending: Spec validated, preparing for profiling
+ 2. Profiling: Profiling job is running to discover optimal configurations
+ 3. Ready: Profiling complete, generated DGD spec available in status
+ 4. Deploying: DGD is being created and rolled out (when autoApply=true)
+ 5. Deployed: DGD is running and healthy
+ 6. Failed: An unrecoverable error occurred
+
+
+
+
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
+| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
+| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
+| `spec` _[DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
+| `status` _[DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |
+
+
+#### v1beta1 DynamoGraphDeploymentRequestSpec
+
+
+
+DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
+Only the Model field is required; all other fields are optional and have sensible defaults.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />Can be a HuggingFace ID or a private model name. |  | MinLength: 1 <br />Required: \{\} <br /> |
+| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
+| `image` _string_ | Image is the container image reference for the profiling job (frontend image).<br />Example: "nvcr.io/nvidia/dynamo-runtime:latest"<br />backend type automatically; backend images can be overridden via overrides.dgd. |  | Optional: \{\} <br /> |
+| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache provides optional PVC configuration for pre-downloaded model weights.<br />When provided, weights are loaded from the PVC instead of downloading from HuggingFace. |  | Optional: \{\} <br /> |
+| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. |  | Optional: \{\} <br /> |
+| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. |  | Optional: \{\} <br /> |
+| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. |  | Optional: \{\} <br /> |
+| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. |  | Optional: \{\} <br /> |
+| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. |  | Optional: \{\} <br /> |
+| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
+| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
+
+
+#### v1beta1 DynamoGraphDeploymentRequestStatus
+
+
+
+DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. |  | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
+| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. |  | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
+| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. |  | Optional: \{\} <br /> |
+| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. |  | Optional: \{\} <br /> |
+| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. |  | Optional: \{\} <br /> |
+| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. |  | Optional: \{\} <br /> |
+| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). |  | Optional: \{\} <br /> |
+| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  | Optional: \{\} <br /> |
+
+
+#### FeaturesSpec
+
+
+
+FeaturesSpec controls optional Dynamo platform features in the generated deployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. |  | Optional: \{\} <br /> |
+| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. |  | Optional: \{\} <br /> |
+
+
+#### HardwareSpec
+
+
+
+HardwareSpec describes the hardware resources available for profiling and deployment.
+These fields are typically auto-filled by the operator from cluster discovery.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). |  | Optional: \{\} <br /> |
+| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. |  | Optional: \{\} <br /> |
+| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. |  | Optional: \{\} <br /> |
+| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. |  | Optional: \{\} <br /> |
+
+
+
+
+#### MockerSpec
+
+
+
+MockerSpec configures the simulated (mocker) backend.
+
+
+
+_Appears in:_
+- [FeaturesSpec](#featuresspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. |  | Optional: \{\} <br /> |
+
+
+#### ModelCacheSpec
+
+
+
+ModelCacheSpec references a PVC containing pre-downloaded model weights.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. |  | Optional: \{\} <br /> |
+| `pvcModelPath` _string_ | PVCModelPath is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). |  | Optional: \{\} <br /> |
+| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
+
+
+#### OptimizationType
+
+_Underlying type:_ _string_
+
+OptimizationType specifies the profiling optimization strategy.
+
+_Validation:_
+- Enum: [latency throughput]
+
+_Appears in:_
+- [SLASpec](#slaspec)
+
+| Field | Description |
+| --- | --- |
+| `latency` |  |
+| `throughput` |  |
+
+
+#### OverridesSpec
+
+
+
+OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. |  | Optional: \{\} <br /> |
+| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment<br />to use as the base for the generated deployment. Fields from profiling results<br />are merged on top. Use this to override backend worker images.<br />The field is stored as a raw embedded resource rather than a typed<br />*v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already<br />imports v1beta1 as the conversion hub and Go does not allow import cycles.<br />The EmbeddedResource marker tells the API server to validate that the value is a<br />well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it<br />is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion,<br />kind, and field schema) is performed by the controller during reconciliation. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
+
+
+#### ParetoConfig
+
+
+
+ParetoConfig represents a single Pareto-optimal deployment configuration
+discovered during profiling.
+
+
+
+_Appears in:_
+- [ProfilingResultsStatus](#profilingresultsstatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. |  | Type: object <br /> |
+
+
+#### PlannerPreDeploymentSweepMode
+
+_Underlying type:_ _string_
+
+PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
+
+_Validation:_
+- Enum: [none rapid thorough]
+
+_Appears in:_
+- [PlannerSpec](#plannerspec)
+
+| Field | Description |
+| --- | --- |
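+| `none` | No pre-deployment sweep; only load-based scaling is used. |
+| `rapid` | Uses AI Configurator to simulate engine performance. |
+| `thorough` | Uses real GPUs to measure engine performance (takes several hours). |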
+
+
+#### PlannerSpec
+
+
+
+PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
+
+
+
+_Appears in:_
+- [FeaturesSpec](#featuresspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. |  | Optional: \{\} <br /> |
+| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls the pre-deployment sweeping mode used for in-depth planner profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). |  | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
+| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. |  | Optional: \{\} <br /> |
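+
+A minimal PlannerSpec fragment, assuming the planner is wanted with the simulated sweep, might be written as follows. Only field names and enum values from the tables above are used; `plannerArgsList` is omitted because its accepted arguments are not listed here.
+
+```yaml
+# PlannerSpec fragment (a sketch, not a recommended configuration)
+enabled: true
+plannerPreDeploymentSweeping: rapid   # one of: none, rapid, thorough
+```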
+
+
+#### ProfilingPhase
+
+_Underlying type:_ _string_
+
+ProfilingPhase represents a sub-phase within the profiling pipeline.
+When the DGDR Phase is "Profiling", this value indicates which step
+of the profiling pipeline is currently executing.
+
+_Validation:_
+- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
+
+| Field | Description |
+| --- | --- |
+| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
+| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
+| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
+| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
+| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
+| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
+| `Done` | Profiling pipeline finished successfully.<br /> |
+
+
+#### ProfilingResultsStatus
+
+
+
+ProfilingResultsStatus contains the output of the profiling process.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. |  | Optional: \{\} <br /> |
+| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. |  | Type: object <br />Optional: \{\} <br /> |
+
+
+#### SLASpec
+
+
+
+SLASpec defines the service-level agreement targets for profiling optimization.
+Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. |  | Enum: [latency throughput] <br />Optional: \{\} <br /> |
+| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. |  | Optional: \{\} <br /> |
+| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. |  | Optional: \{\} <br /> |
+| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. |  | Optional: \{\} <br /> |
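+
+The three SLA modes described above can be sketched as separate SLASpec fragments. The numeric targets are arbitrary illustrative values, not recommendations; only one of these fragments would be used in a given request.
+
+```yaml
+# Mode 1: explicit TTFT and ITL targets, both in milliseconds
+ttft: 200.0
+itl: 20.0
+---
+# Mode 2: a single end-to-end latency target in milliseconds
+e2eLatency: 5000.0
+---
+# Mode 3: no explicit targets; choose a general optimization strategy
+optimizationType: throughput   # one of: latency, throughput
+```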
+
+
+#### SearchStrategy
+
+_Underlying type:_ _string_
+
+SearchStrategy controls the profiling search depth.
+
+_Validation:_
+- Enum: [rapid thorough]
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description |
+| --- | --- |
+| `rapid` |  |
+| `thorough` |  |
+
+
+#### WorkloadSpec
+
+
+
+WorkloadSpec defines the workload characteristics for SLA-based profiling.
+
+
+
+_Appears in:_
+- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | 4000 | Optional: \{\} <br /> |
+| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | 1000 | Optional: \{\} <br /> |
+| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Either Concurrency or RequestRate is required when the planner is disabled. |  | Optional: \{\} <br /> |
+| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Either RequestRate or Concurrency is required when the planner is disabled. |  | Optional: \{\} <br /> |
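+
+For illustration, a WorkloadSpec fragment for a deployment without the planner could look like this. The sequence lengths repeat the documented defaults, and the concurrency value is an arbitrary assumption; `requestRate` could be given instead of `concurrency`.
+
+```yaml
+# WorkloadSpec fragment (illustrative values)
+isl: 4000          # input sequence length in tokens (documented default)
+osl: 1000          # output sequence length in tokens (documented default)
+concurrency: 32    # hypothetical target; needed here because the planner is assumed disabled
+```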
+
+
 # Operator Default Values Injection

 The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include: