Commit ebc61637 authored by hhzhang16, committed by GitHub

feat: Add v1beta1 DGDR API with conversion framework (#6352)


Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Hongkuan Zhou <hongkuanz@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Jont828 <jt572@cornell.edu>
parent 7bbacce1
@@ -22,6 +22,7 @@ limitations under the License.
## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
## nvidia.com/v1alpha1
@@ -1129,6 +1130,446 @@ _Appears in:_
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | |
## nvidia.com/v1beta1
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
### Resource Types
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
#### BackendSpec
BackendSpec defines the inference backend and container image configuration.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
| `dynamoImage` _string_ | DynamoImage is the full K8s dynamo image reference<br />(e.g. "nvcr.io/nvidia/dynamo-runtime:latest"). | | Optional: \{\} <br /> |
#### BackendType
_Underlying type:_ _string_
BackendType specifies the inference backend.
_Validation:_
- Enum: [auto sglang trtllm vllm]
_Appears in:_
- [BackendSpec](#backendspec)
| Field | Description |
| --- | --- |
| `auto` | |
| `sglang` | |
| `trtllm` | |
| `vllm` | |
#### DGDRPhase
_Underlying type:_ _string_
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Pending` | |
| `Profiling` | |
| `Ready` | |
| `Deploying` | |
| `Deployed` | |
| `Failed` | |
#### DeploymentInfoStatus
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas. | | Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. | | Optional: \{\} <br /> |
#### DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
Users specify a model and optional performance targets; the controller handles profiling,
configuration selection, and deployment.
Lifecycle:
1. Pending: Spec validated, preparing for profiling
2. Profiling: Profiling job is running to discover optimal configurations
3. Ready: Profiling complete, generated DGD spec available in status
4. Deploying: DGD is being created and rolled out (when autoApply=true)
5. Deployed: DGD is running and healthy
6. Failed: An unrecoverable error occurred
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. | | |
| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. | | |
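
The lifecycle listed above can be inspected from client code. The following is a minimal sketch, not the controller's logic: the phase names come from the `DGDRPhase` enum, while the helper names `awaits_manual_apply` and `has_failed` are hypothetical and exist only for illustration.

```python
# Phase names copied from the DGDRPhase enum documented in this section.
PHASES = ("Pending", "Profiling", "Ready", "Deploying", "Deployed", "Failed")


def _check(phase: str) -> None:
    # Reject anything that is not one of the documented enum values.
    if phase not in PHASES:
        raise ValueError(f"unknown DGDR phase: {phase!r}")


def awaits_manual_apply(phase: str, auto_apply: bool = True) -> bool:
    """True when profiling has finished and autoApply is false, so the
    generated DGD spec sits in status for manual review (hypothetical helper)."""
    _check(phase)
    return phase == "Ready" and not auto_apply


def has_failed(phase: str) -> bool:
    """True when the request hit an unrecoverable error (hypothetical helper)."""
    _check(phase)
    return phase == "Failed"
```

With the documented default `autoApply=true`, `awaits_manual_apply("Ready")` returns `False`, because the controller is expected to continue to `Deploying` on its own.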
#### DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
Only the `model` field is required; all other fields are optional, and any documented default is listed in the table below.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _[ModelSpec](#modelspec)_ | Model specifies the model to deploy including optional PVC cache configuration. | | Required: \{\} <br /> |
| `backend` _[BackendSpec](#backendspec)_ | Backend specifies the inference backend and container image configuration. | | Optional: \{\} <br /> |
| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. | | Optional: \{\} <br /> |
| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. | | Optional: \{\} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. | | Optional: \{\} <br /> |
| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. | | Optional: \{\} <br /> |
| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
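
To make the required and defaulted fields in the table above concrete, here is a small sketch that builds a minimal request as a Python dictionary and fills in the documented defaults. It is an illustration under assumptions: the helper `with_documented_defaults` and the resource name `example-dgdr` are hypothetical, and the sketch only fills the nested `backend.backend` default when a `backend` object is already present. The operator's actual defaulting may differ.

```python
import copy

# Defaults copied from the "Default" column of the spec table.
SPEC_DEFAULTS = {"searchStrategy": "rapid", "autoApply": True}


def with_documented_defaults(spec: dict) -> dict:
    """Return a copy of spec with the documented defaults filled in.

    Hypothetical helper for illustration; it does not call the API server.
    """
    model = spec.get("model") or {}
    if not model.get("modelName"):
        raise ValueError("spec.model.modelName is required (MinLength: 1)")
    out = copy.deepcopy(spec)
    for key, value in SPEC_DEFAULTS.items():
        out.setdefault(key, value)
    # Assumption: only default the nested field when its parent is present.
    if "backend" in out:
        out["backend"].setdefault("backend", "auto")
    return out


manifest = {
    "apiVersion": "nvidia.com/v1beta1",
    "kind": "DynamoGraphDeploymentRequest",
    "metadata": {"name": "example-dgdr"},  # hypothetical name
    "spec": {"model": {"modelName": "meta-llama/Llama-3.1-405B"}},
}
resolved = with_documented_defaults(manifest["spec"])
```

The input manifest is left untouched; `resolved` carries `searchStrategy: rapid` and `autoApply: True` alongside the user-supplied model.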
#### DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. | | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. | | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. | | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. | | Optional: \{\} <br /> |
| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. | | Optional: \{\} <br /> |
| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). | | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. | | Optional: \{\} <br /> |
#### FeaturesSpec
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. | | Optional: \{\} <br /> |
| `kvRouter` _boolean_ | KVRouter enables KV-cache-aware routing in the generated DGD. | | Optional: \{\} <br /> |
| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. | | Optional: \{\} <br /> |
#### HardwareSpec
HardwareSpec describes the hardware resources available for profiling and deployment.
These fields are typically auto-filled by the operator from cluster discovery.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). | | Optional: \{\} <br /> |
| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. | | Optional: \{\} <br /> |
| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. | | Optional: \{\} <br /> |
| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. | | Optional: \{\} <br /> |
#### MockerSpec
MockerSpec configures the simulated (mocker) backend.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. | | Optional: \{\} <br /> |
#### ModelCacheSpec
ModelCacheSpec references a PVC containing pre-downloaded model weights.
_Appears in:_
- [ModelSpec](#modelspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. | | Optional: \{\} <br /> |
| `modelPathInPvc` _string_ | ModelPathInPVC is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). | | Optional: \{\} <br /> |
| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
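
As an illustration of how the three `ModelCacheSpec` fields relate, the sketch below derives a model directory inside the container. It assumes the directory is simply `pvcMountPath` joined with `modelPathInPvc`; the reference above does not state this, and the helper name `model_dir_in_container` is hypothetical.

```python
import posixpath

# Documented default for pvcMountPath.
DEFAULT_PVC_MOUNT_PATH = "/opt/model-cache"


def model_dir_in_container(model_cache: dict) -> str:
    """Join the mount path and the in-PVC model path (assumed relationship)."""
    mount = model_cache.get("pvcMountPath") or DEFAULT_PVC_MOUNT_PATH
    # Treat the in-PVC path as relative to the mount point.
    rel = (model_cache.get("modelPathInPvc") or "").lstrip("/")
    return posixpath.join(mount, rel) if rel else mount
```

For a cache with `modelPathInPvc: models/Llama-3.1-405B-FP8` and no explicit mount path, this yields `/opt/model-cache/models/Llama-3.1-405B-FP8`.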
#### ModelSpec
ModelSpec defines the model to deploy.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the model name or identifier (e.g. "meta-llama/Llama-3.1-405B").<br />Can be a HuggingFace ID or a private model name. Always required. | | MinLength: 1 <br />Required: \{\} <br /> |
| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache is the optional PVC model cache configuration.<br />When provided, weights are loaded from the PVC instead of downloading from HF. | | Optional: \{\} <br /> |
#### OptimizationType
_Underlying type:_ _string_
OptimizationType specifies the profiling optimization strategy.
_Validation:_
- Enum: [latency throughput]
_Appears in:_
- [SLASpec](#slaspec)
| Field | Description |
| --- | --- |
| `latency` | |
| `throughput` | |
#### OverridesSpec
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. | | Optional: \{\} <br /> |
| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial DynamoGraphDeployment to use as the base<br />for the generated deployment. Fields from profiling results are merged on top. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
#### ParetoConfig
ParetoConfig represents a single Pareto-optimal deployment configuration
discovered during profiling.
_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. | | Type: object <br /> |
#### PlannerPreDeploymentSweepMode
_Underlying type:_ _string_
PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
_Validation:_
- Enum: [none rapid thorough]
_Appears in:_
- [PlannerSpec](#plannerspec)
| Field | Description |
| --- | --- |
| `none` | |
| `rapid` | |
| `thorough` | |
#### PlannerSpec
PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. | | Optional: \{\} <br /> |
| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls pre-deployment sweeping mode for planner in-depth profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). | | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. | | Optional: \{\} <br /> |
#### ProfilingPhase
_Underlying type:_ _string_
ProfilingPhase represents a sub-phase within the profiling pipeline.
When the DGDR Phase is "Profiling", this value indicates which step
of the profiling pipeline is currently executing.
_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
| `Done` | Profiling pipeline finished successfully.<br /> |
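
Because `profilingPhase` is only meaningful while `phase` is `Profiling`, a client that reports progress needs to check both fields. The sketch below assumes the enum order matches the pipeline order, which the descriptions suggest but do not state; `profiling_progress` is a hypothetical helper.

```python
# Sub-phase names copied from the ProfilingPhase enum, in listed order.
PROFILING_PHASES = (
    "Initializing",
    "SweepingPrefill",
    "SweepingDecode",
    "SelectingConfig",
    "BuildingCurves",
    "GeneratingDGD",
    "Done",
)


def profiling_progress(status: dict) -> str:
    """Summarize profiling progress from a DGDR status dictionary."""
    if status.get("phase") != "Profiling":
        # profilingPhase is cleared when profiling completes or fails.
        return "not profiling"
    sub = status.get("profilingPhase")
    if sub not in PROFILING_PHASES:
        return "profiling (sub-phase not reported)"
    step = PROFILING_PHASES.index(sub) + 1
    return f"profiling step {step}/{len(PROFILING_PHASES)}: {sub}"
```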
#### ProfilingResultsStatus
ProfilingResultsStatus contains the output of the profiling process.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. | | Optional: \{\} <br /> |
| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. | | Type: object <br />Optional: \{\} <br /> |
#### SLASpec
SLASpec defines the service-level agreement targets.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. | | Enum: [latency throughput] <br />Optional: \{\} <br /> |
| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. | | Optional: \{\} <br /> |
| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. | | Optional: \{\} <br /> |
| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. | | Optional: \{\} <br /> |
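
The `SLASpec` fields describe three ways to express a target: `ttft` plus `itl`, `e2eLatency` as an alternative, or `optimizationType` when explicit targets are not known. The sketch below classifies which one a given `sla` block uses. The function `sla_mode` and its return labels are hypothetical, and how the controller treats a block that mixes `e2eLatency` with `ttft` or `itl` is not documented here, so the sketch only flags that case.

```python
def sla_mode(sla: dict) -> str:
    """Classify an SLASpec dictionary by the kind of target it expresses."""
    has_e2e = sla.get("e2eLatency") is not None
    has_ttft = sla.get("ttft") is not None
    has_itl = sla.get("itl") is not None

    if has_e2e and (has_ttft or has_itl):
        return "ambiguous"  # both alternatives supplied; behavior not documented
    if has_e2e:
        return "e2eLatency"
    if has_ttft and has_itl:
        return "ttft+itl"
    if has_ttft or has_itl:
        return "incomplete"  # only one half of the ttft/itl pair

    opt = sla.get("optimizationType")
    if opt is not None:
        if opt not in ("latency", "throughput"):
            raise ValueError(f"optimizationType must be latency or throughput, got {opt!r}")
        return f"optimizationType={opt}"
    return "unspecified"
```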
#### SearchStrategy
_Underlying type:_ _string_
SearchStrategy controls the profiling search depth.
_Validation:_
- Enum: [rapid thorough]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `rapid` | |
| `thorough` | |
#### WorkloadSpec
WorkloadSpec defines the workload characteristics for SLA-based profiling.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | | Optional: \{\} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | | Optional: \{\} <br /> |
| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Either Concurrency or RequestRate is required when the planner is disabled. | | Optional: \{\} <br /> |
| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Either RequestRate or Concurrency is required when the planner is disabled. | | Optional: \{\} <br /> |
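
The rule that a load target is needed when the planner is disabled spans two parts of the spec (`workload` and `features.planner`). The sketch below shows one way a client could check it before submitting a request. It assumes that an absent `features.planner.enabled` means the planner is disabled, since no default is documented; `workload_errors` is a hypothetical helper, not part of the operator.

```python
def workload_errors(spec: dict) -> list:
    """Return a list of problems with the workload section of a DGDR spec."""
    planner = (spec.get("features") or {}).get("planner") or {}
    # Assumption: a missing "enabled" flag is treated as planner disabled.
    planner_enabled = bool(planner.get("enabled", False))

    workload = spec.get("workload") or {}
    has_load_target = (
        workload.get("concurrency") is not None
        or workload.get("requestRate") is not None
    )

    errors = []
    if not planner_enabled and not has_load_target:
        errors.append(
            "workload.concurrency or workload.requestRate is required "
            "when the planner is disabled"
        )
    return errors
```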
# Operator Default Values Injection
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
......
@@ -10,6 +10,7 @@
## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
## nvidia.com/v1alpha1
@@ -475,6 +476,10 @@ Lifecycle:
The spec becomes immutable once profiling starts. Users must delete and recreate
the DGDR to modify configuration after this point.
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated.
Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest.
v1alpha1 will be removed in a future release.
@@ -1206,6 +1211,416 @@ _Appears in:_
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | |
## nvidia.com/v1beta1
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
### Resource Types
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
#### BackendType
_Underlying type:_ _string_
BackendType specifies the inference backend.
_Validation:_
- Enum: [auto sglang trtllm vllm]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `auto` | |
| `sglang` | |
| `trtllm` | |
| `vllm` | |
#### DGDRPhase
_Underlying type:_ _string_
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Pending` | |
| `Profiling` | |
| `Ready` | |
| `Deploying` | |
| `Deployed` | |
| `Failed` | |
#### DeploymentInfoStatus
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas. | | Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. | | Optional: \{\} <br /> |
#### v1beta1 DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
Users specify a model and optional performance targets; the controller handles profiling,
configuration selection, and deployment.
Lifecycle:
1. Pending: Spec validated, preparing for profiling
2. Profiling: Profiling job is running to discover optimal configurations
3. Ready: Profiling complete, generated DGD spec available in status
4. Deploying: DGD is being created and rolled out (when autoApply=true)
5. Deployed: DGD is running and healthy
6. Failed: An unrecoverable error occurred
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. | | |
| `status` _[DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. | | |
#### v1beta1 DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
Only the `model` field is required; all other fields are optional, and any documented default is listed in the table below.
_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />Can be a HuggingFace ID or a private model name. | | MinLength: 1 <br />Required: \{\} <br /> |
| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
| `image` _string_ | Image is the container image reference for the profiling job (frontend image).<br />Example: "nvcr.io/nvidia/dynamo-runtime:latest"<br />[...] backend type automatically; backend images can be overridden via overrides.dgd. | | Optional: \{\} <br /> |
| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache provides optional PVC configuration for pre-downloaded model weights.<br />When provided, weights are loaded from the PVC instead of downloading from HuggingFace. | | Optional: \{\} <br /> |
| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. | | Optional: \{\} <br /> |
| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. | | Optional: \{\} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. | | Optional: \{\} <br /> |
| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. | | Optional: \{\} <br /> |
| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
#### v1beta1 DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. | | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. | | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. | | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. | | Optional: \{\} <br /> |
| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. | | Optional: \{\} <br /> |
| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). | | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. | | Optional: \{\} <br /> |
#### FeaturesSpec
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. | | Optional: \{\} <br /> |
| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. | | Optional: \{\} <br /> |
#### HardwareSpec
HardwareSpec describes the hardware resources available for profiling and deployment.
These fields are typically auto-filled by the operator from cluster discovery.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). | | Optional: \{\} <br /> |
| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. | | Optional: \{\} <br /> |
| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. | | Optional: \{\} <br /> |
| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. | | Optional: \{\} <br /> |
#### MockerSpec
MockerSpec configures the simulated (mocker) backend.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. | | Optional: \{\} <br /> |
#### ModelCacheSpec
ModelCacheSpec references a PVC containing pre-downloaded model weights.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. | | Optional: \{\} <br /> |
| `pvcModelPath` _string_ | PVCModelPath is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). | | Optional: \{\} <br /> |
| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
#### OptimizationType
_Underlying type:_ _string_
OptimizationType specifies the profiling optimization strategy.
_Validation:_
- Enum: [latency throughput]
_Appears in:_
- [SLASpec](#slaspec)
| Field | Description |
| --- | --- |
| `latency` | |
| `throughput` | |
#### OverridesSpec
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. | | Optional: \{\} <br /> |
| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment<br />to use as the base for the generated deployment. Fields from profiling results<br />are merged on top. Use this to override backend worker images.<br />The field is stored as a raw embedded resource rather than a typed<br />*v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already<br />imports v1beta1 as the conversion hub and Go does not allow import cycles.<br />The EmbeddedResource marker tells the API server to validate that the value is a<br />well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it<br />is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion,<br />kind, and field schema) is performed by the controller during reconciliation. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
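
The `dgd` description distinguishes two levels of validation: the API server only checks that the value is a well-formed Kubernetes object, while the controller checks that it is actually a `nvidia.com/v1alpha1` DynamoGraphDeployment. The sketch below mirrors that distinction on plain dictionaries. Both function names are hypothetical, and the checks are a simplification (the controller also validates the field schema, which is not reproduced here).

```python
def is_well_formed_object(obj) -> bool:
    """Roughly what the EmbeddedResource marker enforces: apiVersion and kind exist."""
    return isinstance(obj, dict) and bool(obj.get("apiVersion")) and bool(obj.get("kind"))


def is_dgd_override(obj) -> bool:
    """Stricter check attributed to the controller: correct apiVersion and kind."""
    return (
        is_well_formed_object(obj)
        and obj["apiVersion"] == "nvidia.com/v1alpha1"
        and obj["kind"] == "DynamoGraphDeployment"
    )


# A partial DGD used as the base for the generated deployment.
dgd_override = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeployment",
    "spec": {},  # override fields would go here
}

# Well-formed, so it passes the API-server-level check, but it is not a DGD.
not_a_dgd = {"apiVersion": "v1", "kind": "ConfigMap"}
```

The second object illustrates why a request can be accepted at admission time and still be rejected later during reconciliation.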
#### ParetoConfig
ParetoConfig represents a single Pareto-optimal deployment configuration
discovered during profiling.
_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. | | Type: object <br /> |
#### PlannerPreDeploymentSweepMode
_Underlying type:_ _string_
PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
_Validation:_
- Enum: [none rapid thorough]
_Appears in:_
- [PlannerSpec](#plannerspec)
| Field | Description |
| --- | --- |
| `none` | |
| `rapid` | |
| `thorough` | |
#### PlannerSpec
PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. | | Optional: \{\} <br /> |
| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls the pre-deployment sweeping mode used for in-depth planner profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). | | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. | | Optional: \{\} <br /> |
#### ProfilingPhase
_Underlying type:_ _string_
ProfilingPhase represents a sub-phase within the profiling pipeline.
When the DGDR Phase is "Profiling", this value indicates which step
of the profiling pipeline is currently executing.
_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
| `Done` | Profiling pipeline finished successfully.<br /> |
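
A client that watches the DGDR status might turn a `ProfilingPhase` value into a coarse progress indicator. The sketch below is a minimal illustration, assuming the phases run in the order in which the enum lists them. The descriptions suggest this order, but the reference does not state it. The function name `phase_progress` is hypothetical.

```python
# Phase names copied from the ProfilingPhase enum above.
# Assumption: the pipeline visits them in this listed order.
PROFILING_PHASES = [
    "Initializing",
    "SweepingPrefill",
    "SweepingDecode",
    "SelectingConfig",
    "BuildingCurves",
    "GeneratingDGD",
    "Done",
]


def phase_progress(phase: str) -> tuple:
    """Return (1-based position, total number of phases) for a ProfilingPhase value."""
    if phase not in PROFILING_PHASES:
        raise ValueError(f"unknown ProfilingPhase: {phase!r}")
    return PROFILING_PHASES.index(phase) + 1, len(PROFILING_PHASES)
```

For example, `phase_progress("SweepingDecode")` returns `(3, 7)` under the ordering assumed here.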
#### ProfilingResultsStatus
ProfilingResultsStatus contains the output of the profiling process.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. | | Optional: \{\} <br /> |
| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. | | Type: object <br />Optional: \{\} <br /> |
#### SLASpec
SLASpec defines the service-level agreement targets for profiling optimization.
Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. | | Enum: [latency throughput] <br />Optional: \{\} <br /> |
| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. | | Optional: \{\} <br /> |
| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. | | Optional: \{\} <br /> |
| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. | | Optional: \{\} <br /> |
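
The "exactly one mode" rule can be sketched as a small client-side check. This is an illustration only, not the operator's validation logic. The function name `active_sla_mode` is hypothetical. Two behaviours are assumptions drawn from the "(default)" wording above: setting either `ttft` or `itl` counts as the ttft+itl mode, and an empty spec falls back to ttft+itl.

```python
def active_sla_mode(sla: dict) -> str:
    """Return the single active SLA mode, or raise if several modes are set."""
    modes = []
    # Assumption: setting either ttft or itl selects the ttft+itl mode.
    if sla.get("ttft") is not None or sla.get("itl") is not None:
        modes.append("ttft+itl")
    if sla.get("e2eLatency") is not None:
        modes.append("e2eLatency")
    if sla.get("optimizationType") is not None:
        modes.append("optimizationType")
    if len(modes) > 1:
        raise ValueError(f"conflicting SLA modes: {', '.join(modes)}")
    # Assumption: with nothing set, ttft+itl is the default mode.
    return modes[0] if modes else "ttft+itl"
```

A spec such as `{"ttft": 200.0, "e2eLatency": 5000.0}` would be rejected by this check because it mixes two modes.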
#### SearchStrategy
_Underlying type:_ _string_
SearchStrategy controls the profiling search depth.
_Validation:_
- Enum: [rapid thorough]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `rapid` | |
| `thorough` | |
#### WorkloadSpec
WorkloadSpec defines the workload characteristics for SLA-based profiling.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | 4000 | Optional: \{\} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | 1000 | Optional: \{\} <br /> |
| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Either Concurrency or RequestRate must be set when the planner is disabled. | | Optional: \{\} <br /> |
| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Either RequestRate or Concurrency must be set when the planner is disabled. | | Optional: \{\} <br /> |
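
The defaults and the planner-dependent requirement in the table above can be sketched as follows. The function `resolve_workload` is hypothetical and only mirrors the documented behaviour (`isl` defaults to 4000, `osl` defaults to 1000, and either `concurrency` or `requestRate` is needed when the planner is disabled). How the operator actually applies or enforces these rules is not shown in this reference.

```python
def resolve_workload(workload: dict, planner_enabled: bool) -> dict:
    """Apply the documented isl/osl defaults and check the load-target requirement."""
    resolved = {"isl": 4000, "osl": 1000}  # defaults from the table above
    # Explicitly set fields override the defaults; None is treated as unset.
    resolved.update({k: v for k, v in workload.items() if v is not None})
    has_load_target = "concurrency" in resolved or "requestRate" in resolved
    if not planner_enabled and not has_load_target:
        raise ValueError(
            "concurrency or requestRate must be set when the planner is disabled"
        )
    return resolved
```

For example, `resolve_workload({"concurrency": 8}, planner_enabled=False)` returns `{"isl": 4000, "osl": 1000, "concurrency": 8}`, while an empty workload with the planner disabled raises `ValueError`.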
# Operator Default Values Injection
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
......