Unverified Commit ebc61637 authored by hhzhang16's avatar hhzhang16 Committed by GitHub
Browse files

feat: Add v1beta1 DGDR API with conversion framework (#6352)


Signed-off-by: default avatarJont828 <jt572@cornell.edu>
Signed-off-by: default avatarHongkuan Zhou <hongkuanz@nvidia.com>
Signed-off-by: default avatarHannah Zhang <hannahz@nvidia.com>
Co-authored-by: default avatarJont828 <jt572@cornell.edu>
parent 7bbacce1
...@@ -22,6 +22,7 @@ limitations under the License. ...@@ -22,6 +22,7 @@ limitations under the License.
## Packages ## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1) - [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
## nvidia.com/v1alpha1 ## nvidia.com/v1alpha1
...@@ -1129,6 +1130,446 @@ _Appears in:_ ...@@ -1129,6 +1130,446 @@ _Appears in:_
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | | | `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | |
## nvidia.com/v1beta1
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
### Resource Types
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
#### BackendSpec
BackendSpec defines the inference backend and container image configuration.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
| `dynamoImage` _string_ | DynamoImage is the full K8s dynamo image reference<br />(e.g. "nvcr.io/nvidia/dynamo-runtime:latest"). | | Optional: \{\} <br /> |
#### BackendType
_Underlying type:_ _string_
BackendType specifies the inference backend.
_Validation:_
- Enum: [auto sglang trtllm vllm]
_Appears in:_
- [BackendSpec](#backendspec)
| Field | Description |
| --- | --- |
| `auto` | |
| `sglang` | |
| `trtllm` | |
| `vllm` | |
#### DGDRPhase
_Underlying type:_ _string_
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Pending` | |
| `Profiling` | |
| `Ready` | |
| `Deploying` | |
| `Deployed` | |
| `Failed` | |
#### DeploymentInfoStatus
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas. | | Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. | | Optional: \{\} <br /> |
#### DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
Users specify a model and optional performance targets; the controller handles profiling,
configuration selection, and deployment.
Lifecycle:
1. Pending: Spec validated, preparing for profiling
2. Profiling: Profiling job is running to discover optimal configurations
3. Ready: Profiling complete, generated DGD spec available in status
4. Deploying: DGD is being created and rolled out (when autoApply=true)
5. Deployed: DGD is running and healthy
6. Failed: An unrecoverable error occurred
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. | | |
| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. | | |
#### DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
Only the Model field is required; all other fields are optional and have sensible defaults.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _[ModelSpec](#modelspec)_ | Model specifies the model to deploy including optional PVC cache configuration. | | Required: \{\} <br /> |
| `backend` _[BackendSpec](#backendspec)_ | Backend specifies the inference backend and container image configuration. | | Optional: \{\} <br /> |
| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. | | Optional: \{\} <br /> |
| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. | | Optional: \{\} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. | | Optional: \{\} <br /> |
| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. | | Optional: \{\} <br /> |
| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
#### DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. | | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. | | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. | | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. | | Optional: \{\} <br /> |
| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. | | Optional: \{\} <br /> |
| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). | | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. | | Optional: \{\} <br /> |
#### FeaturesSpec
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. | | Optional: \{\} <br /> |
| `kvRouter` _boolean_ | KVRouter enables KV-cache-aware routing in the generated DGD. | | Optional: \{\} <br /> |
| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. | | Optional: \{\} <br /> |
#### HardwareSpec
HardwareSpec describes the hardware resources available for profiling and deployment.
These fields are typically auto-filled by the operator from cluster discovery.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). | | Optional: \{\} <br /> |
| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. | | Optional: \{\} <br /> |
| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. | | Optional: \{\} <br /> |
| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. | | Optional: \{\} <br /> |
#### MockerSpec
MockerSpec configures the simulated (mocker) backend.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. | | Optional: \{\} <br /> |
#### ModelCacheSpec
ModelCacheSpec references a PVC containing pre-downloaded model weights.
_Appears in:_
- [ModelSpec](#modelspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. | | Optional: \{\} <br /> |
| `modelPathInPvc` _string_ | ModelPathInPVC is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). | | Optional: \{\} <br /> |
| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
#### ModelSpec
ModelSpec defines the model to deploy.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the model name or identifier (e.g. "meta-llama/Llama-3.1-405B").<br />Can be a HuggingFace ID or a private model name. Always required. | | MinLength: 1 <br />Required: \{\} <br /> |
| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache is the optional PVC model cache configuration.<br />When provided, weights are loaded from the PVC instead of downloading from HF. | | Optional: \{\} <br /> |
#### OptimizationType
_Underlying type:_ _string_
OptimizationType specifies the profiling optimization strategy.
_Validation:_
- Enum: [latency throughput]
_Appears in:_
- [SLASpec](#slaspec)
| Field | Description |
| --- | --- |
| `latency` | |
| `throughput` | |
#### OverridesSpec
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. | | Optional: \{\} <br /> |
| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial DynamoGraphDeployment to use as the base<br />for the generated deployment. Fields from profiling results are merged on top. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
#### ParetoConfig
ParetoConfig represents a single Pareto-optimal deployment configuration
discovered during profiling.
_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. | | Type: object <br /> |
#### PlannerPreDeploymentSweepMode
_Underlying type:_ _string_
PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
_Validation:_
- Enum: [none rapid thorough]
_Appears in:_
- [PlannerSpec](#plannerspec)
| Field | Description |
| --- | --- |
| `none` | |
| `rapid` | |
| `thorough` | |
#### PlannerSpec
PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. | | Optional: \{\} <br /> |
| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls pre-deployment sweeping mode for planner in-depth profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). | | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. | | Optional: \{\} <br /> |
#### ProfilingPhase
_Underlying type:_ _string_
ProfilingPhase represents a sub-phase within the profiling pipeline.
When the DGDR Phase is "Profiling", this value indicates which step
of the profiling pipeline is currently executing.
_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
| `Done` | Profiling pipeline finished successfully.<br /> |
#### ProfilingResultsStatus
ProfilingResultsStatus contains the output of the profiling process.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. | | Optional: \{\} <br /> |
| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. | | Type: object <br />Optional: \{\} <br /> |
#### SLASpec
SLASpec defines the service-level agreement targets.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. | | Enum: [latency throughput] <br />Optional: \{\} <br /> |
| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. | | Optional: \{\} <br /> |
| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. | | Optional: \{\} <br /> |
| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. | | Optional: \{\} <br /> |
#### SearchStrategy
_Underlying type:_ _string_
SearchStrategy controls the profiling search depth.
_Validation:_
- Enum: [rapid thorough]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `rapid` | |
| `thorough` | |
#### WorkloadSpec
WorkloadSpec defines the workload characteristics for SLA-based profiling.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | | Optional: \{\} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | | Optional: \{\} <br /> |
| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Required (or RequestRate) when the planner is disabled. | | Optional: \{\} <br /> |
| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Required (or Concurrency) when the planner is disabled. | | Optional: \{\} <br /> |
# Operator Default Values Injection # Operator Default Values Injection
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include: The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
......
...@@ -10,6 +10,7 @@ ...@@ -10,6 +10,7 @@
## Packages ## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1) - [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
## nvidia.com/v1alpha1 ## nvidia.com/v1alpha1
...@@ -475,6 +476,10 @@ Lifecycle: ...@@ -475,6 +476,10 @@ Lifecycle:
The spec becomes immutable once profiling starts. Users must delete and recreate The spec becomes immutable once profiling starts. Users must delete and recreate
the DGDR to modify configuration after this point. the DGDR to modify configuration after this point.
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated.
Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest.
v1alpha1 will be removed in a future release.
...@@ -1206,6 +1211,416 @@ _Appears in:_ ...@@ -1206,6 +1211,416 @@ _Appears in:_
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | | | `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false | |
## nvidia.com/v1beta1
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
### Resource Types
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
#### BackendType
_Underlying type:_ _string_
BackendType specifies the inference backend.
_Validation:_
- Enum: [auto sglang trtllm vllm]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `auto` | |
| `sglang` | |
| `trtllm` | |
| `vllm` | |
#### DGDRPhase
_Underlying type:_ _string_
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Pending` | |
| `Profiling` | |
| `Ready` | |
| `Deploying` | |
| `Deployed` | |
| `Failed` | |
#### DeploymentInfoStatus
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas. | | Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. | | Optional: \{\} <br /> |
#### v1beta1 DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
Users specify a model and optional performance targets; the controller handles profiling,
configuration selection, and deployment.
Lifecycle:
1. Pending: Spec validated, preparing for profiling
2. Profiling: Profiling job is running to discover optimal configurations
3. Ready: Profiling complete, generated DGD spec available in status
4. Deploying: DGD is being created and rolled out (when autoApply=true)
5. Deployed: DGD is running and healthy
6. Failed: An unrecoverable error occurred
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. | | |
| `status` _[DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. | | |
#### v1beta1 DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
Only the Model field is required; all other fields are optional and have sensible defaults.
_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />Can be a HuggingFace ID or a private model name. | | MinLength: 1 <br />Required: \{\} <br /> |
| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
| `image` _string_ | Image is the container image reference for the profiling job (frontend image).<br />Example: "nvcr.io/nvidia/dynamo-runtime:latest"<br />backend type automatically; backend images can be overridden via overrides.dgd. | | Optional: \{\} <br /> |
| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache provides optional PVC configuration for pre-downloaded model weights.<br />When provided, weights are loaded from the PVC instead of downloading from HuggingFace. | | Optional: \{\} <br /> |
| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. | | Optional: \{\} <br /> |
| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. | | Optional: \{\} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. | | Optional: \{\} <br /> |
| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. | | Optional: \{\} <br /> |
| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
#### v1beta1 DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. | | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. | | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. | | Optional: \{\} <br /> |
| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. | | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validated, ProfilingComplete, DeploymentReady. | | Optional: \{\} <br /> |
| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. | | Optional: \{\} <br /> |
| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). | | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. | | Optional: \{\} <br /> |
#### FeaturesSpec
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `planner` _[PlannerSpec](#plannerspec)_ | Planner configures the SLA planner for autoscaling in the generated DGD. | | Optional: \{\} <br /> |
| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. | | Optional: \{\} <br /> |
#### HardwareSpec
HardwareSpec describes the hardware resources available for profiling and deployment.
These fields are typically auto-filled by the operator from cluster discovery.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `gpuSku` _string_ | GPUSKU is the GPU SKU identifier (e.g., "H100_SXM", "A100_80GB"). | | Optional: \{\} <br /> |
| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB. | | Optional: \{\} <br /> |
| `totalGpus` _integer_ | TotalGPUs is the total number of GPUs available in the cluster. | | Optional: \{\} <br /> |
| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node. | | Optional: \{\} <br /> |
#### MockerSpec
MockerSpec configures the simulated (mocker) backend.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. | | Optional: \{\} <br /> |
#### ModelCacheSpec
ModelCacheSpec references a PVC containing pre-downloaded model weights.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. | | Optional: \{\} <br /> |
| `pvcModelPath` _string_ | PVCModelPath is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). | | Optional: \{\} <br /> |
| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
#### OptimizationType
_Underlying type:_ _string_
OptimizationType specifies the profiling optimization strategy.
_Validation:_
- Enum: [latency throughput]
_Appears in:_
- [SLASpec](#slaspec)
| Field | Description |
| --- | --- |
| `latency` | |
| `throughput` | |
#### OverridesSpec
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. | | Optional: \{\} <br /> |
| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment<br />to use as the base for the generated deployment. Fields from profiling results<br />are merged on top. Use this to override backend worker images.<br />The field is stored as a raw embedded resource rather than a typed<br />*v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already<br />imports v1beta1 as the conversion hub and Go does not allow import cycles.<br />The EmbeddedResource marker tells the API server to validate that the value is a<br />well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it<br />is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion,<br />kind, and field schema) is performed by the controller during reconciliation. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
#### ParetoConfig
ParetoConfig represents a single Pareto-optimal deployment configuration
discovered during profiling.
_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. | | Type: object <br /> |
#### PlannerPreDeploymentSweepMode
_Underlying type:_ _string_
PlannerPreDeploymentSweepMode controls pre-deployment sweeping thoroughness for planner profiling.
_Validation:_
- Enum: [none rapid thorough]
_Appears in:_
- [PlannerSpec](#plannerspec)
| Field | Description |
| --- | --- |
| `none` | |
| `rapid` | |
| `thorough` | |
#### PlannerSpec
PlannerSpec configures the SLA planner for autoscaling in the generated DGD.
_Appears in:_
- [FeaturesSpec](#featuresspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether the planner is enabled. | | Optional: \{\} <br /> |
| `plannerPreDeploymentSweeping` _[PlannerPreDeploymentSweepMode](#plannerpredeploymentsweepmode)_ | PlannerPreDeploymentSweeping controls pre-deployment sweeping mode for planner in-depth profiling.<br />"none" means no pre-deployment sweep (only load-based scaling).<br />"rapid" uses AI Configurator to simulate engine performance.<br />"thorough" uses real GPUs to measure engine performance (takes several hours). | | Enum: [none rapid thorough] <br />Optional: \{\} <br /> |
| `plannerArgsList` _string array_ | PlannerArgsList is a list of additional planner arguments. | | Optional: \{\} <br /> |
#### ProfilingPhase
_Underlying type:_ _string_
ProfilingPhase represents a sub-phase within the profiling pipeline.
When the DGDR Phase is "Profiling", this value indicates which step
of the profiling pipeline is currently executing.
_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description |
| --- | --- |
| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
| `Done` | Profiling pipeline finished successfully.<br /> |
#### ProfilingResultsStatus
ProfilingResultsStatus contains the output of the profiling process.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. | | Optional: \{\} <br /> |
| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. | | Type: object <br />Optional: \{\} <br /> |
#### SLASpec
SLASpec defines the service-level agreement targets for profiling optimization.
Exactly one mode should be active: ttft+itl (default), e2eLatency, or optimizationType.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `optimizationType` _[OptimizationType](#optimizationtype)_ | OptimizationType controls the profiling optimization strategy.<br />Use when explicit SLA targets (ttft+itl or e2eLatency) are not known. | | Enum: [latency throughput] <br />Optional: \{\} <br /> |
| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. | | Optional: \{\} <br /> |
| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. | | Optional: \{\} <br /> |
| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. | | Optional: \{\} <br /> |
#### SearchStrategy
_Underlying type:_ _string_
SearchStrategy controls the profiling search depth.
_Validation:_
- Enum: [rapid thorough]
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description |
| --- | --- |
| `rapid` | |
| `thorough` | |
#### WorkloadSpec
WorkloadSpec defines the workload characteristics for SLA-based profiling.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | 4000 | Optional: \{\} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | 1000 | Optional: \{\} <br /> |
| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Required (or RequestRate) when the planner is disabled. | | Optional: \{\} <br /> |
| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Required (or Concurrency) when the planner is disabled. | | Optional: \{\} <br /> |
# Operator Default Values Injection # Operator Default Values Injection
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include: The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment