feat(operator): Refactor DGDR to use profiler's native configuration format (#3758)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Commit eaf11e70 (unverified), authored Oct 22, 2025 by hhzhang16, committed by GitHub Oct 22, 2025
8 changed files
--- a/deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentrequests.yaml
+++ b/deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentrequests.yaml
@@ -36,7 +36,7 @@ spec:
        - jsonPath: .spec.modelName
          name: Model
          type: string
-        - jsonPath: .spec.backend
+        - jsonPath: .status.backend
          name: Backend
          type: string
        - jsonPath: .status.state
@@ -94,16 +94,6 @@ spec:
                    after profiling completes. If false, only the spec is generated and stored in status.
                    Users can then manually create a DGD using the generated spec.
                  type: boolean
-                backend:
-                  default: trtllm
-                  description: |-
-                    Backend specifies the inference backend framework to use.
-                    Supported values are: "vllm", "sglang", "trtllm".
-                  enum:
-                    - vllm
-                    - sglang
-                    - trtllm
-                  type: string
                deploymentOverrides:
                  description: |-
                    DeploymentOverrides allows customizing metadata for the auto-created DGD.
@@ -132,53 +122,29 @@ spec:
                        If not specified, defaults to the DGDR namespace.
                      type: string
                  type: object
-                gpu:
-                  description: |-
-                    GPU defines optional GPU type and resource specifications.
-                    These constraints guide the profiler to find configurations within specified bounds.
-                  properties:
-                    maxNumGPUsPerEngine:
-                      default: 8
-                      description: |-
-                        MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
-                        The profiler will not consider configurations with more GPUs than this value.
-                      minimum: 1
-                      type: integer
-                    minNumGPUsPerEngine:
-                      default: 1
-                      description: |-
-                        MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
-                        The profiler will not consider configurations with fewer GPUs than this value.
-                      minimum: 1
-                      type: integer
-                    type:
-                      description: |-
-                        Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
-                        If specified, profiling will focus on configurations optimized for this GPU type.
-                      type: string
-                  type: object
                modelName:
                  description: |-
-                    ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
+                    ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
-                    This should be a valid model identifier that the profiler can resolve.
+                    This is a high-level identifier for easy reference in kubectl output and logs.
                  type: string
-                online:
-                  default: false
-                  description: |-
-                    Online indicates whether to use online profiler (true) or AI Configurator (false).
-                    Online profiling uses real deployments for accurate measurements (2-4 hours).
-                    Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
-                  type: boolean
                profilingConfig:
                  description: |-
-                    ProfilingConfig provides custom configuration for the profiling job.
+                    ProfilingConfig provides the complete configuration for the profiling job.
-                    Applicable to both online and offline (AIC) profiling modes.
+                    This configuration is passed directly to the profiler.
+                    The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
+                    The profiler will validate the configuration and report any errors.
                  properties:
+                    config:
+                      description: |-
+                        Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
+                        The profiler will validate the configuration and report any errors.
+                      type: object
+                      x-kubernetes-preserve-unknown-fields: true
                    configMapRef:
                      description: |-
-                        ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
+                        ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
-                        The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
+                        base config file (disagg.yaml). This is separate from the profiling config above.
-                        This configuration is used by both online and offline (AIC) profiling modes.
+                        The path to this config will be set as engine.config in the profiling config.
                      properties:
                        key:
                          default: disagg.yaml
@@ -191,45 +157,18 @@ spec:
                        - name
                      type: object
                  type: object
-                sla:
-                  description: |-
-                    SLA defines the Service Level Agreement profiling targets.
-                    The profiler uses these targets to find an optimal deployment configuration.
-                  properties:
-                    isl:
-                      default: 3000
-                      description: |-
-                        ISL is the Input Sequence Length for profiling.
-                        Defines the length of input sequences to use during profiling tests.
-                      minimum: 1
-                      type: integer
-                    itl:
-                      default: 10
-                      description: |-
-                        ITL is the target Inter-Token Latency in milliseconds.
-                        This represents the maximum time allowed between consecutive tokens in the output.
-                      type: integer
-                    osl:
-                      default: 500
-                      description: |-
-                        OSL is the Output Sequence Length for profiling.
-                        Defines the expected length of output sequences to generate during profiling tests.
-                      minimum: 1
-                      type: integer
-                    ttft:
-                      default: 50
-                      description: |-
-                        TTFT is the target Time To First Token in milliseconds.
-                        This represents the maximum time allowed from request submission to receiving the first token.
-                      type: integer
-                  type: object
              required:
                - modelName
-                - sla
+                - profilingConfig
              type: object
            status:
              description: Status reflects the current observed state of this deployment request.
              properties:
+                backend:
+                  description: |-
+                    Backend is extracted from profilingConfig.config.engine.backend for display purposes.
+                    This field is populated by the controller and shown in kubectl output.
+                  type: string
                conditions:
                  description: |-
                    Conditions contains the latest observed conditions of the deployment request.
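
Migration note: the hunks above drop the typed `backend`, `gpu`, `online`, and `sla` fields from the spec and make the free-form `profilingConfig.config` object required. A minimal sketch of how an old-style request could be translated into the new nested layout follows. The `legacySpec` type and `toProfilingConfig` function are hypothetical and not part of this commit. The key names follow the sample manifest in this commit; mapping `gpu.type` to `sweep.aic_system` and `online` to the negation of `sweep.use_ai_configurator` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacySpec is a hypothetical stand-in for the spec fields removed by this commit.
type legacySpec struct {
	Backend          string
	Online           bool
	GPUType          string
	MinGPUs, MaxGPUs int
	ISL, OSL         int
	TTFT, ITL        int // milliseconds
}

// toProfilingConfig builds the nested structure expected under
// spec.profilingConfig.config. Key names are taken from the sample manifest.
func toProfilingConfig(old legacySpec) map[string]any {
	// Assumption: online profiling corresponds to use_ai_configurator=false.
	sweep := map[string]any{"use_ai_configurator": !old.Online}
	if old.GPUType != "" {
		// Assumption: the old gpu.type value is reused as the AIC target system.
		sweep["aic_system"] = old.GPUType
	}
	return map[string]any{
		"engine": map[string]any{"backend": old.Backend},
		"hardware": map[string]any{
			"min_num_gpus_per_engine": old.MinGPUs,
			"max_num_gpus_per_engine": old.MaxGPUs,
		},
		"sweep": sweep,
		"sla": map[string]any{
			"isl":  old.ISL,
			"osl":  old.OSL,
			"ttft": float64(old.TTFT), // the new sample writes latency targets as floats
			"itl":  float64(old.ITL),
		},
	}
}

func main() {
	cfg := toProfilingConfig(legacySpec{
		Backend: "trtllm", GPUType: "h200_sxm",
		MinGPUs: 1, MaxGPUs: 8,
		ISL: 3000, OSL: 500, TTFT: 50, ITL: 10,
	})
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The old kubebuilder defaults (for example `backend: trtllm`, `isl: 3000`) no longer apply once the fields move into an unvalidated object, so a translation like this has to set them explicitly or rely on the profiler's own defaults.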

--- a/deploy/cloud/operator/api/v1alpha1/dynamographdeploymentrequest_types.go
+++ b/deploy/cloud/operator/api/v1alpha1/dynamographdeploymentrequest_types.go
@@ -24,6 +24,7 @@ a high-level, SLA-driven interface for deploying machine learning models on Dyna
 package v1alpha1
 import (
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	runtime "k8s.io/apimachinery/pkg/runtime"
 )
@@ -31,61 +32,6 @@ import (
 // EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
 // NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.
-// SLASpec defines Service Level Agreement targets for model profiling and deployment.
-// These targets guide the profiling process to find optimal deployment configurations
-// that meet the specified performance requirements.
-type SLASpec struct {
-	// ITL is the target Inter-Token Latency in milliseconds.
-	// This represents the maximum time allowed between consecutive tokens in the output.
-	// +kubebuilder:default=10
-	// +optional
-	ITL int `json:"itl,omitempty"`
-	// TTFT is the target Time To First Token in milliseconds.
-	// This represents the maximum time allowed from request submission to receiving the first token.
-	// +kubebuilder:default=50
-	// +optional
-	TTFT int `json:"ttft,omitempty"`
-	// ISL is the Input Sequence Length for profiling.
-	// Defines the length of input sequences to use during profiling tests.
-	// +kubebuilder:default=3000
-	// +kubebuilder:validation:Minimum=1
-	// +optional
-	ISL int `json:"isl,omitempty"`
-	// OSL is the Output Sequence Length for profiling.
-	// Defines the expected length of output sequences to generate during profiling tests.
-	// +kubebuilder:default=500
-	// +kubebuilder:validation:Minimum=1
-	// +optional
-	OSL int `json:"osl,omitempty"`
-}
-// GPUSpec defines optional GPU type and resource specifications for profiling and deployment.
-// These constraints help narrow down the search space during profiling to find configurations
-// that fit within specified hardware bounds.
-type GPUSpec struct {
-	// Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
-	// If specified, profiling will focus on configurations optimized for this GPU type.
-	// +kubebuilder:validation:Optional
-	Type string `json:"type,omitempty"`
-	// MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
-	// The profiler will not consider configurations with fewer GPUs than this value.
-	// +kubebuilder:validation:Optional
-	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:default=1
-	MinNumGPUsPerEngine int `json:"minNumGPUsPerEngine,omitempty"`
-	// MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
-	// The profiler will not consider configurations with more GPUs than this value.
-	// +kubebuilder:validation:Optional
-	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:default=8
-	MaxNumGPUsPerEngine int `json:"maxNumGPUsPerEngine,omitempty"`
-}
 // ConfigMapKeySelector selects a specific key from a ConfigMap.
 // Used to reference external configuration data stored in ConfigMaps.
 type ConfigMapKeySelector struct {
@@ -99,11 +45,19 @@ type ConfigMapKeySelector struct {
 }
 // ProfilingConfigSpec defines configuration for the profiling process.
-// Allows users to provide custom profiling parameters via ConfigMap references.
+// This structure maps directly to the profile_sla.py config format.
+// See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
 type ProfilingConfigSpec struct {
-	// ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
+	// Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
-	// The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
+	// The profiler will validate the configuration and report any errors.
-	// This configuration is used by both online and offline (AIC) profiling modes.
+	// +kubebuilder:validation:Optional
+	// +kubebuilder:pruning:PreserveUnknownFields
+	// +kubebuilder:validation:Type=object
+	Config *apiextensionsv1.JSON `json:"config,omitempty"`
+	// ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
+	// base config file (disagg.yaml). This is separate from the profiling config above.
+	// The path to this config will be set as engine.config in the profiling config.
 	// +kubebuilder:validation:Optional
 	ConfigMapRef *ConfigMapKeySelector `json:"configMapRef,omitempty"`
 }
@@ -135,32 +89,17 @@ type DeploymentOverridesSpec struct {
 // This CRD serves as the primary interface for users to request model deployments with
 // specific performance constraints and resource requirements, enabling SLA-driven deployments.
 type DynamoGraphDeploymentRequestSpec struct {
-	// ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
+	// ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
-	// This should be a valid model identifier that the profiler can resolve.
+	// This is a high-level identifier for easy reference in kubectl output and logs.
 	// +kubebuilder:validation:Required
 	ModelName string `json:"modelName"`
-	// Backend specifies the inference backend framework to use.
+	// ProfilingConfig provides the complete configuration for the profiling job.
-	// Supported values are: "vllm", "sglang", "trtllm".
+	// This configuration is passed directly to the profiler.
-	// +kubebuilder:validation:Enum=vllm;sglang;trtllm
+	// The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
-	// +kubebuilder:default=trtllm
+	// The profiler will validate the configuration and report any errors.
-	Backend string `json:"backend,omitempty"`
-	// SLA defines the Service Level Agreement profiling targets.
-	// The profiler uses these targets to find an optimal deployment configuration.
 	// +kubebuilder:validation:Required
-	SLA SLASpec `json:"sla"`
+	ProfilingConfig ProfilingConfigSpec `json:"profilingConfig"`
-	// GPU defines optional GPU type and resource specifications.
-	// These constraints guide the profiler to find configurations within specified bounds.
-	// +kubebuilder:validation:Optional
-	GPU *GPUSpec `json:"gpu,omitempty"`
-	// Online indicates whether to use online profiler (true) or AI Configurator (false).
-	// Online profiling uses real deployments for accurate measurements (2-4 hours).
-	// Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
-	// +kubebuilder:default=false
-	Online bool `json:"online,omitempty"`
 	// AutoApply indicates whether to automatically create a DynamoGraphDeployment
 	// after profiling completes. If false, only the spec is generated and stored in status.
@@ -172,11 +111,6 @@ type DynamoGraphDeploymentRequestSpec struct {
 	// Only applicable when AutoApply is true.
 	// +kubebuilder:validation:Optional
 	DeploymentOverrides *DeploymentOverridesSpec `json:"deploymentOverrides,omitempty"`
-	// ProfilingConfig provides custom configuration for the profiling job.
-	// Applicable to both online and offline (AIC) profiling modes.
-	// +kubebuilder:validation:Optional
-	ProfilingConfig *ProfilingConfigSpec `json:"profilingConfig,omitempty"`
 }
 // DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
@@ -205,6 +139,11 @@ type DynamoGraphDeploymentRequestStatus struct {
 	// Empty string ("") represents the initial state before initialization.
 	State string `json:"state,omitempty"`
+	// Backend is extracted from profilingConfig.config.engine.backend for display purposes.
+	// This field is populated by the controller and shown in kubectl output.
+	// +kubebuilder:validation:Optional
+	Backend string `json:"backend,omitempty"`
 	// ObservedGeneration reflects the generation of the most recently observed spec.
 	// Used to detect spec changes and enforce immutability after profiling starts.
 	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
@@ -253,7 +192,7 @@ type DynamoGraphDeploymentRequestStatus struct {
 // +kubebuilder:subresource:status
 // +kubebuilder:resource:shortName=dgdr
 // +kubebuilder:printcolumn:name="Model",type=string,JSONPath=`.spec.modelName`
-// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.spec.backend`
+// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.status.backend`
 // +kubebuilder:printcolumn:name="State",type=string,JSONPath=`.status.state`
 // +kubebuilder:printcolumn:name="DGD-State",type=string,JSONPath=`.status.deployment.state`
 // +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
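
Note on `status.backend`: the field comment says the value is extracted from `profilingConfig.config.engine.backend`, but the extraction helper itself is not shown in this excerpt. A minimal, standard-library-only sketch of such an extraction is given below. It assumes the raw JSON bytes of the config are available (in `apiextensionsv1.JSON` they are held in the `Raw` field); the name `backendFromRaw` is hypothetical, and returning an empty string on any failure is a design assumption, not the controller's confirmed behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backendFromRaw returns the engine.backend value from a raw JSON profiling
// config. It returns "" when the config is empty, malformed, or has no
// string-valued engine.backend key.
func backendFromRaw(raw []byte) string {
	if len(raw) == 0 {
		return ""
	}
	// Decode only the path we care about; all other keys are ignored.
	var cfg struct {
		Engine struct {
			Backend string `json:"backend"`
		} `json:"engine"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return ""
	}
	return cfg.Engine.Backend
}

func main() {
	raw := []byte(`{"engine": {"backend": "trtllm"}, "sla": {"isl": 3000}}`)
	fmt.Println(backendFromRaw(raw))
}
```

Because the CRD no longer enforces the `vllm`/`sglang`/`trtllm` enum, whatever string the user supplies would surface in the Backend print column unless the profiler or controller rejects it.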

--- a/deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go
+++ b/deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go
@@ -41,6 +41,7 @@ import (
 	"github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/dynamo/common"
 	"k8s.io/api/autoscaling/v2"
 	"k8s.io/api/core/v1"
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/runtime"
 )
@@ -499,22 +500,12 @@ func (in *DynamoGraphDeploymentRequestList) DeepCopyObject() runtime.Object {
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *DynamoGraphDeploymentRequestSpec) DeepCopyInto(out *DynamoGraphDeploymentRequestSpec) {
 	*out = *in
-	out.SLA = in.SLA
+	in.ProfilingConfig.DeepCopyInto(&out.ProfilingConfig)
-	if in.GPU != nil {
-		in, out := &in.GPU, &out.GPU
-		*out = new(GPUSpec)
-		**out = **in
-	}
 	if in.DeploymentOverrides != nil {
 		in, out := &in.DeploymentOverrides, &out.DeploymentOverrides
 		*out = new(DeploymentOverridesSpec)
 		(*in).DeepCopyInto(*out)
 	}
-	if in.ProfilingConfig != nil {
-		in, out := &in.ProfilingConfig, &out.ProfilingConfig
-		*out = new(ProfilingConfigSpec)
-		(*in).DeepCopyInto(*out)
-	}
 }
 // DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DynamoGraphDeploymentRequestSpec.
@@ -626,21 +617,6 @@ func (in *DynamoGraphDeploymentStatus) DeepCopy() *DynamoGraphDeploymentStatus {
 	return out
 }
-// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
-func (in *GPUSpec) DeepCopyInto(out *GPUSpec) {
-	*out = *in
-}
-// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GPUSpec.
-func (in *GPUSpec) DeepCopy() *GPUSpec {
-	if in == nil {
-		return nil
-	}
-	out := new(GPUSpec)
-	in.DeepCopyInto(out)
-	return out
-}
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *IngressSpec) DeepCopyInto(out *IngressSpec) {
 	*out = *in
@@ -754,6 +730,11 @@ func (in *PVC) DeepCopy() *PVC {
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *ProfilingConfigSpec) DeepCopyInto(out *ProfilingConfigSpec) {
 	*out = *in
+	if in.Config != nil {
+		in, out := &in.Config, &out.Config
+		*out = new(apiextensionsv1.JSON)
+		(*in).DeepCopyInto(*out)
+	}
 	if in.ConfigMapRef != nil {
 		in, out := &in.ConfigMapRef, &out.ConfigMapRef
 		*out = new(ConfigMapKeySelector)
@@ -771,21 +752,6 @@ func (in *ProfilingConfigSpec) DeepCopy() *ProfilingConfigSpec {
 	return out
 }
-// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
-func (in *SLASpec) DeepCopyInto(out *SLASpec) {
-	*out = *in
-}
-// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SLASpec.
-func (in *SLASpec) DeepCopy() *SLASpec {
-	if in == nil {
-		return nil
-	}
-	out := new(SLASpec)
-	in.DeepCopyInto(out)
-	return out
-}
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *SharedMemorySpec) DeepCopyInto(out *SharedMemorySpec) {
 	*out = *in

--- a/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentrequests.yaml
+++ b/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentrequests.yaml
@@ -36,7 +36,7 @@ spec:
        - jsonPath: .spec.modelName
          name: Model
          type: string
-        - jsonPath: .spec.backend
+        - jsonPath: .status.backend
          name: Backend
          type: string
        - jsonPath: .status.state
@@ -94,16 +94,6 @@ spec:
                    after profiling completes. If false, only the spec is generated and stored in status.
                    Users can then manually create a DGD using the generated spec.
                  type: boolean
-                backend:
-                  default: trtllm
-                  description: |-
-                    Backend specifies the inference backend framework to use.
-                    Supported values are: "vllm", "sglang", "trtllm".
-                  enum:
-                    - vllm
-                    - sglang
-                    - trtllm
-                  type: string
                deploymentOverrides:
                  description: |-
                    DeploymentOverrides allows customizing metadata for the auto-created DGD.
@@ -132,53 +122,29 @@ spec:
                        If not specified, defaults to the DGDR namespace.
                      type: string
                  type: object
-                gpu:
-                  description: |-
-                    GPU defines optional GPU type and resource specifications.
-                    These constraints guide the profiler to find configurations within specified bounds.
-                  properties:
-                    maxNumGPUsPerEngine:
-                      default: 8
-                      description: |-
-                        MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
-                        The profiler will not consider configurations with more GPUs than this value.
-                      minimum: 1
-                      type: integer
-                    minNumGPUsPerEngine:
-                      default: 1
-                      description: |-
-                        MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
-                        The profiler will not consider configurations with fewer GPUs than this value.
-                      minimum: 1
-                      type: integer
-                    type:
-                      description: |-
-                        Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
-                        If specified, profiling will focus on configurations optimized for this GPU type.
-                      type: string
-                  type: object
                modelName:
                  description: |-
-                    ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
+                    ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
-                    This should be a valid model identifier that the profiler can resolve.
+                    This is a high-level identifier for easy reference in kubectl output and logs.
                  type: string
-                online:
-                  default: false
-                  description: |-
-                    Online indicates whether to use online profiler (true) or AI Configurator (false).
-                    Online profiling uses real deployments for accurate measurements (2-4 hours).
-                    Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
-                  type: boolean
                profilingConfig:
                  description: |-
-                    ProfilingConfig provides custom configuration for the profiling job.
+                    ProfilingConfig provides the complete configuration for the profiling job.
-                    Applicable to both online and offline (AIC) profiling modes.
+                    This configuration is passed directly to the profiler.
+                    The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
+                    The profiler will validate the configuration and report any errors.
                  properties:
+                    config:
+                      description: |-
+                        Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
+                        The profiler will validate the configuration and report any errors.
+                      type: object
+                      x-kubernetes-preserve-unknown-fields: true
                    configMapRef:
                      description: |-
-                        ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
+                        ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
-                        The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
+                        base config file (disagg.yaml). This is separate from the profiling config above.
-                        This configuration is used by both online and offline (AIC) profiling modes.
+                        The path to this config will be set as engine.config in the profiling config.
                      properties:
                        key:
                          default: disagg.yaml
@@ -191,45 +157,18 @@ spec:
                        - name
                      type: object
                  type: object
-                sla:
-                  description: |-
-                    SLA defines the Service Level Agreement profiling targets.
-                    The profiler uses these targets to find an optimal deployment configuration.
-                  properties:
-                    isl:
-                      default: 3000
-                      description: |-
-                        ISL is the Input Sequence Length for profiling.
-                        Defines the length of input sequences to use during profiling tests.
-                      minimum: 1
-                      type: integer
-                    itl:
-                      default: 10
-                      description: |-
-                        ITL is the target Inter-Token Latency in milliseconds.
-                        This represents the maximum time allowed between consecutive tokens in the output.
-                      type: integer
-                    osl:
-                      default: 500
-                      description: |-
-                        OSL is the Output Sequence Length for profiling.
-                        Defines the expected length of output sequences to generate during profiling tests.
-                      minimum: 1
-                      type: integer
-                    ttft:
-                      default: 50
-                      description: |-
-                        TTFT is the target Time To First Token in milliseconds.
-                        This represents the maximum time allowed from request submission to receiving the first token.
-                      type: integer
-                  type: object
              required:
                - modelName
-                - sla
+                - profilingConfig
              type: object
            status:
              description: Status reflects the current observed state of this deployment request.
              properties:
+                backend:
+                  description: |-
+                    Backend is extracted from profilingConfig.config.engine.backend for display purposes.
+                    This field is populated by the controller and shown in kubectl output.
+                  type: string
                conditions:
                  description: |-
                    Conditions contains the latest observed conditions of the deployment request.

--- a/deploy/cloud/operator/config/samples/nvidia.com_v1alpha1_dynamographdeploymentrequest.yaml
+++ b/deploy/cloud/operator/config/samples/nvidia.com_v1alpha1_dynamographdeploymentrequest.yaml
@@ -18,18 +18,57 @@ kind: DynamoGraphDeploymentRequest
 metadata:
  name: example-llm-sla
 spec:
-  modelName: "meta/llama3-70b"
+  # ModelName is a high-level identifier for the model being deployed
-  backend: trtllm # enum: [vllm, sglang, trtllm]; default is trtllm
+  modelName: Qwen/Qwen3-0.6B
-  sla: # SLA profiling targets (all fields optional with defaults)
-    itl: 10    # Inter-Token Latency target in milliseconds (default: 10)
+  # ProfilingConfig maps directly to the profile_sla.py config format
-    ttft: 50   # Time To First Token target in milliseconds (default: 50)
+  # See benchmarks/profiler/utils/profiler_argparse.py for complete schema
-    isl: 3000  # Input Sequence Length (default: 3000)
+  profilingConfig:
-    osl: 500   # Output Sequence Length (default: 500)
+    config:
-  gpu: # optional
+      # Optional: Output directory for profiling results (defaults to /data in the Job)
-    type: h200_sxm
+      # output_dir: "profiling_results"
-    minNumGPUsPerEngine: 1  # default is 1
-    maxNumGPUsPerEngine: 8  # default is 8
+      # Engine configuration
-  online: false # true for online profiler, false for AIC profiler
+      engine:
+        backend: trtllm  # Inference backend: vllm, sglang, or trtllm
+        max_context_length: 16384  # Maximum context length supported by the model
+        is_moe_model: false  # Enable MoE model support (uses TEP/DEP instead of TP)
+      # Hardware configuration
+      hardware:
+        min_num_gpus_per_engine: 1  # Minimum GPUs to test
+        max_num_gpus_per_engine: 4  # Maximum GPUs to test (limited by model's num_heads/4)
+        num_gpus_per_node: 8  # GPUs per node (for MoE models)
+      # Sweep/profiling configuration
+      sweep:
+        skip_existing_results: true  # Skip configurations that already have results
+        prefill_interpolation_granularity: 16  # Samples for TTFT interpolation
+        decode_interpolation_granularity: 6  # Samples for ITL interpolation
+        # AI Configurator mode (fast simulation-based profiling, 20-30 seconds)
+        use_ai_configurator: false  # Set to false for online profiling (2-4 hours)
+        aic_system: h200_sxm  # Target GPU system for AI Configurator
+        aic_model_name: QWEN3_0.6B  # Model name for AI Configurator
+        aic_backend_version: "0.20.0"  # Backend version for AI Configurator
+      # SLA targets for profiling
+      sla:
+        isl: 3000  # Input sequence length
+        osl: 500   # Output sequence length
+        ttft: 50.0  # Time To First Token target (milliseconds)
+        itl: 10.0   # Inter-Token Latency target (milliseconds)
+      # Optional: Planner-specific arguments
+      # planner:
+      #   planner_min_endpoint: 2
+      #   # Add any other planner args here (use hyphens or underscores)
+    # Reference to ConfigMap containing the DGD base config (disagg.yaml)
+    # The path to this file will be automatically set as engine.config
+    configMapRef:
+      name: my-profiling-config
+      key: disagg.yaml  # defaults to "disagg.yaml"
  # Optional: Automatically create DynamoGraphDeployment after profiling
  autoApply: true  # default is false
@@ -42,9 +81,3 @@ spec:
  #     team: ml-platform
  #   annotations:
  #     description: "Auto-generated from DGDR"
-  # Currently required for both online and offline/AIC profiling, but will be removed in the future
-  profilingConfig:
-    configMapRef:
-      name: my-profiling-config
-      key: disagg.yaml  # default is "disagg.yaml"
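Review note: with `backend` and `online` removed from the spec, the controller now reads both values out of the free-form `profilingConfig.config` blob (`engine.backend` and `sweep.use_ai_configurator`). The following is a minimal sketch of that lookup, not the controller's code. It assumes the raw bytes are JSON (the diff uses a YAML unmarshaller; `encoding/json` is swapped in here to keep the sketch standard-library only), and the helper names `nestedValue`, `isOnline`, and `backendOf` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nestedValue decodes raw and walks the given key path. It returns false if
// decoding fails, a key is missing, or an intermediate value is not an object.
func nestedValue(raw []byte, path ...string) (interface{}, bool) {
	var current interface{}
	if err := json.Unmarshal(raw, &current); err != nil {
		return nil, false
	}
	for _, key := range path {
		obj, ok := current.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if current, ok = obj[key]; !ok {
			return nil, false
		}
	}
	return current, true
}

// isOnline defaults to online profiling unless sweep.use_ai_configurator is
// explicitly the boolean true, matching the fallback behaviour in the diff.
func isOnline(raw []byte) bool {
	v, ok := nestedValue(raw, "sweep", "use_ai_configurator")
	useAIC, isBool := v.(bool)
	return !(ok && isBool && useAIC)
}

// backendOf returns engine.backend, or "" when it is absent or not a string.
func backendOf(raw []byte) string {
	v, _ := nestedValue(raw, "engine", "backend")
	s, _ := v.(string)
	return s
}

func main() {
	raw := []byte(`{"engine":{"backend":"trtllm"},"sweep":{"use_ai_configurator":true}}`)
	fmt.Println(backendOf(raw), isOnline(raw))
}
```

One consequence of this fallback: a mistyped value such as `use_ai_configurator: "true"` (a string) selects online profiling without any error or warning.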
--- a/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
+++ b/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
@@ -320,6 +320,9 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleInitialState(ctx context.
 	// Set observedGeneration to track the spec we're processing
 	dgdr.Status.ObservedGeneration = dgdr.Generation
+	// Extract and populate backend from config for display in kubectl output
+	dgdr.Status.Backend = getBackendFromConfig(dgdr)
 	// Initialize status
 	r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonInitialized, MessageInitialized)
 	return r.updateStateAndRequeue(ctx, dgdr, StatePending, MessageInitialized)
@@ -337,7 +340,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) handlePendingState(ctx context.
 	}
 	// Record event with appropriate message
-	if dgdr.Spec.Online {
+	if isOnlineProfiling(dgdr) {
 		r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageProfilingJobCreated)
 	} else {
 		r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageAICProfilingJobCreated)
@@ -670,15 +673,10 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleFailedState(ctx context.C
 	return ctrl.Result{}, nil
 }
-// getProfilingJobName returns the job name for a DGDR based on profiling mode
+// getProfilingJobName returns the job name for a DGDR
 func getProfilingJobName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
-	var jobNamePrefix string
+	// Use "profile-" prefix for all profiling jobs
-	if dgdr.Spec.Online {
+	return fmt.Sprintf("profile-%s", dgdr.Name)
-		jobNamePrefix = JobNamePrefixOnline
-	} else {
-		jobNamePrefix = JobNamePrefixAIC
-	}
-	return fmt.Sprintf("%s%s", jobNamePrefix, dgdr.Name)
 }
 // getOutputConfigMapName returns the ConfigMap name for profiling output
@@ -686,32 +684,55 @@ func getOutputConfigMapName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest
 	return fmt.Sprintf("%s%s", ConfigMapOutputPrefix, dgdr.Name)
 }
-// validateSpec validates the DGDR spec
+// isOnlineProfiling determines whether online profiling or AI Configurator is being used
-func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
+// based on the sweep.use_ai_configurator config value
-	if dgdr.Spec.ModelName == "" {
+func isOnlineProfiling(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) bool {
-		return errors.New(ValidationErrorModelNameRequired)
+	if dgdr.Spec.ProfilingConfig.Config == nil {
+		return true
 	}
-	if dgdr.Spec.SLA.ITL <= 0 {
+	var config map[string]interface{}
-		return errors.New(ValidationErrorITLPositive)
+	if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
+		return true // Default to online on parse error
 	}
-	if dgdr.Spec.SLA.TTFT <= 0 {
+	if sweep, ok := config["sweep"].(map[string]interface{}); ok {
-		return errors.New(ValidationErrorTTFTPositive)
+		if useAIC, exists := sweep["use_ai_configurator"].(bool); exists {
+			return !useAIC
+		}
 	}
+	// Default to online profiling if not specified
+	return true
+}
-	// Validate backend
+// getBackendFromConfig extracts the backend value from profilingConfig.config.engine.backend
-	validBackends := map[string]bool{
+func getBackendFromConfig(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
-		BackendVLLM:   true,
+	if dgdr.Spec.ProfilingConfig.Config == nil {
-		BackendSGLang: true,
+		return ""
-		BackendTRTLLM: true,
 	}
-	if dgdr.Spec.Backend != "" && !validBackends[dgdr.Spec.Backend] {
-		return fmt.Errorf(ValidationErrorInvalidBackend, dgdr.Spec.Backend)
+	var config map[string]interface{}
+	if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
+		return ""
 	}
-	// Validate ConfigMap if provided (for both online and offline/AIC profiling)
+	if engine, ok := config["engine"].(map[string]interface{}); ok {
-	if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
+		if backend, ok := engine["backend"].(string); ok {
+			return backend
+		}
+	}
+	return ""
+}
+// validateSpec validates the DGDR spec
+func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
+	// Basic validation - check that profilingConfig.config is provided
+	if dgdr.Spec.ProfilingConfig.Config == nil || len(dgdr.Spec.ProfilingConfig.Config.Raw) == 0 {
+		return errors.New("profilingConfig.config is required and must not be empty")
+	}
+	// Validate ConfigMap if provided (for the DGD base config)
+	if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
 		cm := &corev1.ConfigMap{}
 		err := r.Get(ctx, types.NamespacedName{
 			Name:      dgdr.Spec.ProfilingConfig.ConfigMapRef.Name,
@@ -737,6 +758,24 @@ func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Contex
 		}
 	}
+	// Parse config to validate structure
+	var config map[string]interface{}
+	if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
+		return fmt.Errorf("failed to parse profilingConfig.config: %w", err)
+	}
+	// Additional validation: Ensure engine.config is set (either as path or will be set from ConfigMapRef)
+	engineConfig, hasEngine := config["engine"].(map[string]interface{})
+	if hasEngine {
+		_, hasConfig := engineConfig["config"]
+		if !hasConfig && dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
+			return errors.New("either profilingConfig.config.engine.config must be set, or profilingConfig.configMapRef must be provided")
+		}
+	} else if dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
+		return errors.New("profilingConfig.config must contain 'engine' section, or profilingConfig.configMapRef must be provided")
+	}
+	// The profiler will validate the rest of the configuration
 	return nil
 }
@@ -757,33 +796,48 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 		}
 	}
-	// Use ProfilerImage for both online and offline (AIC) profiling
-	imageName := r.ProfilerImage
-	if imageName == "" {
-		return fmt.Errorf("profiler image not configured: the operator's profilerImage must be set in the Helm chart values (dynamo-operator.dynamo.dgdr.profilerImage). The image must contain the ai-dynamo profiler (python -m benchmarks.profiler.profile_sla entrypoint). For development, build from the ai-dynamo repository Dockerfile and push to your registry. A public image will be available in release 0.6.1")
-	}
-	logger.Info("Using profiler image", "image", imageName, "online", dgdr.Spec.Online)
-	// Determine label based on profiling mode
-	var labelValue string
-	if dgdr.Spec.Online {
-		labelValue = LabelValueDynamoProfiler
-	} else {
-		labelValue = LabelValueAICProfiler
-	}
 	// Use SyncResource to create/update the job
 	modified, job, err := commonController.SyncResource(ctx, r, dgdr, func(ctx context.Context) (*batchv1.Job, bool, error) {
 		jobName := getProfilingJobName(dgdr)
 		outputConfigMapName := getOutputConfigMapName(dgdr)
-		// Build profiler container based on online vs offline (AIC) mode
+		// Parse the profiling config from JSON
-		var profilerArgs []string
+		var config map[string]interface{}
-		var profilerEnv []corev1.EnvVar
+		if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
+			return nil, false, fmt.Errorf("failed to parse profiling config: %w", err)
+		}
+		// Set deployment.namespace if not already set
+		if _, hasDeployment := config["deployment"]; !hasDeployment {
+			config["deployment"] = make(map[string]interface{})
+		}
+		deploymentConfig := config["deployment"].(map[string]interface{})
+		if _, hasNamespace := deploymentConfig["namespace"]; !hasNamespace {
+			deploymentConfig["namespace"] = dgdr.Namespace
+		}
+		// Set output_dir if not already set
+		if _, hasOutputDir := config["output_dir"]; !hasOutputDir {
+			config["output_dir"] = ProfilingOutputPath
+		}
+		// If ConfigMapRef is provided, set engine.config path
+		if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
+			if _, hasEngine := config["engine"]; !hasEngine {
+				config["engine"] = make(map[string]interface{})
+			}
+			engineConfig := config["engine"].(map[string]interface{})
+			engineConfig["config"] = fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile)
+		}
+		// Serialize config to YAML for passing to profiler
+		configYAML, err := yaml.Marshal(config)
+		if err != nil {
+			return nil, false, fmt.Errorf("failed to marshal profiling config to YAML: %w", err)
+		}
 		// Common environment variables
-		profilerEnv = []corev1.EnvVar{
+		profilerEnv := []corev1.EnvVar{
 			{
 				Name: "HUGGING_FACE_HUB_TOKEN",
 				ValueFrom: &corev1.EnvVarSource{
@@ -805,7 +859,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 			},
 		}
-		// Build container with volume mounts
+		// Build volume mounts
 		volumeMounts := []corev1.VolumeMount{
 			{
 				Name:      VolumeNameProfilingOutput,
@@ -813,49 +867,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 			},
 		}
-		// Determine GPU range for profiling
+		// Add ConfigMap volume mount if provided
-		minGPUs := 1
+		if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
-		maxGPUs := 8
-		if dgdr.Spec.GPU != nil {
-			if dgdr.Spec.GPU.MinNumGPUsPerEngine > 0 {
-				minGPUs = dgdr.Spec.GPU.MinNumGPUsPerEngine
-			}
-			if dgdr.Spec.GPU.MaxNumGPUsPerEngine > 0 {
-				maxGPUs = dgdr.Spec.GPU.MaxNumGPUsPerEngine
-			}
-		}
-		// Build common profiler args (shared by both online and offline modes)
-		profilerArgs = []string{
-			"--namespace", dgdr.Namespace,
-			"--backend", dgdr.Spec.Backend,
-			"--ttft", fmt.Sprintf("%d", dgdr.Spec.SLA.TTFT),
-			"--itl", fmt.Sprintf("%d", dgdr.Spec.SLA.ITL),
-			"--isl", fmt.Sprintf("%d", dgdr.Spec.SLA.ISL),
-			"--osl", fmt.Sprintf("%d", dgdr.Spec.SLA.OSL),
-			"--output-dir", ProfilingOutputPath,
-			"--min-num-gpus-per-engine", fmt.Sprintf("%d", minGPUs),
-			"--max-num-gpus-per-engine", fmt.Sprintf("%d", maxGPUs),
-		}
-		// Add mode-specific args
-		if !dgdr.Spec.Online {
-			// Offline (AIC) profiling: add AI Configurator args
-			profilerArgs = append(profilerArgs,
-				"--use-ai-configurator",
-				"--aic-model-name", dgdr.Spec.ModelName,
-				"--aic-backend-version", "0.20.0", // TODO: don't hardcode this
-			)
-			// Add AIC-specific GPU system type
-			if dgdr.Spec.GPU != nil && dgdr.Spec.GPU.Type != "" {
-				profilerArgs = append(profilerArgs, "--aic-system", dgdr.Spec.GPU.Type)
-			}
-		}
-		// Add config if provided (for both online and offline modes)
-		if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
-			profilerArgs = append(profilerArgs, "--config", fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile))
 			volumeMounts = append(volumeMounts, corev1.VolumeMount{
 				Name:      VolumeNameProfilingConfig,
 				MountPath: ProfilingConfigPath,
@@ -863,6 +876,18 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 			})
 		}
+		// Profiler args: pass the config as an inline YAML string via --profile-config
+		profilerArgs := []string{
+			"--profile-config", string(configYAML),
+		}
+		// Determine profiler image
+		imageName := r.ProfilerImage
+		if imageName == "" {
+			return nil, false, fmt.Errorf("profiler image not configured: configure dynamo-operator.dynamo.dgdr.profilerImage in Helm values")
+		}
+		logger.Info("Using profiler image", "image", imageName)
 		profilerContainer := corev1.Container{
 			Name:    ContainerNameProfiler,
 			Image:   imageName,
@@ -918,8 +943,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 			},
 		}}
-		// Add ConfigMap volume if provided (for both online and offline/AIC)
+		// Add ConfigMap volume if provided
-		if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
+		if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
 			key := dgdr.Spec.ProfilingConfig.ConfigMapRef.Key
 			if key == "" {
 				key = ProfilingConfigFile
@@ -944,6 +969,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 		// Limit retries to prevent infinite loop
 		backoffLimit := int32(3)
+		// Determine label based on whether AI Configurator is used
+		labelValue := LabelValueDynamoProfiler
+		if !isOnlineProfiling(dgdr) {
+			labelValue = LabelValueAICProfiler
+		}
 		job := &batchv1.Job{
 			ObjectMeta: metav1.ObjectMeta{
 				Name:      jobName,
@@ -978,11 +1009,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 	}
 	if modified {
-		if dgdr.Spec.Online {
+		logger.Info("Profiling job created/updated", "job", job.Name)
-			logger.Info("Online profiling job created/updated", "job", job.Name)
-		} else {
-			logger.Info("Offline (AIC) profiling job created/updated", "job", job.Name)
-		}
 	}
 	return nil
@@ -1070,7 +1097,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx
 // generateDGDSpec generates DGD spec from profiling results (online or offline/AIC)
 func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
 	logger := log.FromContext(ctx)
-	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name, "online", dgdr.Spec.Online)
+	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name)
 	// Read the generated spec from ConfigMap (created by sidecar)
 	outputConfigMapName := getOutputConfigMapName(dgdr)
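
Review note: in `createProfilingJob`, `config["deployment"].(map[string]interface{})` and `config["engine"].(map[string]interface{})` are single-value type assertions, so they panic if a user sets either key to a non-object value (for example `deployment: prod`). The sketch below applies the same defaulting order with checked assertions. It is a hedged illustration only: `asObject` and `applyDefaults` are hypothetical names, the path arguments are placeholders rather than the operator's real constants, and JSON decoding stands in for the YAML library used in the diff.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// asObject returns parent[key] as an object, creating it when the key is
// absent or null. A non-object value yields an error instead of a panic.
func asObject(parent map[string]interface{}, key string) (map[string]interface{}, error) {
	v, exists := parent[key]
	if !exists || v == nil {
		obj := map[string]interface{}{}
		parent[key] = obj
		return obj, nil
	}
	obj, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("%q must be an object, got %T", key, v)
	}
	return obj, nil
}

// applyDefaults fills deployment.namespace and output_dir when unset, and
// overwrites engine.config when mountedConfigPath is non-empty (an empty
// string stands for "no configMapRef provided" in this sketch).
func applyDefaults(raw []byte, namespace, outputDir, mountedConfigPath string) (map[string]interface{}, error) {
	var config map[string]interface{}
	if err := json.Unmarshal(raw, &config); err != nil {
		return nil, fmt.Errorf("failed to parse profiling config: %w", err)
	}
	if config == nil {
		return nil, errors.New("profiling config must be a JSON object")
	}
	deployment, err := asObject(config, "deployment")
	if err != nil {
		return nil, err
	}
	if _, ok := deployment["namespace"]; !ok {
		deployment["namespace"] = namespace
	}
	if _, ok := config["output_dir"]; !ok {
		config["output_dir"] = outputDir
	}
	if mountedConfigPath != "" {
		engine, err := asObject(config, "engine")
		if err != nil {
			return nil, err
		}
		engine["config"] = mountedConfigPath
	}
	return config, nil
}

func main() {
	// The namespace and paths below are placeholder values.
	cfg, err := applyDefaults([]byte(`{"engine":{"backend":"vllm"}}`), "default", "/data", "/config/disagg.yaml")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.Marshal(cfg)
	fmt.Println(string(out))
}
```

Returning an error here would let the reconciler surface a validation failure on the DGDR rather than crash the operator process on malformed user input.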

--- a/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go
+++ b/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go
--- a/docs/kubernetes/api_reference.md
+++ b/docs/kubernetes/api_reference.md
@@ -28,6 +28,11 @@ limitations under the License.
 Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
+This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
+a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
+Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
 ### Resource Types
 - [DynamoComponentDeployment](#dynamocomponentdeployment)
 - [DynamoGraphDeployment](#dynamographdeployment)
@@ -62,7 +67,8 @@ _Appears in:_
-ConfigMapKeySelector selects a key from a ConfigMap.
+ConfigMapKeySelector selects a specific key from a ConfigMap.
+Used to reference external configuration data stored in ConfigMaps.
@@ -71,15 +77,16 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `name` _string_ | Name of the ConfigMap. |  | Required: {} <br /> |
+| `name` _string_ | Name of the ConfigMap containing the desired data. |  | Required: {} <br /> |
-| `key` _string_ | Key in the ConfigMap to select. | disagg.yaml |  |
+| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml |  |
 #### DeploymentOverridesSpec
-DeploymentOverridesSpec defines metadata overrides for the auto-created DGD.
+DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments.
+When autoApply is enabled, these overrides are applied to the generated DGD resource.
@@ -88,17 +95,18 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `name` _string_ | Name is the name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. |  | Optional: {} <br /> |
+| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. |  | Optional: {} <br /> |
-| `namespace` _string_ | Namespace is the namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. |  | Optional: {} <br /> |
+| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. |  | Optional: {} <br /> |
-| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment.<br />These are merged with auto-generated labels. |  | Optional: {} <br /> |
+| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. |  | Optional: {} <br /> |
-| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment. |  | Optional: {} <br /> |
+| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. |  | Optional: {} <br /> |
 #### DeploymentStatus
-DeploymentStatus tracks the auto-created DGD status.
+DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
+This status is populated when autoApply is enabled and a DGD is created.
@@ -109,8 +117,8 @@ _Appears in:_
 | --- | --- | --- | --- |
 | `name` _string_ | Name is the name of the created DynamoGraphDeployment. |  |  |
 | `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. |  |  |
-| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This is mirrored from the DGD's status.state field. |  |  |
+| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This value is mirrored from the DGD's status.state field. |  |  |
-| `created` _boolean_ | Created indicates whether the DGD has been created.<br />Used to prevent recreation if DGD is deleted by user. |  |  |
+| `created` _boolean_ | Created indicates whether the DGD has been successfully created.<br />Used to prevent recreation if the DGD is manually deleted by users. |  |  |
 #### DynamoComponentDeployment
@@ -229,6 +237,19 @@ It serves as the primary interface for users to request model deployments with
 specific performance and resource constraints, enabling SLA-driven deployments.
+Lifecycle:
+ 1. Initial → Pending: Validates spec and prepares for profiling
+ 2. Pending → Profiling: Creates and runs profiling job (online or AIC)
+ 3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
+ 4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
+ 5. Ready: Terminal state when DGD is operational or spec is available
+ 6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
+The spec becomes immutable once profiling starts. Users must delete and recreate
+the DGDR to modify configuration after this point.
@@ -245,9 +266,9 @@ specific performance and resource constraints, enabling SLA-driven deployments.
-DynamoGraphDeploymentRequestSpec defines the desired state of DynamoGraphDeploymentRequest.
+DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
-This CRD serves as the primary interface for users to request model deployments
+This CRD serves as the primary interface for users to request model deployments with
-with specific performance and resource constraints for SLA-driven deployments.
+specific performance constraints and resource requirements, enabling SLA-driven deployments.
@@ -256,21 +277,18 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). |  | Required: {} <br /> |
+| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs. |  | Required: {} <br /> |
-| `backend` _string_ | Backend specifies the backend framework to use. | trtllm | Enum: [vllm sglang trtllm] <br /> |
+| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />The profiler will validate the configuration and report any errors. |  | Required: {} <br /> |
-| `sla` _[SLASpec](#slaspec)_ | SLA defines the Service Level Agreement profiling targets. |  | Required: {} <br /> |
+| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false |  |
-| `gpu` _[GPUSpec](#gpuspec)_ | GPU defines optional GPU type specification. |  | Optional: {} <br /> |
+| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows customizing metadata for the auto-created DGD.<br />Only applicable when AutoApply is true. |  | Optional: {} <br /> |
-| `online` _boolean_ | Online indicates whether to use online profiler (true) or AI Configurator (false).<br />When true, uses real deployment for profiling (2-4 hours).<br />When false, uses AI Configurator for fast profiling (20-30 seconds). | false |  |
-| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated in status. | false |  |
-| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows overriding metadata for the auto-created DGD.<br />Only used when AutoApply is true. |  | Optional: {} <br /> |
-| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides configuration for the profiling job.<br />Can be used for both online and offline (AIC) profiling. |  | Optional: {} <br /> |
 #### DynamoGraphDeploymentRequestStatus
-DynamoGraphDeploymentRequestStatus defines the observed state of DynamoGraphDeploymentRequest.
+DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
+The controller updates this status as the DGDR progresses through its lifecycle.
@@ -279,12 +297,13 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed" |  |  |
+| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "", "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed"<br />Empty string ("") represents the initial state before initialization. |  |  |
-| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability. |  |  |
+| `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes.<br />This field is populated by the controller and shown in kubectl output. |  | Optional: {} <br /> |
-| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />The slice is merged by type on patch updates. |  |  |
+| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. |  |  |
-| `profilingResults` _string_ | ProfilingResults contains references to the profiling data and results. |  | Optional: {} <br /> |
+| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. |  |  |
-| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment (including metadata)<br />based on profiling results. This can be used to create a DynamoGraphDeployment resource.<br />Stored as RawExtension to preserve all fields including metadata. |  | EmbeddedResource: {} <br />Optional: {} <br /> |
+| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/<name>" |  | Optional: {} <br /> |
-| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD if AutoApply is true. |  | Optional: {} <br /> |
+| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata. |  | EmbeddedResource: {} <br />Optional: {} <br /> |
+| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. |  | Optional: {} <br /> |
 #### DynamoGraphDeploymentSpec
@@ -323,24 +342,6 @@ _Appears in:_
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. |  |  |
-#### GPUSpec
-GPUSpec defines optional GPU type specification.
-_Appears in:_
-- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `type` _string_ | Type specifies the GPU type (e.g., "h200", "h100", "a100"). |  | Optional: {} <br /> |
-| `minNumGPUsPerEngine` _integer_ | MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling. | 1 | Minimum: 1 <br />Optional: {} <br /> |
-| `maxNumGPUsPerEngine` _integer_ | MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling. | 8 | Minimum: 1 <br />Optional: {} <br /> |
 #### IngressSpec
@@ -424,23 +425,9 @@ _Appears in:_
-ProfilingConfigSpec defines the profiling configuration.
+ProfilingConfigSpec defines configuration for the profiling process.
+This structure maps directly to the profile_sla.py config format.
+See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
-_Appears in:_
-- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
-| Field | Description | Default | Validation |
-| --- | --- | --- | --- |
-| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is a reference to a ConfigMap containing the profiling configuration.<br />The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.<br />Can be used for both online and offline (AIC) profiling. |  | Optional: {} <br /> |
-#### SLASpec
-SLASpec defines the Service Level Agreement profiling targets.
@@ -449,10 +436,8 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
-| `itl` _integer_ | ITL is the target Inter-Token Latency in milliseconds. |  | Required: {} <br /> |
+| `config` _[JSON](#json)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. |  | Optional: {} <br />Type: object <br /> |
-| `ttft` _integer_ | TTFT is the target Time To First Token in milliseconds. |  | Required: {} <br /> |
+| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. |  | Optional: {} <br /> |
-| `isl` _integer_ | ISL is the Input Sequence Length for profiling. |  | Minimum: 1 <br />Required: {} <br /> |
-| `osl` _integer_ | OSL is the Output Sequence Length for profiling. |  | Minimum: 1 <br />Required: {} <br /> |
 #### SharedMemorySpec
@@ -588,7 +573,16 @@ For larger models (typically >70B parameters) or slower storage systems, you may
 For multinode deployments, the operator modifies probes based on the backend framework and node role:
 #### VLLM Backend
+The operator automatically selects between two deployment modes based on parallelism configuration:
+**Ray-Based Mode** (when `world_size > GPUs_per_node`):
+- **Worker nodes**: All probes (liveness, readiness, startup) are removed
+- **Leader nodes**: All probes remain active
+**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
 - **Worker nodes**: All probes (liveness, readiness, startup) are removed
+- **Leader nodes**: All probes remain active
 #### SGLang Backend
 - **Worker nodes**: All probes (liveness, readiness, startup) are removed
@@ -686,7 +680,8 @@ Default container ports are configured based on component type:
 ## Backend-Specific Configurations
 ### VLLM
-- **Ray Head Port**: 6379 (for multinode deployments)
+- **Ray Head Port**: 6379 (for Ray-based multinode deployments)
+- **Data Parallel RPC Port**: 13445 (for data parallel multinode deployments)
 ### SGLang
 - **Distribution Init Port**: 29500 (for multinode deployments)
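
Review note on the DGDR lifecycle added to `api_reference.md` in this commit: the documented happy path (Initial → Pending → Profiling → Ready or Deploying → Ready) can be sketched as a small transition function. This is an illustration under stated assumptions, not the controller's state machine. The state strings come from the `state` field description, `StatePending` appears in the controller diff, the other constant names and `nextState` are hypothetical, and the `Failed` and `DeploymentDeleted` states are deliberately left out.

```go
package main

import "fmt"

// State values as listed in the status.state field description.
const (
	StateInitial   = "" // empty string before initialization
	StatePending   = "Pending"
	StateProfiling = "Profiling"
	StateDeploying = "Deploying"
	StateReady     = "Ready"
)

// nextState returns the successor on the success path. After profiling, the
// request moves to Deploying only when autoApply is true; otherwise the
// generated spec is stored in status and the request goes straight to Ready.
func nextState(current string, autoApply bool) string {
	switch current {
	case StateInitial:
		return StatePending
	case StatePending:
		return StateProfiling
	case StateProfiling:
		if autoApply {
			return StateDeploying
		}
		return StateReady
	case StateDeploying:
		return StateReady
	default:
		return current // Ready and unknown states do not advance here
	}
}

func main() {
	for _, autoApply := range []bool{false, true} {
		state := StateInitial
		path := []string{"Initial"}
		for state != StateReady {
			state = nextState(state, autoApply)
			path = append(path, state)
		}
		fmt.Println("autoApply:", autoApply, "path:", path)
	}
}
```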