feat: add useMocker field to DGDR (#4813)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Commit 4e6493ae (unverified), authored Dec 15, 2025 by hhzhang16; committed by GitHub Dec 15, 2025
6 changed files
--- a/deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentrequests.yaml
+++ b/deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentrequests.yaml
@@ -96,8 +96,9 @@ spec:
                  type: boolean
                backend:
                  description: |-
-                    Backend specifies the inference backend to use.
+                    Backend specifies the inference backend for profiling.
                    The controller automatically sets this value in profilingConfig.config.engine.backend.
+                    Profiling runs on real GPUs or via AIC simulation to collect performance data.
                  enum:
                    - vllm
                    - sglang
@@ -304,6 +305,15 @@ spec:
                  required:
                    - profilerImage
                  type: object
+                useMocker:
+                  default: false
+                  description: |-
+                    UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+                    a real backend deployment. When true, the deployment uses simulated engines that
+                    don't require GPUs, using the profiling data to simulate realistic timing behavior.
+                    Mocker is available in all backend images and useful for large-scale experiments.
+                    Profiling still runs against the real backend (specified above) to collect performance data.
+                  type: boolean
              required:
                - backend
                - model
@@ -404,6 +414,7 @@ spec:
                    including metadata, based on profiling results. Users can extract this to create
                    a DGD manually, or it's used automatically when autoApply is true.
                    Stored as RawExtension to preserve all fields including metadata.
+                    For mocker backends, this contains the mocker DGD spec.
                  type: object
                  x-kubernetes-embedded-resource: true
                  x-kubernetes-preserve-unknown-fields: true

--- a/deploy/cloud/operator/api/v1alpha1/dynamographdeploymentrequest_types.go
+++ b/deploy/cloud/operator/api/v1alpha1/dynamographdeploymentrequest_types.go
@@ -129,12 +129,21 @@ type DynamoGraphDeploymentRequestSpec struct {
 	// +kubebuilder:validation:Required
 	Model string `json:"model"`
-	// Backend specifies the inference backend to use.
+	// Backend specifies the inference backend for profiling.
 	// The controller automatically sets this value in profilingConfig.config.engine.backend.
+	// Profiling runs on real GPUs or via AIC simulation to collect performance data.
 	// +kubebuilder:validation:Required
 	// +kubebuilder:validation:Enum=vllm;sglang;trtllm
 	Backend string `json:"backend"`
+	// UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+	// a real backend deployment. When true, the deployment uses simulated engines that
+	// don't require GPUs, using the profiling data to simulate realistic timing behavior.
+	// Mocker is available in all backend images and useful for large-scale experiments.
+	// Profiling still runs against the real backend (specified above) to collect performance data.
+	// +kubebuilder:default=false
+	UseMocker bool `json:"useMocker,omitempty"`
 	// EnableGpuDiscovery controls whether the profiler should automatically discover GPU
 	// resources from the Kubernetes cluster nodes. When enabled, the profiler will override
 	// any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine,
@@ -213,6 +222,7 @@ type DynamoGraphDeploymentRequestStatus struct {
 	// including metadata, based on profiling results. Users can extract this to create
 	// a DGD manually, or it's used automatically when autoApply is true.
 	// Stored as RawExtension to preserve all fields including metadata.
+	// For mocker backends, this contains the mocker DGD spec.
 	// +kubebuilder:validation:Optional
 	// +kubebuilder:pruning:PreserveUnknownFields
 	// +kubebuilder:validation:EmbeddedResource
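
The effect of the `json:"useMocker,omitempty"` tag can be sketched with a trimmed-down struct. This is only an illustration: `dgdrSpecSketch` and `marshalSpec` are hypothetical names, and the struct mirrors just three fields of `DynamoGraphDeploymentRequestSpec`. With `omitempty`, a `false` value is dropped from the serialized object, so an unset field and an explicit `false` look the same on the wire; the `+kubebuilder:default=false` marker then supplies the value through the CRD schema default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dgdrSpecSketch is a hypothetical, trimmed-down stand-in for
// DynamoGraphDeploymentRequestSpec. Only the JSON tags shown in the
// diff are mirrored; the real type has more fields.
type dgdrSpecSketch struct {
	Model     string `json:"model"`
	Backend   string `json:"backend"`
	UseMocker bool   `json:"useMocker,omitempty"`
}

// marshalSpec serializes the sketch struct to a JSON string.
func marshalSpec(s dgdrSpecSketch) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// UseMocker left at its zero value: the key is omitted entirely.
	fmt.Println(marshalSpec(dgdrSpecSketch{Model: "Qwen/Qwen3-0.6B", Backend: "vllm"}))
	// UseMocker set to true: the key is serialized.
	fmt.Println(marshalSpec(dgdrSpecSketch{Model: "Qwen/Qwen3-0.6B", Backend: "vllm", UseMocker: true}))
}
```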

--- a/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentrequests.yaml
+++ b/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentrequests.yaml
@@ -96,8 +96,9 @@ spec:
                  type: boolean
                backend:
                  description: |-
-                    Backend specifies the inference backend to use.
+                    Backend specifies the inference backend for profiling.
                    The controller automatically sets this value in profilingConfig.config.engine.backend.
+                    Profiling runs on real GPUs or via AIC simulation to collect performance data.
                  enum:
                    - vllm
                    - sglang
@@ -304,6 +305,15 @@ spec:
                  required:
                    - profilerImage
                  type: object
+                useMocker:
+                  default: false
+                  description: |-
+                    UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+                    a real backend deployment. When true, the deployment uses simulated engines that
+                    don't require GPUs, using the profiling data to simulate realistic timing behavior.
+                    Mocker is available in all backend images and useful for large-scale experiments.
+                    Profiling still runs against the real backend (specified above) to collect performance data.
+                  type: boolean
              required:
                - backend
                - model
@@ -404,6 +414,7 @@ spec:
                    including metadata, based on profiling results. Users can extract this to create
                    a DGD manually, or it's used automatically when autoApply is true.
                    Stored as RawExtension to preserve all fields including metadata.
+                    For mocker backends, this contains the mocker DGD spec.
                  type: object
                  x-kubernetes-embedded-resource: true
                  x-kubernetes-preserve-unknown-fields: true

--- a/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
+++ b/deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
@@ -118,10 +118,11 @@ const (
 	VolumeNameProfilingOutput = "profiling-output"
 	// Volume paths
-	ProfilingOutputPath = "/data"
-	ProfilingOutputFile = "config_with_planner.yaml"
-	ProfilingConfigPath = "/config"
-	ProfilingConfigFile = "disagg.yaml"
+	ProfilingOutputPath       = "/data"
+	ProfilingOutputFile       = "config_with_planner.yaml"
+	ProfilingOutputFileMocker = "mocker_config_with_planner.yaml"
+	ProfilingConfigPath       = "/config"
+	ProfilingConfigFile       = "disagg.yaml"
 	// Command line arguments
 	ArgModel   = "--model"
@@ -202,6 +203,13 @@ data:
 EOF
 sed 's/^/    /' {{.OutputPath}}/{{.OutputFile}} >> /tmp/cm.yaml
+# Add mocker config (profiler always generates both real and mocker configs)
+if [ -f {{.OutputPath}}/{{.MockerOutputFile}} ]; then
+  echo "  {{.MockerOutputFile}}: |" >> /tmp/cm.yaml
+  sed 's/^/    /' {{.OutputPath}}/{{.MockerOutputFile}} >> /tmp/cm.yaml
+  echo "Added mocker config to ConfigMap"
+fi
 # Note: Profiling data (raw_data.npz converted to JSON) is included in the
 # generated DGD YAML as a separate ConfigMap by the profiler, no need to add it here
@@ -582,6 +590,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createDGD(ctx context.Context,
 	dgdName := generatedDGD.Name
 	dgdNamespace := dgdr.Namespace
+	// Apply deployment overrides
 	if dgdr.Spec.DeploymentOverrides != nil {
 		if dgdr.Spec.DeploymentOverrides.Name != "" {
 			dgdName = dgdr.Spec.DeploymentOverrides.Name
@@ -987,11 +996,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
 		var scriptBuf bytes.Buffer
 		err = tmpl.Execute(&scriptBuf, map[string]string{
-			"OutputPath":    ProfilingOutputPath,
-			"OutputFile":    ProfilingOutputFile,
-			"ConfigMapName": outputConfigMapName,
-			"Namespace":     dgdr.Namespace,
-			"DGDRName":      dgdr.Name,
+			"OutputPath":       ProfilingOutputPath,
+			"OutputFile":       ProfilingOutputFile,
+			"MockerOutputFile": ProfilingOutputFileMocker,
+			"ConfigMapName":    outputConfigMapName,
+			"Namespace":        dgdr.Namespace,
+			"DGDRName":         dgdr.Name,
 		})
 		if err != nil {
 			return nil, false, fmt.Errorf("failed to execute sidecar script template: %w", err)
@@ -1265,7 +1275,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx
 // generateDGDSpec generates DGD spec from profiling results (online or offline/AIC)
 func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
 	logger := log.FromContext(ctx)
-	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name)
+	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name, "backend", dgdr.Spec.Backend)
 	// Read the generated spec from ConfigMap (created by sidecar)
 	outputConfigMapName := getOutputConfigMapName(dgdr)
@@ -1282,18 +1292,28 @@ func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Con
 		return fmt.Errorf("failed to get output ConfigMap: %w", err)
 	}
+	// Select the right config file based on useMocker flag
+	// Profiler always generates both real and mocker configs
+	var outputFile string
+	if dgdr.Spec.UseMocker {
+		outputFile = ProfilingOutputFileMocker
+		logger.Info("Using mocker deployment config")
+	} else {
+		outputFile = ProfilingOutputFile
+	}
 	// Get YAML content from ConfigMap
-	yamlContent, exists := cm.Data[ProfilingOutputFile]
+	yamlContent, exists := cm.Data[outputFile]
 	if !exists {
-		return fmt.Errorf("key %s not found in ConfigMap %s", ProfilingOutputFile, outputConfigMapName)
+		return fmt.Errorf("key %s not found in ConfigMap %s", outputFile, outputConfigMapName)
 	}
-	logger.Info("Found profiling output in ConfigMap", "configMap", outputConfigMapName, "size", len(yamlContent))
+	logger.Info("Found profiling output in ConfigMap", "configMap", outputConfigMapName, "outputFile", outputFile, "size", len(yamlContent))
 	// Extract DGD and any supporting resources from potentially multi-document YAML (ConfigMap + DGD)
 	dgd, additionalResources, err := r.extractResourcesFromYAML([]byte(yamlContent))
 	if err != nil {
-		return fmt.Errorf("failed to extract DGD from %s: %w", ProfilingOutputFile, err)
+		return fmt.Errorf("failed to extract DGD from %s: %w", outputFile, err)
 	}
 	logger.Info("Parsed profiling output", "dgdName", dgd.Name, "additionalResources", len(additionalResources))
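
The key-selection logic in `generateDGDSpec` can be isolated as a small standalone sketch. It is not the controller code: `selectOutputKey` and `lookupGeneratedYAML` are hypothetical helpers, and a plain `map[string]string` stands in for the ConfigMap's `Data` field. The two key names are copied from the constants in the diff. One behavior worth noting follows from the sidecar's `if [ -f ... ]` guard: if the mocker file was not written, the key is absent, and a request with `useMocker: true` ends in the "key not found" error.

```go
package main

import "fmt"

// Key names copied from the constants added in the diff.
const (
	profilingOutputFile       = "config_with_planner.yaml"
	profilingOutputFileMocker = "mocker_config_with_planner.yaml"
)

// selectOutputKey is a hypothetical helper that returns the ConfigMap key
// to read, depending on whether a mocker deployment was requested.
func selectOutputKey(useMocker bool) string {
	if useMocker {
		return profilingOutputFileMocker
	}
	return profilingOutputFile
}

// lookupGeneratedYAML is a hypothetical helper that reads the selected key
// from ConfigMap-like data and reports a missing key as an error.
func lookupGeneratedYAML(data map[string]string, useMocker bool) (string, error) {
	key := selectOutputKey(useMocker)
	content, ok := data[key]
	if !ok {
		return "", fmt.Errorf("key %s not found in ConfigMap data", key)
	}
	return content, nil
}

func main() {
	// Simulated ConfigMap data containing only the real-backend config.
	data := map[string]string{
		profilingOutputFile: "kind: DynamoGraphDeployment\n",
	}

	if content, err := lookupGeneratedYAML(data, false); err == nil {
		fmt.Printf("real config: %d bytes\n", len(content))
	}
	if _, err := lookupGeneratedYAML(data, true); err != nil {
		fmt.Println("mocker lookup failed:", err)
	}
}
```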

--- a/docs/kubernetes/api_reference.md
+++ b/docs/kubernetes/api_reference.md
@@ -309,7 +309,8 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
 | `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs.<br />The controller automatically sets this value in profilingConfig.config.deployment.model. |  | Required: \{\} <br /> |
-| `backend` _string_ | Backend specifies the inference backend to use.<br />The controller automatically sets this value in profilingConfig.config.engine.backend. |  | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
+| `backend` _string_ | Backend specifies the inference backend for profiling.<br />The controller automatically sets this value in profilingConfig.config.engine.backend.<br />Profiling runs on real GPUs or via AIC simulation to collect performance data. |  | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
+| `useMocker` _boolean_ | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of<br />a real backend deployment. When true, the deployment uses simulated engines that<br />don't require GPUs, using the profiling data to simulate realistic timing behavior.<br />Mocker is available in all backend images and useful for large-scale experiments.<br />Profiling still runs against the real backend (specified above) to collect performance data. | false |  |
 | `enableGpuDiscovery` _boolean_ | EnableGpuDiscovery controls whether the profiler should automatically discover GPU<br />resources from the Kubernetes cluster nodes. When enabled, the profiler will override<br />any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine,<br />num_gpus_per_node) with values detected from the cluster.<br />Requires cluster-wide node access permissions - only available with cluster-scoped operators. | false | Optional: \{\} <br /> |
 | `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />Note: deployment.model and engine.backend are automatically set from the high-level<br />modelName and backend fields and should not be specified in this config. |  | Required: \{\} <br /> |
 | `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false |  |
@@ -335,7 +336,7 @@ _Appears in:_
 | `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. |  |  |
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. |  |  |
 | `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/<name>" |  | Optional: \{\} <br /> |
-| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
+| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata.<br />For mocker backends, this contains the mocker DGD spec. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
 | `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. |  | Optional: \{\} <br /> |

--- a/docs/planner/sla_planner_quickstart.md
+++ b/docs/planner/sla_planner_quickstart.md
@@ -355,6 +355,29 @@ For details about the profiling process, performance plots, and interpolation da
 ## Advanced Topics
+### Mocker Deployment
+Instead of a real DGD that uses GPU resources, you can deploy a mocker deployment that uses simulated engines rather than GPUs. Mocker is available in all backend images and uses profiling data to simulate realistic GPU timing behavior. It is useful for:
+- Large-scale experiments without GPU resources
+- Testing Planner behavior and infrastructure
+- Validating deployment configurations
+To deploy mocker instead of the real backend, set `useMocker: true`:
+```yaml
+spec:
+  model: <model-name>
+  backend: trtllm  # Real backend for profiling (vllm, sglang, or trtllm)
+  useMocker: true  # Deploy mocker instead of real backend
+  profilingConfig:
+    profilerImage: "nvcr.io/nvidia/dynamo/trtllm-runtime:<image-tag>"
+    ...
+  autoApply: true
+```
+Profiling still runs against the real backend (via GPUs or AIC) to collect performance data. The mocker deployment then uses this data to simulate realistic timing behavior.
 ### DGDR Immutability
 DGDRs are **immutable** - if you need to update SLAs or configuration: