| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br/>This is a high-level identifier for easy reference in kubectl output and logs.<br/>The controller automatically sets this value in profilingConfig.config.deployment.model. | | Required: \{\}<br/> |
| `backend` _string_ | Backend specifies the inference backend to use.<br/>The controller automatically sets this value in profilingConfig.config.engine.backend. | | Enum: [vllm sglang trtllm] <br/>Required: \{\}<br/> |
| `backend` _string_ | Backend specifies the inference backend for profiling.<br/>The controller automatically sets this value in profilingConfig.config.engine.backend.<br/>Profiling runs on real GPUs or via AIC simulation to collect performance data. | | Enum: [vllm sglang trtllm] <br/>Required: \{\}<br/> |
| `useMocker` _boolean_ | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of<br/>a real backend deployment. When true, the deployment uses simulated engines that<br/>don't require GPUs, using the profiling data to simulate realistic timing behavior.<br/>Mocker is available in all backend images and useful for large-scale experiments.<br/>Profiling still runs against the real backend (specified above) to collect performance data. | false | |
| `enableGpuDiscovery` _boolean_ | EnableGpuDiscovery controls whether the profiler should automatically discover GPU<br/>resources from the Kubernetes cluster nodes. When enabled, the profiler will override<br/>any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine,<br/>num_gpus_per_node) with values detected from the cluster.<br/>Requires cluster-wide node access permissions - only available with cluster-scoped operators. | false | Optional: \{\}<br/> |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br/>Note: deployment.model and engine.backend are automatically set from the high-level<br/>model and backend fields and should not be specified in this config. | | Required: \{\}<br/> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br/>after profiling completes. If false, only the spec is generated and stored in status.<br/>Users can then manually create a DGD using the generated spec. | false | |
...
@@ -335,7 +336,7 @@ _Appears in:_
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br/>Used to detect spec changes and enforce immutability after profiling starts. | | |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br/>Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br/>Conditions are merged by type on patch updates. | | |
| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br/>Format: "configmap/<name>" | | Optional: \{\}<br/> |
| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br/>including metadata, based on profiling results. Users can extract this to create<br/>a DGD manually, or it's used automatically when autoApply is true.<br/>Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: \{\}<br/>Optional: \{\}<br/> |
| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br/>including metadata, based on profiling results. Users can extract this to create<br/>a DGD manually, or it's used automatically when autoApply is true.<br/>Stored as RawExtension to preserve all fields including metadata.<br/>For mocker backends, this contains the mocker DGD spec. | | EmbeddedResource: \{\}<br/>Optional: \{\}<br/> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br/>Contains name, namespace, state, and creation status of the managed DGD. | | Optional: \{\}<br/> |
@@ -355,6 +355,29 @@ For details about the profiling process, performance plots, and interpolation da
## Advanced Topics
### Mocker Deployment
Instead of a real DGD that consumes GPU resources, you can deploy a mocker DGD that runs simulated engines. Mocker is available in all backend images and uses profiling data to simulate realistic GPU timing behavior. It is useful for:
- Large-scale experiments without GPU resources
- Testing Planner behavior and infrastructure
- Validating deployment configurations
To deploy mocker instead of the real backend, set `useMocker: true`:
```yaml
spec:
  model: <model-name>
  backend: trtllm  # Real backend for profiling (vllm, sglang, or trtllm)
  useMocker: true  # Deploy mocker instead of real backend
```
Profiling still runs against the real backend (via GPUs or AIC) to collect performance data. The mocker deployment then uses this data to simulate realistic timing behavior.
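As a rough illustration of how these fields fit together, the sketch below checks a mocker spec against the constraints stated in the field reference: required fields, the `backend` enum, boolean flags, and the two `profilingConfig` values that the controller sets itself. The helper `validate_dgdr_spec` and the sample spec are hypothetical and not part of the operator; real validation is performed by the CRD schema and the controller.

```python
# Hypothetical helper: not part of the operator or any Dynamo library.
ALLOWED_BACKENDS = {"vllm", "sglang", "trtllm"}  # enum from the field reference


def validate_dgdr_spec(spec: dict) -> list[str]:
    """Return a list of problems found in a DGDR spec dict (empty if none)."""
    problems = []

    # model, backend, and profilingConfig are marked Required in the reference.
    for field in ("model", "backend", "profilingConfig"):
        if field not in spec:
            problems.append(f"missing required field: {field}")

    backend = spec.get("backend")
    if backend is not None and backend not in ALLOWED_BACKENDS:
        problems.append(f"backend must be one of {sorted(ALLOWED_BACKENDS)}")

    # Boolean flags default to false when omitted.
    for flag in ("useMocker", "enableGpuDiscovery", "autoApply"):
        if not isinstance(spec.get(flag, False), bool):
            problems.append(f"{flag} must be a boolean")

    # The controller fills in these two values, so they should not be set by hand.
    config = (spec.get("profilingConfig") or {}).get("config") or {}
    if "model" in (config.get("deployment") or {}):
        problems.append("profilingConfig.config.deployment.model is set by the controller")
    if "backend" in (config.get("engine") or {}):
        problems.append("profilingConfig.config.engine.backend is set by the controller")

    return problems


mocker_spec = {
    "model": "Qwen/Qwen3-0.6B",
    "backend": "trtllm",   # real backend, still used for profiling
    "useMocker": True,     # deploy simulated engines instead of real ones
    "profilingConfig": {"config": {}},
}
print(validate_dgdr_spec(mocker_spec))
```

A check like this could run in CI before applying a manifest, so that mistakes such as an unsupported backend surface before the DGDR reaches the cluster.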
### DGDR Immutability
DGDRs are **immutable** - if you need to update SLAs or configuration: