---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
---

> **⚠️ Important**: This documentation is automatically generated from source code.
> Do not edit this file directly.

# API Reference

## Packages
- [nvidia.com/v1alpha1](#nvidiacomv1alpha1)
- [nvidia.com/v1beta1](#nvidiacomv1beta1)
- [operator.config.dynamo.nvidia.com/v1alpha1](#operatorconfigdynamonvidiacomv1alpha1)


## nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
a high-level, SLA-driven interface for deploying machine learning models on Dynamo.

### Resource Types
- [DynamoCheckpoint](#dynamocheckpoint)
- [DynamoComponentDeployment](#dynamocomponentdeployment)
- [DynamoGraphDeployment](#dynamographdeployment)
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)
- [DynamoModel](#dynamomodel)



#### Autoscaling



Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter
with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md
for migration guidance. This field will be removed in a future API version.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Deprecated: This field is ignored. |  |  |
| `minReplicas` _integer_ | Deprecated: This field is ignored. |  |  |
| `maxReplicas` _integer_ | Deprecated: This field is ignored. |  |  |
| `behavior` _[HorizontalPodAutoscalerBehavior](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#horizontalpodautoscalerbehavior-v2-autoscaling)_ | Deprecated: This field is ignored. |  |  |
| `metrics` _[MetricSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#metricspec-v2-autoscaling) array_ | Deprecated: This field is ignored. |  |  |




#### CheckpointMode

_Underlying type:_ _string_

CheckpointMode defines how checkpoint creation is handled

_Validation:_
- Enum: [Auto Manual]

_Appears in:_
- [ServiceCheckpointConfig](#servicecheckpointconfig)

| Field | Description |
| --- | --- |
| `Auto` | CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR<br /> |
| `Manual` | CheckpointModeManual means the user must create the Checkpoint CR themselves<br /> |


#### ComponentKind

_Underlying type:_ _string_

ComponentKind represents the type of underlying Kubernetes resource.

_Validation:_
- Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

_Appears in:_
- [ServiceReplicaStatus](#servicereplicastatus)

| Field | Description |
| --- | --- |
| `PodClique` | ComponentKindPodClique represents a PodClique resource.<br /> |
| `PodCliqueScalingGroup` | ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource.<br /> |
| `Deployment` | ComponentKindDeployment represents a Deployment resource.<br /> |
| `LeaderWorkerSet` | ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource.<br /> |


#### ConfigMapKeySelector



ConfigMapKeySelector selects a specific key from a ConfigMap.
Used to reference external configuration data stored in ConfigMaps.



_Appears in:_
- [ProfilingConfigSpec](#profilingconfigspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap containing the desired data. |  | Required: \{\} <br /> |
| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml |  |

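As an illustration, a selector that points at a ConfigMap might look like the fragment below. The ConfigMap name `my-base-config` is hypothetical, and the fragment shows only the selector object itself, not the ProfilingConfigSpec field that holds it.

```yaml
# ConfigMapKeySelector fragment (sketch, not a complete manifest).
# "my-base-config" is a hypothetical ConfigMap name.
name: my-base-config   # required
key: disagg.yaml       # optional; this is also the documented default
```

Omitting `key` should be equivalent to the fragment above, since `disagg.yaml` is the default.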

#### DGDRState

_Underlying type:_ _string_



_Validation:_
- Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]

_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)

| Field | Description |
| --- | --- |
| `Initializing` |  |
| `Pending` |  |
| `Profiling` |  |
| `Deploying` |  |
| `Ready` |  |
| `DeploymentDeleted` |  |
| `Failed` |  |


#### DGDState

_Underlying type:_ _string_



_Validation:_
- Enum: [initializing pending successful failed]

_Appears in:_
- [DeploymentStatus](#deploymentstatus)
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)

| Field | Description |
| --- | --- |
| `initializing` |  |
| `pending` |  |
| `successful` |  |
| `failed` |  |


#### DeploymentOverridesSpec



DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments.
When autoApply is enabled, these overrides are applied to the generated DGD resource.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. |  | Optional: \{\} <br /> |
| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. |  | Optional: \{\} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. |  | Optional: \{\} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. |  | Optional: \{\} <br /> |
| `workersImage` _string_ | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components.<br />This image is used for both temporary DGDs created during online profiling and the final DGD.<br />If omitted, the image from the base config file (e.g., disagg.yaml) is used.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0" |  | Optional: \{\} <br /> |

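The fragment below is a minimal sketch of a DeploymentOverridesSpec object using the fields from the table above. All values are hypothetical except the image, which is the example given in the `workersImage` description. The fragment shows only the object itself, not the DynamoGraphDeploymentRequestSpec field that holds it.

```yaml
# DeploymentOverridesSpec fragment (sketch); every field is optional.
name: llama-prod                 # hypothetical; defaults to the DGDR name
namespace: inference             # hypothetical; defaults to the DGDR namespace
labels:
  team: ml-platform              # hypothetical; merged with auto-generated labels
annotations:
  example.com/owner: ml-platform # hypothetical annotation
workersImage: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```

Leaving out `name` and `namespace` keeps the generated DynamoGraphDeployment alongside the DGDR and under the same name.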

#### DeploymentStatus



DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
This status is populated when autoApply is enabled and a DGD is created.



_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. |  |  |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. |  |  |
| `state` _[DGDState](#dgdstate)_ | State is the current state of the DynamoGraphDeployment.<br />This value is mirrored from the DGD's status.state field. | initializing | Enum: [initializing pending successful failed] <br /> |
| `created` _boolean_ | Created indicates whether the DGD has been successfully created.<br />Used to prevent recreation if the DGD is manually deleted by users. |  |  |




#### DynamoCheckpoint



DynamoCheckpoint is the Schema for the dynamocheckpoints API.
It represents a container checkpoint that can be used to restore pods to a warm state.





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoCheckpoint` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoCheckpointSpec](#dynamocheckpointspec)_ |  |  |  |
| `status` _[DynamoCheckpointStatus](#dynamocheckpointstatus)_ |  |  |  |




#### DynamoCheckpointIdentity



DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence.
Two checkpoints with the same identity hash are considered equivalent.



_Appears in:_
- [DynamoCheckpointSpec](#dynamocheckpointspec)
- [ServiceCheckpointConfig](#servicecheckpointconfig)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model is the model identifier (e.g., "meta-llama/Llama-3-70B") |  | Required: \{\} <br /> |
| `backendFramework` _string_ | BackendFramework is the runtime framework (vllm, sglang, trtllm) |  | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
| `dynamoVersion` _string_ | DynamoVersion is the Dynamo platform version (optional)<br />If not specified, version is not included in identity hash<br />This ensures checkpoint compatibility across Dynamo releases |  | Optional: \{\} <br /> |
| `tensorParallelSize` _integer_ | TensorParallelSize is the tensor parallel configuration | 1 | Minimum: 1 <br />Optional: \{\} <br /> |
| `pipelineParallelSize` _integer_ | PipelineParallelSize is the pipeline parallel configuration | 1 | Minimum: 1 <br />Optional: \{\} <br /> |
| `dtype` _string_ | Dtype is the data type (fp16, bf16, fp8, etc.) |  | Optional: \{\} <br /> |
| `maxModelLen` _integer_ | MaxModelLen is the maximum sequence length |  | Minimum: 1 <br />Optional: \{\} <br /> |
| `extraParameters` _object (keys:string, values:string)_ | ExtraParameters are additional parameters that affect the checkpoint hash<br />Use for any framework-specific or custom parameters not covered above |  | Optional: \{\} <br /> |

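To make the field list concrete, here is a sketch of an identity object. The model name is the example from the `model` description. The numeric values, the `dtype` choice, and the `extraParameters` entry are hypothetical.

```yaml
# DynamoCheckpointIdentity fragment (sketch, not a complete manifest).
model: meta-llama/Llama-3-70B   # required
backendFramework: vllm          # required; one of vllm, sglang, trtllm
tensorParallelSize: 4           # optional; default 1, minimum 1
pipelineParallelSize: 1         # optional; default 1, minimum 1
dtype: bf16                     # optional
maxModelLen: 8192               # optional; minimum 1
extraParameters:                # optional; keys and values are strings
  quantization: "none"          # hypothetical framework-specific parameter
```

`dynamoVersion` is left out here, so per the table the version is not included in the identity hash.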

#### DynamoCheckpointJobConfig



DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job



_Appears in:_
- [DynamoCheckpointSpec](#dynamocheckpointspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `podTemplateSpec` _[PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#podtemplatespec-v1-core)_ | PodTemplateSpec allows customizing the checkpoint Job pod<br />This should include the container that runs the workload to be checkpointed |  | Required: \{\} <br /> |
| `sharedMemory` _[SharedMemorySpec](#sharedmemoryspec)_ | SharedMemory controls the tmpfs mounted at /dev/shm for the checkpoint Job pod.<br />When omitted, checkpoint Jobs use the same default 8Gi tmpfs as Dynamo components. |  | Optional: \{\} <br /> |
| `activeDeadlineSeconds` _integer_ | ActiveDeadlineSeconds specifies the maximum time the Job can run | 3600 | Minimum: 1 <br />Optional: \{\} <br /> |
| `backoffLimit` _integer_ | Deprecated: BackoffLimit is ignored. Checkpoint Jobs never retry. |  | Minimum: 0 <br />Optional: \{\} <br /> |
| `ttlSecondsAfterFinished` _integer_ | Deprecated: TTLSecondsAfterFinished is ignored. Checkpoint Jobs use a fixed<br />300 second TTL. |  | Minimum: 0 <br />Optional: \{\} <br /> |


#### DynamoCheckpointPhase

_Underlying type:_ _string_

DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle

_Validation:_
- Enum: [Pending Creating Ready Failed]

_Appears in:_
- [DynamoCheckpointStatus](#dynamocheckpointstatus)

| Field | Description |
| --- | --- |
| `Pending` | DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started<br /> |
| `Creating` | DynamoCheckpointPhaseCreating indicates the checkpoint Job is running<br /> |
| `Ready` | DynamoCheckpointPhaseReady indicates the checkpoint artifact is available<br /> |
| `Failed` | DynamoCheckpointPhaseFailed indicates the checkpoint creation failed<br /> |


#### DynamoCheckpointSpec



DynamoCheckpointSpec defines the desired state of DynamoCheckpoint



_Appears in:_
- [DynamoCheckpoint](#dynamocheckpoint)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `identity` _[DynamoCheckpointIdentity](#dynamocheckpointidentity)_ | Identity defines the inputs that determine checkpoint equivalence |  | Required: \{\} <br /> |
| `gpuMemoryService` _[GPUMemoryServiceSpec](#gpumemoryservicespec)_ | GPUMemoryService enables checkpoint-time GPU Memory Service wiring.<br />It is intentionally outside spec.identity, so it does not affect the<br />checkpoint identity hash or deduplication. |  | Optional: \{\} <br /> |
| `job` _[DynamoCheckpointJobConfig](#dynamocheckpointjobconfig)_ | Job defines the configuration for the checkpoint creation Job |  | Required: \{\} <br /> |

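Putting the two required fields together, a DynamoCheckpoint manifest could be sketched as follows. The resource name, container name, and image choice are assumptions for illustration. A real pod template would also need whatever command, resources, and volumes the workload requires, and this reference does not say which of those the operator fills in.

```yaml
# Minimal DynamoCheckpoint sketch; names and container details are hypothetical.
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: llama-3-70b-vllm
spec:
  identity:                       # required
    model: meta-llama/Llama-3-70B
    backendFramework: vllm
  job:                            # required
    activeDeadlineSeconds: 3600   # optional; 3600 is the documented default
    podTemplateSpec:              # required; runs the workload to be checkpointed
      spec:
        containers:
          - name: main
            image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```

`backoffLimit` and `ttlSecondsAfterFinished` are omitted on purpose, since both are documented as deprecated and ignored.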

#### DynamoCheckpointStatus



DynamoCheckpointStatus defines the observed state of DynamoCheckpoint



_Appears in:_
- [DynamoCheckpoint](#dynamocheckpoint)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DynamoCheckpointPhase](#dynamocheckpointphase)_ | Phase represents the current phase of the checkpoint lifecycle |  | Enum: [Pending Creating Ready Failed] <br />Optional: \{\} <br /> |
| `identityHash` _string_ | IdentityHash is the computed hash of the checkpoint identity<br />This hash is used to identify equivalent checkpoints |  | Optional: \{\} <br /> |
| `location` _string_ | Deprecated: Location is ignored and no longer populated. It is retained<br />only so older objects continue to validate. |  | Optional: \{\} <br /> |
| `storageType` _[DynamoCheckpointStorageType](#dynamocheckpointstoragetype)_ | Deprecated: StorageType is ignored and no longer populated. It is retained<br />only so older objects continue to validate. |  | Enum: [pvc s3 oci] <br />Optional: \{\} <br /> |
| `jobName` _string_ | JobName is the name of the checkpoint creation Job |  | Optional: \{\} <br /> |
| `createdAt` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | CreatedAt is the timestamp when the checkpoint became ready |  | Optional: \{\} <br /> |
| `message` _string_ | Message provides additional information about the current state |  | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | DEPRECATED: Conditions are deprecated. Use status.phase instead. |  | Optional: \{\} <br /> |


#### DynamoCheckpointStorageType

_Underlying type:_ _string_

Deprecated: StorageType is retained for compatibility with older
DynamoCheckpoint status consumers. The current checkpoint flow publishes
PVC-backed artifacts discovered from the snapshot-agent DaemonSet.

_Validation:_
- Enum: [pvc s3 oci]

_Appears in:_
- [DynamoCheckpointStatus](#dynamocheckpointstatus)



#### DynamoComponentDeployment



DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoComponentDeployment` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)_ | Spec defines the desired state for this Dynamo component deployment. |  |  |


#### DynamoComponentDeploymentSharedSpec







_Appears in:_
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component<br />(such as Pod, Service, and Ingress when applicable). |  |  |
| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
| `serviceName` _string_ | The name of the component |  |  |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
| `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
| `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
| `ingress` _[IngressSpec](#ingressspec)_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
| `modelRef` _[ModelReference](#modelreference)_ | ModelRef references a model that this component serves<br />When specified, a headless service will be created for endpoint discovery |  | Optional: \{\} <br /> |
| `sharedMemory` _[SharedMemorySpec](#sharedmemoryspec)_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
| `extraPodMetadata` _[ExtraPodMetadata](#extrapodmetadata)_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  | Optional: \{\} <br /> |
| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br />It is a standard Kubernetes PodSpec that also contains a MainContainer (standard Kubernetes Container) field<br />for overriding the main container configuration. |  | Optional: \{\} <br /> |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. |  | Minimum: 0 <br /> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. |  |  |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. |  | Optional: \{\} <br /> |
| `eppConfig` _[EPPConfig](#eppconfig)_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.<br />Only applicable when ComponentType is "epp". |  | Optional: \{\} <br /> |
| `frontendSidecar` _[FrontendSidecarSpec](#frontendsidecarspec)_ | FrontendSidecar configures an auto-generated frontend sidecar container.<br />When specified, the operator injects a fully configured frontend container<br />with all standard Dynamo environment variables, health probes, and ports.<br />This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE) |  | Optional: \{\} <br /> |
| `checkpoint` _[ServiceCheckpointConfig](#servicecheckpointconfig)_ | Checkpoint configures container checkpointing for this service.<br />When enabled, pods can be restored from checkpoint files for faster cold starts. |  | Optional: \{\} <br /> |
| `topologyConstraint` _[TopologyConstraint](#topologyconstraint)_ | TopologyConstraint for this service. packDomain is required.<br />When both this and spec.topologyConstraint.packDomain are set, packDomain<br />must be narrower than or equal to the spec-level packDomain. |  | Optional: \{\} <br /> |
| `gpuMemoryService` _[GPUMemoryServiceSpec](#gpumemoryservicespec)_ | GPUMemoryService configures the GPU Memory Service (GMS) sidecar.<br />When enabled, a GMS sidecar is injected and GPU access is managed via DRA. |  | Optional: \{\} <br /> |
| `failover` _[FailoverSpec](#failoverspec)_ | Failover configures GMS (GPU Memory Service) failover for this service.<br />For intraPod mode: the main container is cloned into two engine containers (active + standby).<br />For interPod mode: the operator creates a dedicated GMS weight server pod and<br />multiple engine pods per rank that share GPUs via DRA resource claims. |  | Optional: \{\} <br /> |


#### DynamoComponentDeploymentSpec



DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment



_Appears in:_
- [DynamoComponentDeployment](#dynamocomponentdeployment)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm") |  | Enum: [sglang vllm trtllm] <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations to add to generated Kubernetes resources for this component<br />(such as Pod, Service, and Ingress when applicable). |  |  |
| `labels` _object (keys:string, values:string)_ | Labels to add to generated Kubernetes resources for this component. |  |  |
| `serviceName` _string_ | The name of the component |  |  |
| `componentType` _string_ | ComponentType indicates the role of this component (for example, "main"). |  |  |
| `subComponentType` _string_ | SubComponentType indicates the sub-role of this component (for example, "prefill"). |  |  |
| `dynamoNamespace` _string_ | DynamoNamespace is deprecated and will be removed in a future version.<br />The DGD Kubernetes namespace and DynamoGraphDeployment name are used to construct the Dynamo namespace for each component |  | Optional: \{\} <br /> |
| `globalDynamoNamespace` _boolean_ | GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace |  |  |
| `resources` _[Resources](#resources)_ | Resources requested and limits for this component, including CPU, memory,<br />GPUs/devices, and any runtime-specific resources. |  |  |
| `autoscaling` _[Autoscaling](#autoscaling)_ | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter<br />with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md<br />for migration guidance. This field will be removed in a future API version. |  |  |
| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables to inject into the component containers. |  |  |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the component containers. |  |  |
| `volumeMounts` _[VolumeMount](#volumemount) array_ | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. |  |  |
| `ingress` _[IngressSpec](#ingressspec)_ | Ingress config to expose the component outside the cluster (or through a service mesh). |  |  |
| `modelRef` _[ModelReference](#modelreference)_ | ModelRef references a model that this component serves<br />When specified, a headless service will be created for endpoint discovery |  | Optional: \{\} <br /> |
| `sharedMemory` _[SharedMemorySpec](#sharedmemoryspec)_ | SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size). |  |  |
| `extraPodMetadata` _[ExtraPodMetadata](#extrapodmetadata)_ | ExtraPodMetadata adds labels/annotations to the created Pods. |  | Optional: \{\} <br /> |
| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br />It is a standard Kubernetes PodSpec that also contains a MainContainer (standard Kubernetes Container) field<br />for overriding the main container configuration. |  | Optional: \{\} <br /> |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. |  |  |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. |  |  |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. |  | Minimum: 0 <br /> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. |  |  |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. |  | Optional: \{\} <br /> |
| `eppConfig` _[EPPConfig](#eppconfig)_ | EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components.<br />Only applicable when ComponentType is "epp". |  | Optional: \{\} <br /> |
| `frontendSidecar` _[FrontendSidecarSpec](#frontendsidecarspec)_ | FrontendSidecar configures an auto-generated frontend sidecar container.<br />When specified, the operator injects a fully configured frontend container<br />with all standard Dynamo environment variables, health probes, and ports.<br />This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE) |  | Optional: \{\} <br /> |
| `checkpoint` _[ServiceCheckpointConfig](#servicecheckpointconfig)_ | Checkpoint configures container checkpointing for this service.<br />When enabled, pods can be restored from checkpoint files for faster cold starts. |  | Optional: \{\} <br /> |
| `topologyConstraint` _[TopologyConstraint](#topologyconstraint)_ | TopologyConstraint for this service. packDomain is required.<br />When both this and spec.topologyConstraint.packDomain are set, packDomain<br />must be narrower than or equal to the spec-level packDomain. |  | Optional: \{\} <br /> |
| `gpuMemoryService` _[GPUMemoryServiceSpec](#gpumemoryservicespec)_ | GPUMemoryService configures the GPU Memory Service (GMS) sidecar.<br />When enabled, a GMS sidecar is injected and GPU access is managed via DRA. |  | Optional: \{\} <br /> |
| `failover` _[FailoverSpec](#failoverspec)_ | Failover configures GMS (GPU Memory Service) failover for this service.<br />For intraPod mode: the main container is cloned into two engine containers (active + standby).<br />For interPod mode: the operator creates a dedicated GMS weight server pod and<br />multiple engine pods per rank that share GPUs via DRA resource claims. |  | Optional: \{\} <br /> |

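A small DynamoComponentDeployment using a handful of the fields above might look like the sketch below. The resource name, environment variable, and Secret name are hypothetical. The `componentType` and `subComponentType` values are simply the examples quoted in the field descriptions. Fields whose nested schema is defined elsewhere in this reference (such as `resources` or `scalingAdapter`) are left out.

```yaml
# DynamoComponentDeployment sketch; names and values are hypothetical.
apiVersion: nvidia.com/v1alpha1
kind: DynamoComponentDeployment
metadata:
  name: prefill-worker
spec:
  backendFramework: vllm          # one of sglang, vllm, trtllm
  serviceName: prefill-worker
  componentType: main             # example value from the field description
  subComponentType: prefill       # example value from the field description
  replicas: 2                     # minimum 0
  envs:
    - name: LOG_LEVEL             # hypothetical environment variable
      value: "info"
  envFromSecret: hf-token-secret  # hypothetical Secret name
```

The deprecated `autoscaling` and `dynamoNamespace` fields are omitted. Per the table, `replicas` should not be edited directly once `scalingAdapter` is enabled.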

#### DynamoGraphDeployment



DynamoGraphDeployment is the Schema for the dynamographdeployments API.





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeployment` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoGraphDeploymentSpec](#dynamographdeploymentspec)_ | Spec defines the desired state for this graph deployment. |  |  |
| `status` _[DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)_ | Status reflects the current observed state of this graph deployment. |  |  |


#### DynamoGraphDeploymentRequest



DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It serves as the primary interface for users to request model deployments with
specific performance and resource constraints, enabling SLA-driven deployments.

Lifecycle:
 1. Initializing → Pending: Validates spec and prepares for profiling
 2. Pending → Profiling: Creates and runs profiling job (online or AIC)
 3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
 4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
 5. Ready: Terminal state when DGD is operational or spec is available
 6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted

The spec becomes immutable once profiling starts. Users must delete and recreate
the DGDR to modify configuration after this point.

DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated.
Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest.
v1alpha1 will be removed in a future release.





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |


#### DynamoGraphDeploymentRequestSpec



DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
This CRD serves as the primary interface for users to request model deployments with
specific performance constraints and resource requirements, enabling SLA-driven deployments.



_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs.<br />The controller automatically sets this value in profilingConfig.config.deployment.model. |  | Required: \{\} <br /> |
| `backend` _string_ | Backend specifies the inference backend for profiling.<br />The controller automatically sets this value in profilingConfig.config.engine.backend.<br />Profiling runs on real GPUs or via AIC simulation to collect performance data. |  | Enum: [auto vllm sglang trtllm] <br />Required: \{\} <br /> |
| `useMocker` _boolean_ | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of<br />a real backend deployment. When true, the deployment uses simulated engines that<br />don't require GPUs, using the profiling data to simulate realistic timing behavior.<br />Mocker is available in all backend images and useful for large-scale experiments.<br />Profiling still runs against the real backend (specified above) to collect performance data. | false |  |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />Note: GPU discovery is automatically attempted to detect GPU resources from Kubernetes<br />cluster nodes. If the operator has node read permissions (cluster-wide or explicitly granted),<br />discovered GPU configuration is used as defaults when hardware configuration is not manually<br />specified (minNumGpusPerEngine, maxNumGpusPerEngine, numGpusPerNode). User-specified values<br />always take precedence over auto-discovered values. If GPU discovery fails (e.g.,<br />namespace-restricted operator without node permissions), manual hardware config is required.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />Note: deployment.model and engine.backend are automatically set from the high-level<br />model and backend fields and should not be specified in this config. |  | Required: \{\} <br /> |
| `enableGpuDiscovery` _boolean_ | EnableGPUDiscovery controls whether the operator attempts to discover GPU hardware from cluster nodes.<br />DEPRECATED: This field is deprecated and will be removed in v1beta1. GPU discovery is now always<br />attempted automatically. Setting this field has no effect - the operator will always try to discover<br />GPU hardware when node read permissions are available. If discovery is unavailable (e.g., namespace-scoped<br />operator without permissions), manual hardware configuration is required regardless of this setting. | true | Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false |  |
| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows customizing metadata for the auto-created DGD.<br />Only applicable when AutoApply is true. |  | Optional: \{\} <br /> |
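
As an illustration, a minimal request can be assembled from the required fields in the table above. This is a sketch rather than an official client: `build_dgdr` and the resource name `qwen-sla` are hypothetical, and the profiler image is the example value quoted in the ProfilingConfigSpec table.

```python
import json

# Values accepted by the `backend` field, per the Validation column above.
ALLOWED_BACKENDS = {"auto", "vllm", "sglang", "trtllm"}


def build_dgdr(name, model, backend, profiler_image, auto_apply=False):
    """Build a v1alpha1 DynamoGraphDeploymentRequest manifest as a dict."""
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(ALLOWED_BACKENDS)}")
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name},
        "spec": {
            "model": model,
            "backend": backend,
            # deployment.model and engine.backend are set by the controller,
            # so they are deliberately left out of profilingConfig.config.
            "profilingConfig": {"profilerImage": profiler_image},
            "autoApply": auto_apply,
        },
    }


manifest = build_dgdr(
    name="qwen-sla",  # hypothetical resource name
    model="Qwen/Qwen3-0.6B",
    backend="vllm",
    profiler_image="nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.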


#### DynamoGraphDeploymentRequestStatus



DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
The controller updates this status as the DGDR progresses through its lifecycle.



_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _[DGDRState](#dgdrstate)_ | State is a high-level textual status of the deployment request lifecycle. | Initializing | Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed] <br /> |
| `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes.<br />This field is populated by the controller and shown in kubectl output. |  | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. |  |  |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. |  |  |
| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/\<name\>" |  | Optional: \{\} <br /> |
| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata.<br />For mocker backends, this contains the mocker DGD spec. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. |  | Optional: \{\} <br /> |
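
The status fields above are typically consumed by tooling that waits for profiling to finish. The sketch below shows one way to read them from a status object that has already been fetched and decoded into a dict. The helper names are hypothetical, and returning the generated spec only in the `Ready` state is an assumption that fits the manual (`autoApply: false`) flow.

```python
from typing import Optional


def profiling_configmap_name(profiling_results: str) -> str:
    """Extract the ConfigMap name from a "configmap/<name>" reference."""
    prefix = "configmap/"
    name = profiling_results[len(prefix):]
    if not profiling_results.startswith(prefix) or not name:
        raise ValueError(f"unexpected profilingResults value: {profiling_results!r}")
    return name


def generated_dgd(status: dict) -> Optional[dict]:
    """Return the generated DGD object once the request is Ready, else None."""
    if status.get("state") != "Ready":
        return None
    return status.get("generatedDeployment")
```

The returned object includes metadata, so it can be serialized and applied as a DynamoGraphDeployment without further wrapping.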


#### DynamoGraphDeploymentScalingAdapter



DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services
within a DynamoGraphDeployment. It implements the Kubernetes scale
subresource, enabling integration with HPA, KEDA, and custom autoscalers.

The adapter acts as an intermediary between autoscalers and the DGD,
ensuring that only the adapter controller modifies the DGD's service replicas.
This prevents conflicts when multiple autoscaling mechanisms are in play.





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeploymentScalingAdapter` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoGraphDeploymentScalingAdapterSpec](#dynamographdeploymentscalingadapterspec)_ |  |  |  |
| `status` _[DynamoGraphDeploymentScalingAdapterStatus](#dynamographdeploymentscalingadapterstatus)_ |  |  |  |


#### DynamoGraphDeploymentScalingAdapterSpec



DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter



_Appears in:_
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas for the target service.<br />This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users. |  | Minimum: 0 <br />Required: \{\} <br /> |
| `dgdRef` _[DynamoGraphDeploymentServiceRef](#dynamographdeploymentserviceref)_ | DGDRef references the DynamoGraphDeployment and the specific service to scale. |  | Required: \{\} <br /> |
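
A scaling adapter manifest needs only the two fields above. The following sketch mirrors the validation rules from the table (`replicas` at least 0, non-empty `name` and `serviceName`); `build_scaling_adapter` and the example names are hypothetical.

```python
def build_scaling_adapter(name, dgd_name, service_name, replicas):
    """Build a DynamoGraphDeploymentScalingAdapter manifest as a dict."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    if not dgd_name or not service_name:
        raise ValueError("dgdRef.name and dgdRef.serviceName must be non-empty")
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentScalingAdapter",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # serviceName must match a key in the DGD's spec.services map.
            "dgdRef": {"name": dgd_name, "serviceName": service_name},
        },
    }


adapter = build_scaling_adapter(
    name="my-dgd-decode",      # hypothetical adapter name
    dgd_name="my-dgd",         # hypothetical DGD name
    service_name="decode",     # hypothetical service key
    replicas=2,
)
```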


#### DynamoGraphDeploymentScalingAdapterStatus



DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter



_Appears in:_
- [DynamoGraphDeploymentScalingAdapter](#dynamographdeploymentscalingadapter)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the current number of replicas for the target service.<br />This is synced from the DGD's service replicas and is required for the scale subresource. |  | Optional: \{\} <br /> |
| `selector` _string_ | Selector is a label selector string for the pods managed by this adapter.<br />Required for HPA compatibility via the scale subresource. |  | Optional: \{\} <br /> |
| `lastScaleTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | LastScaleTime is the last time the adapter scaled the target service. |  | Optional: \{\} <br /> |
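
Because the adapter exposes `replicas` and `selector` through the scale subresource, a standard HorizontalPodAutoscaler can target it by kind and name. The sketch below builds such an `autoscaling/v2` object. The function name, the `-hpa` naming suffix, and the CPU utilization metric are illustrative assumptions; a real deployment would choose a metric suited to the workload.

```python
def build_hpa_for_adapter(adapter_name, min_replicas, max_replicas):
    """Build an autoscaling/v2 HPA whose scale target is a scaling adapter."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{adapter_name}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "nvidia.com/v1alpha1",
                "kind": "DynamoGraphDeploymentScalingAdapter",
                "name": adapter_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Placeholder metric: average CPU utilization across selected pods.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": 80},
                    },
                }
            ],
        },
    }


hpa = build_hpa_for_adapter("my-dgd-decode", min_replicas=1, max_replicas=4)
```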


#### DynamoGraphDeploymentServiceRef



DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment



_Appears in:_
- [DynamoGraphDeploymentScalingAdapterSpec](#dynamographdeploymentscalingadapterspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the DynamoGraphDeployment |  | MinLength: 1 <br />Required: \{\} <br /> |
| `serviceName` _string_ | ServiceName is the key name of the service within the DGD's spec.services map to scale |  | MinLength: 1 <br />Required: \{\} <br /> |


#### DynamoGraphDeploymentSpec



DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.



_Appears in:_
- [DynamoGraphDeployment](#dynamographdeployment)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys:string, values:string)_ | Annotations to propagate to all child resources (PCS, DCD, Deployments, and pod templates).<br />Service-level annotations take precedence over these values. |  | Optional: \{\} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels to propagate to all child resources (PCS, DCD, Deployments, and pod templates).<br />Service-level labels take precedence over these values. |  | Optional: \{\} <br /> |
| `pvcs` _[PVC](#pvc) array_ | PVCs defines a list of persistent volume claims that can be referenced by components.<br />Each PVC must have a unique name that can be referenced in component specifications. |  | MaxItems: 100 <br />Optional: \{\} <br /> |
| `services` _object (keys:string, values:[DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec))_ | Services are the services to deploy as part of this deployment. |  | MaxProperties: 25 <br />Optional: \{\} <br /> |
| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs are environment variables applied to all services in the deployment unless<br />overridden by service-specific configuration. |  | Optional: \{\} <br /> |
| `backendFramework` _string_ | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). |  | Enum: [sglang vllm trtllm] <br /> |
| `restart` _[Restart](#restart)_ | Restart specifies the restart policy for the graph deployment. |  | Optional: \{\} <br /> |
| `topologyConstraint` _[SpecTopologyConstraint](#spectopologyconstraint)_ | TopologyConstraint is the deployment-level topology constraint.<br />When set, topologyProfile is required and names the ClusterTopology CR to use.<br />packDomain is optional here — it can be omitted when only services carry constraints.<br />Services without their own topologyConstraint inherit from this value. |  | Optional: \{\} <br /> |
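
Two rules from the table above lend themselves to a short sketch: service-level labels and annotations override deployment-level ones, and the `services` and `pvcs` collections are size-limited. The helpers below are hypothetical and only approximate what the operator and API server enforce.

```python
def effective_labels(deployment_labels, service_labels):
    """Merge labels so that service-level keys win over deployment-level keys."""
    merged = dict(deployment_labels or {})
    merged.update(service_labels or {})
    return merged


def check_limits(spec):
    """Return a list of violations of the size and uniqueness rules."""
    errors = []
    if len(spec.get("services", {})) > 25:
        errors.append("services: at most 25 entries allowed")
    pvc_names = [pvc.get("name") for pvc in spec.get("pvcs", [])]
    if len(pvc_names) > 100:
        errors.append("pvcs: at most 100 items allowed")
    if len(pvc_names) != len(set(pvc_names)):
        errors.append("pvcs: names must be unique")
    return errors
```

The same merge applies to annotations, since the table describes identical precedence for both maps.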


#### DynamoGraphDeploymentStatus



DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.



_Appears in:_
- [DynamoGraphDeployment](#dynamographdeployment)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  | Optional: \{\} <br /> |
| `state` _[DGDState](#dgdstate)_ | State is a high-level textual status of the graph deployment lifecycle. | initializing | Enum: [initializing pending successful failed] <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. |  |  |
| `services` _object (keys:string, values:[ServiceReplicaStatus](#servicereplicastatus))_ | Services contains per-service replica status information.<br />The map key is the service name from spec.services. |  | Optional: \{\} <br /> |
| `restart` _[RestartStatus](#restartstatus)_ | Restart contains the status of the restart of the graph deployment. |  | Optional: \{\} <br /> |
| `checkpoints` _object (keys:string, values:[ServiceCheckpointStatus](#servicecheckpointstatus))_ | Checkpoints contains per-service checkpoint status information.<br />The map key is the service name from spec.services. |  | Optional: \{\} <br /> |
| `rollingUpdate` _[RollingUpdateStatus](#rollingupdatestatus)_ | RollingUpdate tracks the progress of operator-managed rolling updates.<br />Currently only supported for single-node, non-Grove deployments (DCD/Deployment). |  | Optional: \{\} <br /> |


#### DynamoModel



DynamoModel is the Schema for the dynamo models API





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoModel` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoModelSpec](#dynamomodelspec)_ |  |  |  |
| `status` _[DynamoModelStatus](#dynamomodelstatus)_ |  |  |  |


#### DynamoModelSpec



DynamoModelSpec defines the desired state of DynamoModel



_Appears in:_
- [DynamoModel](#dynamomodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora") |  | Required: \{\} <br /> |
| `baseModelName` _string_ | BaseModelName is the base model identifier that matches the service label<br />This is used to discover endpoints via headless services |  | Required: \{\} <br /> |
| `modelType` _string_ | ModelType specifies the type of model (e.g., "base", "lora", "adapter") | base | Enum: [base lora adapter] <br />Optional: \{\} <br /> |
| `source` _[ModelSource](#modelsource)_ | Source specifies the model source location (only applicable for lora model type) |  | Optional: \{\} <br /> |


#### DynamoModelStatus



DynamoModelStatus defines the observed state of DynamoModel



_Appears in:_
- [DynamoModel](#dynamomodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `endpoints` _[EndpointInfo](#endpointinfo) array_ | Endpoints is the current list of all endpoints for this model |  | Optional: \{\} <br /> |
| `readyEndpoints` _integer_ | ReadyEndpoints is the count of endpoints that are ready |  |  |
| `totalEndpoints` _integer_ | TotalEndpoints is the total count of endpoints |  |  |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions represents the latest available observations of the model's state |  | Optional: \{\} <br /> |


#### EPPConfig



EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components.
EPP is responsible for intelligent endpoint selection and KV-aware routing.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `configMapRef` _[ConfigMapKeySelector](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#configmapkeyselector-v1-core)_ | ConfigMapRef references a user-provided ConfigMap containing EPP configuration.<br />The ConfigMap should contain EndpointPickerConfig YAML.<br />Mutually exclusive with Config. |  | Optional: \{\} <br /> |
| `config` _[EndpointPickerConfig](#endpointpickerconfig)_ | Config allows specifying EPP EndpointPickerConfig directly as a structured object.<br />The operator will marshal this to YAML and create a ConfigMap automatically.<br />Mutually exclusive with ConfigMapRef.<br />One of ConfigMapRef or Config must be specified (no default configuration).<br />Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension |  | Type: object <br />Optional: \{\} <br /> |
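
The two fields above form an exactly-one-of pair: they are mutually exclusive, and there is no default configuration. A minimal sketch of that check, using a hypothetical `validate_epp_config` helper and hypothetical ConfigMap names:

```python
def validate_epp_config(epp):
    """Return an error message, or None when exactly one source is set."""
    has_ref = epp.get("configMapRef") is not None
    has_inline = epp.get("config") is not None
    if has_ref and has_inline:
        return "configMapRef and config are mutually exclusive"
    if not has_ref and not has_inline:
        return "one of configMapRef or config must be specified"
    return None


# A ConfigMapKeySelector carries the ConfigMap name and the key to read.
by_reference = {"configMapRef": {"name": "epp-config", "key": "config.yaml"}}
```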


#### EndpointInfo



EndpointInfo represents a single endpoint (pod) serving the model



_Appears in:_
- [DynamoModelStatus](#dynamomodelstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `address` _string_ | Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090") |  |  |
| `podName` _string_ | PodName is the name of the pod serving this endpoint |  | Optional: \{\} <br /> |
| `ready` _boolean_ | Ready indicates whether the endpoint is ready to serve traffic<br />For LoRA models: true if the POST /loras request succeeded with a 2xx status code<br />For base models: always false (no probing performed) |  |  |


#### ExtraPodMetadata







_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `annotations` _object (keys:string, values:string)_ |  |  |  |
| `labels` _object (keys:string, values:string)_ |  |  |  |


#### ExtraPodSpec







_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `mainContainer` _[Container](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#container-v1-core)_ |  |  |  |


#### FailoverSpec



FailoverSpec configures active-passive failover for a worker component.
For intraPod mode: requires gpuMemoryService.enabled; the main container is cloned
into engine containers (active + standby) within the same pod.
For interPod mode: the operator creates a dedicated GMS weight server pod and
multiple engine pods per rank that share GPUs via DRA resource claims.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled activates failover mode. |  |  |
| `mode` _[GPUMemoryServiceMode](#gpumemoryservicemode)_ | Mode selects the failover deployment topology.<br />intraPod: engine containers run within the same pod (requires gpuMemoryService.enabled).<br />interPod: a dedicated GMS weight server pod + engine pods per rank (requires Grove). | intraPod | Enum: [intraPod interPod] <br />Optional: \{\} <br /> |
| `numShadows` _integer_ | NumShadows is the number of shadow (standby) engine pods per rank.<br />Total engine pods per rank = NumShadows + 1 (1 primary + NumShadows shadows).<br />NumShadows is only meaningful for mode=interPod; intraPod uses a fixed<br />1 primary + 1 shadow sidecar layout and any value other than 1 is<br />rejected at admission time. | 1 | Minimum: 1 <br />Optional: \{\} <br /> |
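
The interaction between `mode` and `numShadows` can be summarized in a few lines. This sketch restates the documented rules and is not the admission webhook itself; the function names are hypothetical, and the Grove requirement for interPod mode is not checked because it depends on the cluster rather than on the spec.

```python
def validate_failover(failover, gms_enabled):
    """Return a list of rule violations for a FailoverSpec-shaped dict."""
    mode = failover.get("mode", "intraPod")      # documented default
    num_shadows = failover.get("numShadows", 1)  # documented default
    errors = []
    if mode not in ("intraPod", "interPod"):
        errors.append("mode must be intraPod or interPod")
    if num_shadows < 1:
        errors.append("numShadows must be >= 1")
    if mode == "intraPod":
        if not gms_enabled:
            errors.append("intraPod requires gpuMemoryService.enabled")
        if num_shadows != 1:
            errors.append("intraPod requires numShadows == 1")
    return errors


def engine_pods_per_rank(num_shadows):
    """One primary plus num_shadows standby engine pods (interPod mode)."""
    return num_shadows + 1
```

For example, an interPod configuration with `numShadows: 2` yields three engine pods per rank.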


#### FrontendSidecarSpec



FrontendSidecarSpec configures the auto-generated frontend sidecar container.
The operator uses these fields together with built-in frontend defaults (command, probes, ports,
and Dynamo env vars) to produce a fully configured sidecar container.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `image` _string_ | Image is the container image for the frontend sidecar. |  | Required: \{\} <br /> |
| `args` _string array_ | Args overrides the default frontend arguments. When specified, these replace<br />the default ["-m", "dynamo.frontend"] entirely.<br />For example, ["-m", "dynamo.frontend", "--router-mode", "direct"] for GAIE deployments. |  | Optional: \{\} <br /> |
| `envFromSecret` _string_ | EnvFromSecret references a Secret whose key/value pairs will be exposed as<br />environment variables in the frontend sidecar container. |  | Optional: \{\} <br /> |
| `envs` _[EnvVar](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#envvar-v1-core) array_ | Envs defines additional environment variables for the frontend sidecar.<br />These are merged with (and can override) the auto-generated Dynamo env vars. |  | Optional: \{\} <br /> |
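
The override semantics of `args` and `envs` differ: `args` replaces the defaults wholesale, while `envs` is merged by variable name. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, an empty `args` list is treated like an unset field, and the env var names in the usage note are placeholders rather than real Dynamo variables.

```python
DEFAULT_FRONTEND_ARGS = ["-m", "dynamo.frontend"]


def resolve_frontend_args(user_args):
    """User-supplied args replace the defaults entirely; nothing is appended."""
    # Assumption: an empty or missing list falls back to the defaults.
    return list(user_args) if user_args else list(DEFAULT_FRONTEND_ARGS)


def merge_envs(generated, user):
    """Merge EnvVar-style dicts by name; user entries override generated ones."""
    merged = {env["name"]: env for env in generated}
    merged.update({env["name"]: env for env in user})
    return list(merged.values())
```

Calling `merge_envs([{"name": "A", "value": "1"}], [{"name": "A", "value": "2"}])` keeps a single entry for `A` with the user-supplied value.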


#### GPUMemoryServiceMode

_Underlying type:_ _string_

GPUMemoryServiceMode selects the GMS deployment topology.



_Appears in:_
- [FailoverSpec](#failoverspec)
- [GPUMemoryServiceSpec](#gpumemoryservicespec)

| Field | Description |
| --- | --- |
| `intraPod` | GMSModeIntraPod runs GMS as a sidecar within the same pod.<br /> |
| `interPod` | GMSModeInterPod runs GMS as a separate weight server pod and one or more<br />engine pods per rank, sharing GPUs via DRA ResourceClaims and a shared<br />hostPath volume for UDS sockets. Only valid on FailoverSpec; the<br />GPUMemoryServiceSpec sidecar always runs in intraPod mode.<br /> |


#### GPUMemoryServiceSpec



GPUMemoryServiceSpec configures the GPU Memory Service (GMS) sidecar for a worker component.
When enabled, the operator injects a GMS sidecar that provides shared GPU memory access
via DRA (Dynamic Resource Allocation). The sidecar runs two GMS processes per GPU
(weights + kv_cache) and communicates with the main container over UDS sockets.



_Appears in:_
- [DynamoCheckpointSpec](#dynamocheckpointspec)
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled activates the GMS sidecar. GPU resources on the main container<br />are replaced with a DRA ResourceClaim for shared GPU access. |  |  |
| `mode` _[GPUMemoryServiceMode](#gpumemoryservicemode)_ | Mode selects the GMS deployment topology. | intraPod | Enum: [intraPod interPod] <br />Optional: \{\} <br /> |
| `deviceClassName` _string_ | DeviceClassName is the DRA DeviceClass to request GPUs from. | gpu.nvidia.com | Optional: \{\} <br /> |


#### IngressSpec







_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled exposes the component through an ingress or virtual service when true. |  |  |
| `host` _string_ | Host is the base host name to route external traffic to this component. |  |  |
| `useVirtualService` _boolean_ | UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress. |  |  |
| `virtualServiceGateway` _string_ | VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to. |  |  |
| `hostPrefix` _string_ | HostPrefix is an optional prefix added before the host. |  |  |
| `annotations` _object (keys:string, values:string)_ | Annotations to set on the generated Ingress/VirtualService resources. |  |  |
| `labels` _object (keys:string, values:string)_ | Labels to set on the generated Ingress/VirtualService resources. |  |  |
| `tls` _[IngressTLSSpec](#ingresstlsspec)_ | TLS holds the TLS configuration used by the Ingress/VirtualService. |  |  |
| `hostSuffix` _string_ | HostSuffix is an optional suffix appended after the host. |  |  |
| `ingressControllerClassName` _string_ | IngressControllerClassName selects the ingress controller class (e.g., "nginx"). |  |  |


#### IngressTLSSpec







_Appears in:_
- [IngressSpec](#ingressspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `secretName` _string_ | SecretName is the name of a Kubernetes Secret containing the TLS certificate and key. |  |  |




#### ModelReference



ModelReference identifies a model served by this component



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the base model identifier (e.g., "llama-3-70b-instruct-v1") |  | Required: \{\} <br /> |
| `revision` _string_ | Revision is the model revision/version (optional) |  | Optional: \{\} <br /> |


#### ModelSource



ModelSource defines the source location of a model



_Appears in:_
- [DynamoModelSpec](#dynamomodelspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `uri` _string_ | URI is the model source URI<br />Supported formats:<br />- S3: s3://bucket/path/to/model<br />- HuggingFace: hf://org/model@revision_sha |  | Required: \{\} <br /> |
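
The two URI formats above can be split into their parts with plain string operations. This is an illustrative parser, not the operator's implementation: `parse_model_uri` is a hypothetical helper, and treating the `@revision` suffix as optional is an assumption, since the table shows only the form with a revision.

```python
def parse_model_uri(uri):
    """Split an s3:// or hf:// model URI into its components."""
    scheme, sep, rest = uri.partition("://")
    if not sep or not rest:
        raise ValueError(f"not a valid model URI: {uri!r}")
    if scheme == "s3":
        bucket, _, key = rest.partition("/")
        if not bucket or not key:
            raise ValueError("s3 URI must look like s3://bucket/path/to/model")
        return {"scheme": "s3", "bucket": bucket, "key": key}
    if scheme == "hf":
        # Assumption: a missing "@revision" suffix yields revision=None.
        repo, _, revision = rest.partition("@")
        return {"scheme": "hf", "repo": repo, "revision": revision or None}
    raise ValueError(f"unsupported scheme: {scheme!r}")
```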


#### MultinodeSpec







_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `nodeCount` _integer_ | NodeCount is the number of nodes to deploy for multinode components.<br />The total number of GPUs is nodeCount multiplied by the per-node GPU limit.<br />Must be greater than 1. | 2 | Minimum: 2 <br /> |
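
The GPU arithmetic for multinode components is a single multiplication. A small sketch, with `total_gpus` as a hypothetical helper and the GPU limit taken as the string form used by the `gpu` field of ResourceItem:

```python
def total_gpus(node_count, gpu_limit):
    """Total GPUs for a multinode component: nodes times per-node GPU limit."""
    if node_count < 2:
        raise ValueError("nodeCount must be at least 2")
    return node_count * int(gpu_limit)
```

With `nodeCount: 2` and a GPU limit of `"8"`, the component uses 16 GPUs in total.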


#### PVC







_Appears in:_
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `create` _boolean_ | Create indicates whether to create a new PVC. |  |  |
| `name` _string_ | Name is the name of the PVC |  | Required: \{\} <br /> |
| `storageClass` _string_ | StorageClass to be used for PVC creation. Required when create is true. |  |  |
| `size` _[Quantity](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api)_ | Size of the volume in Gi, used during PVC creation. Required when create is true. |  |  |
| `volumeAccessMode` _[PersistentVolumeAccessMode](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#persistentvolumeaccessmode-v1-core)_ | VolumeAccessMode is the volume access mode of the PVC. Required when create is true. |  |  |
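
Three of the fields above become required only when `create` is true. The sketch below expresses that conditional rule; `validate_pvc` is a hypothetical helper, and the storage class and PVC names in the usage note are placeholders.

```python
def validate_pvc(pvc):
    """Return a list of violations for a PVC-shaped dict."""
    errors = []
    if not pvc.get("name"):
        errors.append("name is required")
    if pvc.get("create"):
        # These fields are only needed when the operator creates the PVC.
        for field in ("storageClass", "size", "volumeAccessMode"):
            if not pvc.get(field):
                errors.append(f"{field} is required when create is true")
    return errors
```

A reference to an existing claim such as `{"name": "model-cache"}` passes, while `{"name": "model-cache", "create": True, "storageClass": "standard"}` reports the missing `size` and `volumeAccessMode`.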


#### ProfilingConfigSpec



ProfilingConfigSpec defines configuration for the profiling process.
This structure maps directly to the profile_sla.py config format.
See dynamo/profiler/utils/profiler_argparse.py for the complete schema.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[JSON](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#json-v1-apiextensions-k8s-io)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. |  | Optional: \{\} <br />Type: object <br /> |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. |  | Optional: \{\} <br /> |
| `profilerImage` _string_ | ProfilerImage specifies the container image to use for profiling jobs.<br />This image contains the profiler code and dependencies needed for SLA-based profiling.<br />Example: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0" |  | Required: \{\} <br /> |
| `outputPVC` _string_ | OutputPVC is an optional PersistentVolumeClaim name for storing profiling output.<br />If specified, all profiling artifacts (logs, plots, configs, raw data) will be written<br />to this PVC instead of an ephemeral emptyDir volume. This allows users to access<br />complete profiling results after the job completes by mounting the PVC.<br />The PVC must exist in the same namespace as the DGDR.<br />If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps.<br />Note: ConfigMaps are still created regardless of this setting for planner integration. |  | Optional: \{\} <br /> |
| `resources` _[ResourceRequirements](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourcerequirements-v1-core)_ | Resources specifies the compute resource requirements for the profiling job container.<br />If not specified, no resource requests or limits are set. |  | Optional: \{\} <br /> |
| `tolerations` _[Toleration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#toleration-v1-core) array_ | Tolerations allows the profiling job to be scheduled on nodes with matching taints.<br />For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint. |  | Optional: \{\} <br /> |
| `nodeSelector` _object (keys:string, values:string)_ | NodeSelector is a selector which must match a node's labels for the profiling pod to be scheduled on that node.<br />For example, to schedule on ARM64 nodes, use \{"kubernetes.io/arch": "arm64"\}. |  | Optional: \{\} <br /> |
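
The scheduling-related fields above combine as in the following sketch of a `profilingConfig` value. The PVC name and resource amounts are hypothetical, the toleration follows the standard Kubernetes shape for tolerating an `nvidia.com/gpu` taint, and `controller_owned_keys` is a hypothetical helper that flags the two `config` keys the controller sets on its own.

```python
profiling_config = {
    "profilerImage": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0",
    # Hypothetical PVC name; it must already exist in the DGDR's namespace.
    "outputPVC": "profiling-output",
    "tolerations": [
        {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
    ],
    "nodeSelector": {"kubernetes.io/arch": "arm64"},
    # Illustrative amounts; omit to leave requests and limits unset.
    "resources": {"requests": {"cpu": "2", "memory": "4Gi"}},
}


def controller_owned_keys(config):
    """List keys in profilingConfig.config that the controller sets itself."""
    found = []
    if "model" in config.get("deployment", {}):
        found.append("deployment.model")
    if "backend" in config.get("engine", {}):
        found.append("engine.backend")
    return found
```

Running `controller_owned_keys` over a user-supplied `config` before submission gives an early warning about values that belong in the top-level `model` and `backend` fields instead.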


#### ResourceItem







_Appears in:_
- [Resources](#resources)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `cpu` _string_ | CPU specifies the CPU resource request/limit (e.g., "1000m", "2") |  |  |
| `memory` _string_ | Memory specifies the memory resource request/limit (e.g., "4Gi", "8Gi") |  |  |
| `gpu` _string_ | GPU indicates the number of GPUs to request.<br />Total number of GPUs is NumberOfNodes * GPU in case of multinode deployment. |  |  |
| `gpuType` _string_ | GPUType specifies a custom GPU resource type, e.g. "gpu.intel.com/xe".<br />If not specified, the GPU type defaults to "nvidia.com/gpu". |  |  |
| `custom` _object (keys:string, values:string)_ | Custom specifies additional custom resource requests/limits |  |  |
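
The `gpu` and `gpuType` descriptions above imply a small calculation. The following Python sketch is illustrative only: `gpu_request` and `number_of_nodes` are hypothetical names that do not come from the operator, and the `ResourceItem` is modeled as a plain dictionary.

```python
# Default GPU resource name, per the gpuType field description.
DEFAULT_GPU_TYPE = "nvidia.com/gpu"


def gpu_request(resource_item: dict, number_of_nodes: int = 1) -> tuple:
    """Return (GPU resource name, total GPUs) for a ResourceItem-shaped dict.

    Hypothetical helper: it only mirrors the documented rule that the total
    is NumberOfNodes * GPU for multinode deployments.
    """
    gpu_type = resource_item.get("gpuType") or DEFAULT_GPU_TYPE
    per_node = int(resource_item.get("gpu", "0"))  # gpu is a string field
    return gpu_type, per_node * number_of_nodes


item = {"cpu": "2", "memory": "8Gi", "gpu": "4"}
print(gpu_request(item, number_of_nodes=2))
```

With four GPUs per node on two nodes, the sketch reports eight GPUs under the default `nvidia.com/gpu` resource name.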


#### Resources



Resources defines the resource requests and limits for a component, including CPU, memory,
GPUs/devices, and any runtime-specific resources.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `requests` _[ResourceItem](#resourceitem)_ | Requests specifies the minimum resources required by the component |  |  |
| `limits` _[ResourceItem](#resourceitem)_ | Limits specifies the maximum resources allowed for the component |  |  |
| `claims` _[ResourceClaim](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#resourceclaim-v1-core) array_ | Claims specifies resource claims for dynamic resource allocation |  |  |


#### Restart






_Appears in:_
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `id` _string_ | ID is an arbitrary string that triggers a restart when changed.<br />Any modification to this value will initiate a restart of the graph deployment according to the strategy. |  | MinLength: 1 <br />Required: \{\} <br /> |
| `strategy` _[RestartStrategy](#restartstrategy)_ | Strategy specifies the restart strategy for the graph deployment. |  | Optional: \{\} <br /> |


#### RestartPhase

_Underlying type:_ _string_



_Appears in:_
- [RestartStatus](#restartstatus)

| Field | Description |
| --- | --- |
| `Pending` |  |
| `Restarting` |  |
| `Completed` |  |
| `Failed` |  |
| `Superseded` |  |


#### RestartStatus



RestartStatus contains the status of the restart of the graph deployment.



_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `observedID` _string_ | ObservedID is the restart ID that has been observed and is being processed.<br />Matches the Restart.ID field in the spec. |  |  |
| `phase` _[RestartPhase](#restartphase)_ | Phase is the phase of the restart. |  |  |
| `inProgress` _string array_ | InProgress contains the names of the services that are currently being restarted. |  | Optional: \{\} <br /> |


#### RestartStrategy







_Appears in:_
- [Restart](#restart)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _[RestartStrategyType](#restartstrategytype)_ | Type specifies the restart strategy type. | Sequential | Enum: [Sequential Parallel] <br /> |
| `order` _string array_ | Order specifies the order in which the services should be restarted. |  | Optional: \{\} <br /> |


#### RestartStrategyType

_Underlying type:_ _string_





_Appears in:_
- [RestartStrategy](#restartstrategy)

| Field | Description |
| --- | --- |
| `Sequential` |  |
| `Parallel` |  |
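
The `Restart`, `RestartStrategy`, and `RestartStrategyType` types above combine into a single block on the graph deployment spec. The Python sketch below builds such a block as a dictionary and applies the documented validation rules (`id` has MinLength 1, `type` is `Sequential` or `Parallel`). The helper name `restart_block` and the service names are hypothetical. The sketch does not assume which key of the spec holds the block.

```python
# Allowed values from the RestartStrategyType enum.
STRATEGY_TYPES = ("Sequential", "Parallel")


def restart_block(restart_id: str, strategy_type: str = "Sequential", order=None) -> dict:
    """Build a Restart-shaped dict; changing restart_id is what triggers a restart."""
    if not restart_id:
        raise ValueError("id must be a non-empty string (MinLength: 1)")
    if strategy_type not in STRATEGY_TYPES:
        raise ValueError(f"type must be one of {STRATEGY_TYPES}")
    strategy = {"type": strategy_type}
    if order:
        # Optional list of service names giving the restart order.
        strategy["order"] = list(order)
    return {"id": restart_id, "strategy": strategy}


# "Frontend" and "Worker" are placeholder service names.
print(restart_block("restart-2", order=["Frontend", "Worker"]))
```

Submitting the same block with a different `id` requests another restart. The `observedID` field in `RestartStatus` reports which ID the controller is processing.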


#### RollingUpdatePhase

_Underlying type:_ _string_

RollingUpdatePhase represents the current phase of a rolling update.

_Validation:_
- Enum: [Pending InProgress Completed Failed ]

_Appears in:_
- [RollingUpdateStatus](#rollingupdatestatus)

| Field | Description |
| --- | --- |
| `Pending` |  |
| `InProgress` |  |
| `Completed` |  |
| `` |  |


#### RollingUpdateStatus



RollingUpdateStatus tracks the progress of a rolling update.



_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[RollingUpdatePhase](#rollingupdatephase)_ | Phase indicates the current phase of the rolling update. |  | Enum: [Pending InProgress Completed Failed ] <br />Optional: \{\} <br /> |
| `startTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | StartTime is when the rolling update began. |  | Optional: \{\} <br /> |
| `endTime` _[Time](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#time-v1-meta)_ | EndTime is when the rolling update finished, whether it succeeded or failed. |  | Optional: \{\} <br /> |
| `updatedServices` _string array_ | UpdatedServices is the list of services that have completed the rolling update.<br />A service is considered updated when its new replicas are all ready and old replicas are fully scaled down.<br />Only services of componentType Worker (or Prefill/Decode) are considered. |  | Optional: \{\} <br /> |


#### ScalingAdapter



ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter
for replica management. When enabled, the DGDSA owns the replicas field and
external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether the ScalingAdapter should be enabled for this service.<br />When true, a DGDSA is created and owns the replicas field.<br />When false (default), no DGDSA is created and replicas can be modified directly in the DGD. | false | Optional: \{\} <br /> |


#### ServiceCheckpointConfig

ServiceCheckpointConfig configures checkpointing for a DGD service



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether checkpointing is enabled for this service | false | Optional: \{\} <br /> |
| `mode` _[CheckpointMode](#checkpointmode)_ | Mode defines how checkpoint creation is handled<br />- Auto: DGD controller creates Checkpoint CR automatically<br />- Manual: User must create Checkpoint CR | Auto | Enum: [Auto Manual] <br />Optional: \{\} <br /> |
| `checkpointRef` _string_ | CheckpointRef references an existing DynamoCheckpoint CR by metadata.name.<br />If specified, this service's Identity is ignored and the referenced checkpoint is used directly. |  | Optional: \{\} <br /> |
| `identity` _[DynamoCheckpointIdentity](#dynamocheckpointidentity)_ | Identity defines the checkpoint identity for hash computation<br />Used when Mode is Auto or when looking up existing checkpoints<br />Required when checkpointRef is not specified |  | Optional: \{\} <br /> |
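
The relationship between `checkpointRef` and `identity` can be checked before a manifest is submitted. The Python sketch below is a hypothetical client-side check, not the operator's validation logic. It assumes the "identity is required" rule only matters when `enabled` is true, which the table does not state.

```python
# Allowed values from the mode field's Enum validation.
CHECKPOINT_MODES = ("Auto", "Manual")


def validate_checkpoint_config(cfg: dict) -> list:
    """Return a list of problems found in a ServiceCheckpointConfig-shaped dict."""
    errors = []
    mode = cfg.get("mode", "Auto")  # documented default is Auto
    if mode not in CHECKPOINT_MODES:
        errors.append(f"mode must be one of {CHECKPOINT_MODES}, got {mode!r}")
    # Assumption: the identity requirement is only enforced for enabled services.
    if cfg.get("enabled", False):
        if not cfg.get("checkpointRef") and not cfg.get("identity"):
            errors.append("identity is required when checkpointRef is not specified")
    return errors


# "my-checkpoint" is a placeholder DynamoCheckpoint name.
print(validate_checkpoint_config({"enabled": True, "checkpointRef": "my-checkpoint"}))
print(validate_checkpoint_config({"enabled": True}))
```

When `checkpointRef` is set, the referenced checkpoint is used directly and the service's `identity` is ignored, so the first call reports no problems.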


#### ServiceCheckpointStatus

ServiceCheckpointStatus contains checkpoint information for a single service.



_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `checkpointName` _string_ | CheckpointName is the name of the associated Checkpoint CR |  | Optional: \{\} <br /> |
| `identityHash` _string_ | IdentityHash is the computed hash of the checkpoint identity |  | Optional: \{\} <br /> |
| `ready` _boolean_ | Ready indicates if the checkpoint was visible to the worker at startup |  | Optional: \{\} <br /> |


#### ServiceReplicaStatus

ServiceReplicaStatus contains replica information for a single service.



_Appears in:_
- [DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `componentKind` _[ComponentKind](#componentkind)_ | ComponentKind is the underlying resource kind (e.g., "PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"). |  | Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet] <br /> |
| `componentName` _string_ | ComponentName is the name of the primary underlying resource.<br />DEPRECATED: Use ComponentNames instead. This field will be removed in a future release.<br />During rolling updates, this reflects the new (target) component name. |  |  |
| `componentNames` _string array_ | ComponentNames is the list of underlying resource names for this service.<br />During normal operation, this contains a single name.<br />During rolling updates, this contains both old and new component names. |  | Optional: \{\} <br /> |
| `replicas` _integer_ | Replicas is the total number of non-terminated replicas.<br />Required for all component kinds. |  | Minimum: 0 <br /> |
| `updatedReplicas` _integer_ | UpdatedReplicas is the number of replicas at the current/desired revision.<br />Required for all component kinds. |  | Minimum: 0 <br /> |
| `readyReplicas` _integer_ | ReadyReplicas is the number of ready replicas.<br />Populated for PodClique, Deployment, and LeaderWorkerSet.<br />Not available for PodCliqueScalingGroup.<br />When nil, the field is omitted from the API response. |  | Minimum: 0 <br />Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of available replicas.<br />For Deployment: replicas ready for >= minReadySeconds.<br />For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods.<br />Not available for PodClique or LeaderWorkerSet.<br />When nil, the field is omitted from the API response. |  | Minimum: 0 <br />Optional: \{\} <br /> |
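
Which replica counters are populated depends on `componentKind`. The Python sketch below restates the rules from the table as a lookup, so a status consumer can tell an omitted field from a zero value. The helper name `populated_counters` is hypothetical.

```python
# Kinds for which each optional counter is populated, per the field descriptions.
READY_KINDS = {"PodClique", "Deployment", "LeaderWorkerSet"}
AVAILABLE_KINDS = {"Deployment", "PodCliqueScalingGroup"}


def populated_counters(component_kind: str) -> set:
    """Return the replica counter fields expected for a given component kind."""
    fields = {"replicas", "updatedReplicas"}  # required for all component kinds
    if component_kind in READY_KINDS:
        fields.add("readyReplicas")
    if component_kind in AVAILABLE_KINDS:
        fields.add("availableReplicas")
    return fields


for kind in ("PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"):
    print(kind, sorted(populated_counters(kind)))
```

Only `Deployment` reports all four counters. For the other kinds, a missing `readyReplicas` or `availableReplicas` means "not available", not zero.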


#### SharedMemorySpec







_Appears in:_
- [DynamoCheckpointJobConfig](#dynamocheckpointjobconfig)
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `disabled` _boolean_ |  |  |  |
| `size` _[Quantity](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#quantity-resource-api)_ |  |  |  |


#### SpecTopologyConstraint



SpecTopologyConstraint defines deployment-level topology placement requirements.
It carries both the topology profile (which ClusterTopology CR to use) and an
optional default pack domain that services without their own constraint inherit.



_Appears in:_
- [DynamoGraphDeploymentSpec](#dynamographdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `topologyProfile` _string_ | TopologyProfile is the name of the ClusterTopology CR that defines the<br />topology hierarchy for this deployment. |  | MinLength: 1 <br /> |
| `packDomain` _[TopologyDomain](#topologydomain)_ | PackDomain is the default topology domain to pack pods within.<br />Optional — omit when only services carry constraints. |  | Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` <br />Optional: \{\} <br /> |


#### TopologyConstraint



TopologyConstraint defines service-level topology placement requirements.
The topology profile is inherited from the deployment-level SpecTopologyConstraint;
only the pack domain is specified here.



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `packDomain` _[TopologyDomain](#topologydomain)_ | PackDomain is the topology domain to pack pods within. Must match a<br />domain defined in the referenced ClusterTopology CR. |  | Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` <br /> |


#### TopologyDomain

_Underlying type:_ _string_

TopologyDomain is a free-form topology level identifier.
Domain names are defined by the cluster admin in the ClusterTopology CR.
Common examples: "region", "zone", "datacenter", "block", "rack", "host", "numa".
Must match `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` (lowercase alphanumeric,
may contain hyphens but must not start or end with one).

_Validation:_
- Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`

_Appears in:_
- [SpecTopologyConstraint](#spectopologyconstraint)
- [TopologyConstraint](#topologyconstraint)
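
The `TopologyDomain` pattern and the pack-domain inheritance described for `SpecTopologyConstraint` and `TopologyConstraint` can be sketched in Python as follows. Both helper names are hypothetical, and the constraints are modeled as plain dictionaries. `fullmatch` stands in for the `^...$` anchors of the documented pattern.

```python
import re

# Same expression as the documented Pattern, without the anchors (fullmatch adds them).
DOMAIN_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def is_valid_domain(name: str) -> bool:
    """Check a topology domain name against the documented pattern."""
    return DOMAIN_RE.fullmatch(name) is not None


def effective_pack_domain(spec_constraint, service_constraint=None):
    """Return the service's packDomain, else the deployment-level default, else None."""
    if service_constraint and service_constraint.get("packDomain"):
        return service_constraint["packDomain"]
    return (spec_constraint or {}).get("packDomain")


# "my-topology" is a placeholder ClusterTopology name.
deployment = {"topologyProfile": "my-topology", "packDomain": "zone"}
print(effective_pack_domain(deployment, {"packDomain": "rack"}))  # service override
print(effective_pack_domain(deployment))                          # inherited default
print(is_valid_domain("rack-1"), is_valid_domain("-rack"))
```

A service with its own `packDomain` uses that value. A service without one inherits the deployment-level default, when a default is set.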



#### VolumeMount

VolumeMount references a PVC defined at the top level for volumes to be mounted by the component



_Appears in:_
- [DynamoComponentDeploymentSharedSpec](#dynamocomponentdeploymentsharedspec)
- [DynamoComponentDeploymentSpec](#dynamocomponentdeploymentspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name references a PVC name defined in the top-level PVCs map |  | Required: \{\} <br /> |
| `mountPoint` _string_ | MountPoint specifies where to mount the volume.<br />If useAsCompilationCache is true and mountPoint is not specified,<br />a backend-specific default will be used. |  |  |
| `useAsCompilationCache` _boolean_ | UseAsCompilationCache indicates this volume should be used as a compilation cache.<br />When true, backend-specific environment variables will be set and default mount points may be used. | false |  |



## nvidia.com/v1beta1

Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.

### Resource Types
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)



#### BackendType

_Underlying type:_ _string_

BackendType specifies the inference backend.

_Validation:_
- Enum: [auto sglang trtllm vllm]

_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description |
| --- | --- |
| `auto` |  |
| `sglang` |  |
| `trtllm` |  |
| `vllm` |  |


#### DGDRPhase

_Underlying type:_ _string_

DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.

_Validation:_
- Enum: [Pending Profiling Ready Deploying Deployed Failed]

_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)

| Field | Description |
| --- | --- |
| `Pending` |  |
| `Profiling` |  |
| `Ready` |  |
| `Deploying` |  |
| `Deployed` |  |
| `Failed` |  |


#### DeploymentInfoStatus



DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.



_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _integer_ | Replicas is the desired number of replicas. |  | Optional: \{\} <br /> |
| `availableReplicas` _integer_ | AvailableReplicas is the number of replicas that are available and ready. |  | Optional: \{\} <br /> |


#### v1beta1 DynamoGraphDeploymentRequest



DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It provides a simplified, SLA-driven interface for deploying inference models on Dynamo.
Users specify a model and optional performance targets; the controller handles profiling,
configuration selection, and deployment.

Lifecycle:
 1. Pending: Spec validated, preparing for profiling
 2. Profiling: Profiling job is running to discover optimal configurations
 3. Ready: Profiling complete, generated DGD spec available in status
 4. Deploying: DGD is being created and rolled out (when autoApply=true)
 5. Deployed: DGD is running and healthy
 6. Failed: An unrecoverable error occurred





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1beta1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. |  |  |
| `spec` _[DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. |  |  |
| `status` _[DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. |  |  |


#### v1beta1 DynamoGraphDeploymentRequestSpec



DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
Only the Model field is required; all other fields are optional and have sensible defaults.



_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />Can be a HuggingFace ID or a private model name. |  | MinLength: 1 <br />Required: \{\} <br /> |
| `backend` _[BackendType](#backendtype)_ | Backend specifies the inference backend to use for profiling and deployment. | auto | Enum: [auto sglang trtllm vllm] <br />Optional: \{\} <br /> |
| `image` _string_ | Image is the container image reference for the profiling job (frontend image).<br />Example: "nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0". |  | Optional: \{\} <br /> |
| `modelCache` _[ModelCacheSpec](#modelcachespec)_ | ModelCache provides optional PVC configuration for pre-downloaded model weights.<br />When provided, weights are loaded from the PVC instead of downloading from HuggingFace. |  | Optional: \{\} <br /> |
| `hardware` _[HardwareSpec](#hardwarespec)_ | Hardware describes the hardware resources available for profiling and deployment.<br />Typically auto-filled by the operator from cluster discovery. |  | Optional: \{\} <br /> |
| `workload` _[WorkloadSpec](#workloadspec)_ | Workload defines the expected workload characteristics for SLA-based profiling. |  | Optional: \{\} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines service-level agreement targets that drive profiling optimization. |  | Optional: \{\} <br /> |
| `overrides` _[OverridesSpec](#overridesspec)_ | Overrides allows customizing the profiling job and the generated DynamoGraphDeployment. |  | Optional: \{\} <br /> |
| `features` _[FeaturesSpec](#featuresspec)_ | Features controls optional Dynamo platform features in the generated deployment. |  | Optional: \{\} <br /> |
| `searchStrategy` _[SearchStrategy](#searchstrategy)_ | SearchStrategy controls the profiling search depth.<br />"rapid" performs a fast sweep; "thorough" explores more configurations. | rapid | Enum: [rapid thorough] <br />Optional: \{\} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, the generated spec is stored in status<br />for manual review and application. | true | Optional: \{\} <br /> |
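
Because only `model` is required, a request can be very small. The Python sketch below builds a `DynamoGraphDeploymentRequest` manifest as a dictionary and prints it as JSON, a format Kubernetes tooling accepts alongside YAML. The helper name `minimal_dgdr`, the resource name `qwen-demo`, and the namespace are hypothetical. The field names come from the table above.

```python
import json


def minimal_dgdr(name: str, model: str, namespace: str = "default", **optional_spec) -> dict:
    """Build a v1beta1 DynamoGraphDeploymentRequest manifest as a dict."""
    if not model:
        raise ValueError("spec.model is required (MinLength: 1)")
    spec = {"model": model}
    spec.update(optional_spec)  # e.g. backend, searchStrategy, autoApply
    return {
        "apiVersion": "nvidia.com/v1beta1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = minimal_dgdr("qwen-demo", "Qwen/Qwen3-0.6B", backend="vllm", autoApply=False)
print(json.dumps(manifest, indent=2))
```

With `autoApply` set to false, the generated DynamoGraphDeployment spec is stored in status for review and is not applied automatically.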


#### v1beta1 DynamoGraphDeploymentRequestStatus



DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.



_Appears in:_
- [DynamoGraphDeploymentRequest](#v1beta1-dynamographdeploymentrequest)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `phase` _[DGDRPhase](#dgdrphase)_ | Phase is the high-level lifecycle phase of the deployment request. |  | Enum: [Pending Profiling Ready Deploying Deployed Failed] <br />Optional: \{\} <br /> |
| `profilingPhase` _[ProfilingPhase](#profilingphase)_ | ProfilingPhase indicates the current sub-phase of the profiling pipeline.<br />Only meaningful when Phase is "Profiling". Cleared when profiling completes or fails. |  | Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] <br />Optional: \{\} <br /> |
| `dgdName` _string_ | DGDName is the name of the generated or created DynamoGraphDeployment. |  | Optional: \{\} <br /> |
| `profilingJobName` _string_ | ProfilingJobName is the name of the Kubernetes Job running the profiler. |  | Optional: \{\} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Succeeded, Validation, Profiling, SpecGenerated, DeploymentReady. |  | Optional: \{\} <br /> |
| `profilingResults` _[ProfilingResultsStatus](#profilingresultsstatus)_ | ProfilingResults contains the output of the profiling process including<br />Pareto-optimal configurations and the selected deployment configuration. |  | Optional: \{\} <br /> |
| `deploymentInfo` _[DeploymentInfoStatus](#deploymentinfostatus)_ | DeploymentInfo tracks the state of the deployed DynamoGraphDeployment.<br />Populated when a DGD has been created (either via autoApply or manually). |  | Optional: \{\} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration is the most recent generation observed by the controller. |  | Optional: \{\} <br /> |
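
A client polling this status might reduce it to a short summary and decide when to stop waiting. The Python sketch below is one hypothetical way to do that. Treating `Ready` as a resting state when `autoApply` is false is an assumption inferred from the lifecycle description, not documented controller behavior.

```python
# Values from the DGDRPhase enum.
PHASES = ("Pending", "Profiling", "Ready", "Deploying", "Deployed", "Failed")


def summarize(status: dict) -> str:
    """Return the phase, adding the profiling sub-phase only while Profiling."""
    phase = status.get("phase") or "Unknown"
    sub_phase = status.get("profilingPhase")
    if phase == "Profiling" and sub_phase:
        return f"{phase} ({sub_phase})"
    return phase


def is_settled(phase: str, auto_apply: bool = True) -> bool:
    """Return True when no further automatic progress is expected (assumption)."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if phase in ("Deployed", "Failed"):
        return True
    # Assumption: without autoApply the request rests at Ready for manual review.
    return phase == "Ready" and not auto_apply


print(summarize({"phase": "Profiling", "profilingPhase": "SweepingPrefill"}))
print(is_settled("Ready", auto_apply=False))
```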


#### FeaturesSpec



FeaturesSpec controls optional Dynamo platform features in the generated deployment.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `planner` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Planner is the raw SLA planner configuration passed to the planner service.<br />Its schema is defined by dynamo.planner.config.planner_config.PlannerConfig.<br />Go treats this as opaque bytes; the Planner service validates it at startup.<br />The presence of this field (non-null) enables the planner in the generated DGD. |  | Type: object <br />Optional: \{\} <br /> |
| `mocker` _[MockerSpec](#mockerspec)_ | Mocker configures the simulated (mocker) backend for testing without GPUs. |  | Optional: \{\} <br /> |


#### GPUSKUType

_Underlying type:_ _string_

GPUSKUType is the AIC hardware system identifier for a supported GPU.

_Validation:_
- Enum: [gb200_sxm b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300]

_Appears in:_
- [HardwareSpec](#hardwarespec)

| Field | Description |
| --- | --- |
| `gb200_sxm` | --- Blackwell ---<br /> |
| `b200_sxm` |  |
| `h200_sxm` | --- Hopper ---<br /> |
| `h100_sxm` |  |
| `h100_pcie` |  |
| `a100_sxm` | --- Ampere ---<br /> |
| `a100_pcie` |  |
| `l40s` | --- Ada ---<br /> |
| `l40` |  |
| `l4` |  |
| `v100_sxm` | --- Older NVIDIA ---<br /> |
| `v100_pcie` |  |
| `t4` |  |
| `mi200` | --- AMD ---<br /> |
| `mi300` |  |


#### HardwareSpec



HardwareSpec describes the GPU hardware for profiling and deployment.
All fields are auto-detected from cluster GPU nodes when omitted
(requires cluster-wide mode with GPU discovery enabled).
gpuSku is a selector (restricts which nodes are considered);
the other fields are pure overrides passed to the profiler.
If all four fields are set, discovery is skipped.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `gpuSku` _[GPUSKUType](#gpuskutype)_ | GPUSKU selects the GPU type to target.<br />When omitted, auto-detected by selecting the GPU with the highest<br />node count, then highest VRAM. In mixed-GPU clusters, set this to<br />choose which GPU type to use. Discovery and totalGpus are then<br />restricted to nodes matching this SKU. |  | Enum: [gb200_sxm b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300] <br />Optional: \{\} <br /> |
| `vramMb` _float_ | VRAMMB is the VRAM per GPU in MiB.<br />When omitted, auto-detected from cluster GPU nodes. |  | Optional: \{\} <br /> |
| `totalGpus` _integer_ | TotalGPUs is the GPU budget for profiling and deployment.<br />The profiler uses this to determine parallelism and replica count.<br />When omitted, computed by counting GPUs on discovered nodes<br />(filtered by gpuSku when set), temporarily capped at 32 to<br />limit profiler search space. This cap may be removed in a future<br />release. Set this field explicitly to override. |  | Optional: \{\} <br /> |
| `numGpusPerNode` _integer_ | NumGPUsPerNode is the number of GPUs per node.<br />When omitted, auto-detected from cluster GPU nodes. |  | Optional: \{\} <br /> |
| `interconnect` _string_ | Interconnect describes the primary GPU-to-GPU interconnect *within a node*.<br />Semantics / usage:<br />  - This is capability metadata used for profiling, planning, and deployment decisions.<br />  - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed.<br />  - When omitted, the operator may attempt best-effort discovery (currently distinguishes "nvlink"<br />    vs "pcie" based on DCGM NVLink link count). If discovery is unavailable, it may remain empty.<br />Impact of wrong / missing values:<br />  - If set more optimistically than reality (e.g., "nvlink" when only PCIe is present), performance<br />    models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts,<br />    resulting in degraded performance compared to expectations.<br />  - If set more pessimistically than reality (e.g., "pcie" when NVLink is present), the system may<br />    choose conservative plans and leave performance on the table.<br />  - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to<br />    conservative assumptions.<br />Example values: "pcie", "nvlink". Other values may be accepted but may not be auto-detected. |  | Optional: \{\} <br /> |
| `rdma` _boolean_ | RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement.<br />Semantics / usage:<br />  - This is capability metadata used for profiling, planning, and deployment decisions.<br />  - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator,<br />    GPUDirect settings). It only expresses availability/intent.<br />  - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating<br />    RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery<br />    is unavailable, it may remain unset.<br />Impact of wrong / missing values:<br />  - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or<br />    deployments to assume RDMA is available; depending on the runtime transport selection and<br />    fallback behavior, this can lead to connection/setup failures or performance regressions.<br />  - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and<br />    fall back to non-RDMA transports, usually remaining functional but potentially slower.<br />  - If unset and undiscovered, consumers should treat RDMA availability as unknown and use<br />    conservative defaults / fallback transports. |  | Optional: \{\} <br /> |
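
The discovery and `totalGpus` rules can be illustrated with a short Python sketch. It is not the operator's implementation. In particular, the description says discovery is skipped when "all four fields" are set without naming them. The sketch assumes they are `gpuSku`, `vramMb`, `totalGpus`, and `numGpusPerNode`. Both helper names are hypothetical.

```python
# Assumption: these are the four fields that let the operator skip discovery.
CORE_FIELDS = ("gpuSku", "vramMb", "totalGpus", "numGpusPerNode")

# Documented temporary cap applied only to the auto-computed totalGpus.
AUTO_TOTAL_GPU_CAP = 32


def discovery_needed(hardware: dict) -> bool:
    """Return True if any of the assumed core fields is omitted."""
    return any(hardware.get(field) is None for field in CORE_FIELDS)


def resolve_total_gpus(hardware: dict, discovered_gpu_count: int) -> int:
    """Prefer an explicit totalGpus; otherwise cap the discovered count."""
    explicit = hardware.get("totalGpus")
    if explicit is not None:
        return explicit  # explicit values override the cap
    return min(discovered_gpu_count, AUTO_TOTAL_GPU_CAP)


partial = {"gpuSku": "h100_sxm"}
print(discovery_needed(partial))        # other fields still need discovery
print(resolve_total_gpus(partial, 64))  # auto-computed value is capped
print(resolve_total_gpus({"totalGpus": 64}, 8))
```

Setting `totalGpus` explicitly is the documented way to exceed the temporary cap of 32.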




#### MockerSpec



MockerSpec configures the simulated (mocker) backend.



_Appears in:_
- [FeaturesSpec](#featuresspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates whether to deploy mocker workers instead of real inference workers.<br />Useful for large-scale testing without GPUs. |  | Optional: \{\} <br /> |


#### ModelCacheSpec



ModelCacheSpec references a PVC containing pre-downloaded model weights.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the name of the PersistentVolumeClaim containing model weights.<br />The PVC must exist in the same namespace as the DGDR. |  | Optional: \{\} <br /> |
| `pvcModelPath` _string_ | PVCModelPath is the path to the model checkpoint directory within the PVC<br />(e.g. "deepseek-r1" or "models/Llama-3.1-405B-FP8"). |  | Optional: \{\} <br /> |
| `pvcMountPath` _string_ | PVCMountPath is the mount path for the PVC inside the container. | /opt/model-cache | Optional: \{\} <br /> |
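
The three fields describe where the weights end up inside the container. The Python sketch below assumes the model directory is `pvcMountPath` joined with `pvcModelPath`. That follows from the field descriptions but is not stated explicitly. The helper name `model_directory` and the PVC name are hypothetical.

```python
import posixpath

# Documented default for pvcMountPath.
DEFAULT_MOUNT_PATH = "/opt/model-cache"


def model_directory(model_cache: dict) -> str:
    """Return the assumed in-container path of the model checkpoint directory."""
    mount = model_cache.get("pvcMountPath") or DEFAULT_MOUNT_PATH
    sub_path = (model_cache.get("pvcModelPath") or "").strip("/")
    # Container paths are POSIX paths regardless of where this script runs.
    return posixpath.join(mount, sub_path) if sub_path else mount


# "model-weights" is a placeholder PVC name.
cache = {"pvcName": "model-weights", "pvcModelPath": "deepseek-r1"}
print(model_directory(cache))
```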


#### OverridesSpec



OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `profilingJob` _[JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#jobspec-v1-batch)_ | ProfilingJob allows overriding the profiling Job specification.<br />Fields set here are merged into the controller-generated Job spec. |  | Optional: \{\} <br /> |
| `dgd` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment<br />to use as the base for the generated deployment. Fields from profiling results<br />are merged on top. Use this to override backend worker images.<br />The field is stored as a raw embedded resource rather than a typed<br />*v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already<br />imports v1beta1 as the conversion hub and Go does not allow import cycles.<br />The EmbeddedResource marker tells the API server to validate that the value is a<br />well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it<br />is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion,<br />kind, and field schema) is performed by the controller during reconciliation. |  | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |

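A partial `dgd` override could be sketched as follows. The embedded object carries `apiVersion` and `kind` so that it passes the EmbeddedResource check described above. The parent key `overrides`, the service name, the `mainContainer` field, and the image reference are all assumptions for illustration and should be checked against the v1alpha1 DynamoGraphDeployment schema.

```yaml
# Hypothetical fragment of a DynamoGraphDeploymentRequest spec.
overrides:                         # assumed parent key
  dgd:
    apiVersion: nvidia.com/v1alpha1
    kind: DynamoGraphDeployment
    metadata:
      name: base-dgd               # hypothetical name
    spec:
      services:
        YourWorker:                # placeholder service name
          extraPodSpec:
            mainContainer:         # assumed field for the worker container
              image: registry.example.com/dynamo/worker:custom  # hypothetical image
```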

#### ParetoConfig



ParetoConfig represents a single Pareto-optimal deployment configuration
discovered during profiling.



_Appears in:_
- [ProfilingResultsStatus](#profilingresultsstatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `config` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | Config is the full deployment configuration for this Pareto point. |  | Type: object <br /> |


#### ProfilingPhase

_Underlying type:_ _string_

ProfilingPhase represents a sub-phase within the profiling pipeline.
When the DGDR Phase is "Profiling", this value indicates which step
of the profiling pipeline is currently executing.

_Validation:_
- Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]

_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)

| Field | Description |
| --- | --- |
| `Initializing` | Profiler is loading the DGD template, detecting GPU hardware,<br />and resolving the model architecture from HuggingFace.<br /> |
| `SweepingPrefill` | Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts<br />for prefill, measuring TTFT at each configuration.<br /> |
| `SweepingDecode` | Sweeping parallelization strategies and concurrency levels<br />for decode, measuring ITL at each configuration.<br /> |
| `SelectingConfig` | Filtering results against SLA targets and selecting the most<br />cost-efficient configuration that meets TTFT/ITL requirements.<br /> |
| `BuildingCurves` | Building detailed interpolation curves (ISL→TTFT for prefill,<br />KV-usage×context-length→ITL for decode) using the selected configs.<br /> |
| `GeneratingDGD` | Packaging profiling data into a ConfigMap and generating<br />the final DGD YAML with planner integration.<br /> |
| `Done` | Profiling pipeline finished successfully.<br /> |


#### ProfilingResultsStatus



ProfilingResultsStatus contains the output of the profiling process.



_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#v1beta1-dynamographdeploymentrequeststatus)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pareto` _[ParetoConfig](#paretoconfig) array_ | Pareto is the list of Pareto-optimal deployment configurations discovered during profiling.<br />Each entry represents a different cost/performance trade-off. |  | Optional: \{\} <br /> |
| `selectedConfig` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | SelectedConfig is the recommended configuration chosen by the profiler<br />based on the SLA targets. This is the configuration used for deployment<br />when autoApply is true. |  | Type: object <br />Optional: \{\} <br /> |


#### SLASpec



SLASpec defines the service-level agreement targets for profiling optimization.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `ttft` _float_ | TTFT is the Time To First Token target in milliseconds. |  | Optional: \{\} <br /> |
| `itl` _float_ | ITL is the Inter-Token Latency target in milliseconds. |  | Optional: \{\} <br /> |
| `e2eLatency` _float_ | E2ELatency is the target end-to-end request latency in milliseconds.<br />Alternative to specifying TTFT + ITL. |  | Optional: \{\} <br /> |

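A minimal sketch of SLA targets, assuming the spec exposes them under a key named `sla` (the parent key and the numeric values are illustrative, not taken from this reference):

```yaml
# Hypothetical fragment of a DynamoGraphDeploymentRequest spec.
sla:               # assumed parent key
  ttft: 200.0      # Time To First Token target, in milliseconds
  itl: 20.0        # Inter-Token Latency target, in milliseconds
  # e2eLatency: 5000.0   # alternative to specifying ttft + itl
```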

#### SearchStrategy

_Underlying type:_ _string_

SearchStrategy controls the profiling search depth.

_Validation:_
- Enum: [rapid thorough]

_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description |
| --- | --- |
| `rapid` |  |
| `thorough` |  |


#### WorkloadSpec



WorkloadSpec defines the workload characteristics for SLA-based profiling.



_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#v1beta1-dynamographdeploymentrequestspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `isl` _integer_ | ISL is the Input Sequence Length (number of tokens). | 4000 | Optional: \{\} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length (number of tokens). | 1000 | Optional: \{\} <br /> |
| `concurrency` _float_ | Concurrency is the target concurrency level.<br />Either Concurrency or RequestRate is required when the planner is disabled. |  | Optional: \{\} <br /> |
| `requestRate` _float_ | RequestRate is the target request rate (req/s).<br />Either RequestRate or Concurrency is required when the planner is disabled. |  | Optional: \{\} <br /> |

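The fragment below sketches a workload description that sets a concurrency target, as is needed when the planner is disabled. The parent key `workload` and the concurrency value are assumptions; `isl` and `osl` are shown at their documented defaults.

```yaml
# Hypothetical fragment of a DynamoGraphDeploymentRequest spec.
workload:          # assumed parent key
  isl: 4000        # input sequence length in tokens (documented default)
  osl: 1000        # output sequence length in tokens (documented default)
  concurrency: 16  # illustrative value; requestRate could be set instead
```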


## operator.config.dynamo.nvidia.com/v1alpha1


### Resource Types
- [OperatorConfiguration](#operatorconfiguration)



#### CertProvisionMode

_Underlying type:_ _string_

CertProvisionMode controls how webhook TLS certificates are managed.



_Appears in:_
- [WebhookServer](#webhookserver)

| Field | Description |
| --- | --- |
| `auto` | CertProvisionModeAuto uses the built-in cert-controller to generate and rotate certificates.<br /> |
| `manual` | CertProvisionModeManual expects certificates to be provided externally (e.g., cert-manager, admin).<br /> |


#### CheckpointConfiguration



CheckpointConfiguration holds checkpoint/restore settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled indicates if checkpoint functionality is enabled |  |  |
| `storage` _[CheckpointStorageConfiguration](#checkpointstorageconfiguration)_ | Deprecated: Storage is retained for compatibility and ignored by the<br />current snapshot flow. Snapshot storage is discovered from the<br />snapshot-agent DaemonSet instead. |  |  |


#### CheckpointOCIConfig



Deprecated: CheckpointOCIConfig is retained for compatibility and ignored by
the current snapshot flow.



_Appears in:_
- [CheckpointStorageConfiguration](#checkpointstorageconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `uri` _string_ | URI is the legacy OCI URI (oci://registry/repository). |  |  |
| `credentialsSecretRef` _string_ | CredentialsSecretRef is the legacy docker config secret name. |  |  |


#### CheckpointPVCConfig



Deprecated: CheckpointPVCConfig is retained for compatibility and ignored by
the current snapshot flow.



_Appears in:_
- [CheckpointStorageConfiguration](#checkpointstorageconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `pvcName` _string_ | PVCName is the legacy PVC name. |  |  |
| `basePath` _string_ | BasePath is the legacy base directory within the PVC. |  |  |


#### CheckpointS3Config



Deprecated: CheckpointS3Config is retained for compatibility and ignored by
the current snapshot flow.



_Appears in:_
- [CheckpointStorageConfiguration](#checkpointstorageconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `uri` _string_ | URI is the legacy S3 URI (s3://[endpoint/]bucket/prefix). |  |  |
| `credentialsSecretRef` _string_ | CredentialsSecretRef is the legacy credentials secret name. |  |  |


#### CheckpointStorageConfiguration



Deprecated: CheckpointStorageConfiguration is retained for compatibility and
ignored by the current snapshot flow.



_Appears in:_
- [CheckpointConfiguration](#checkpointconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _string_ | Type is the legacy storage backend type: pvc, s3, or oci. |  |  |
| `pvc` _[CheckpointPVCConfig](#checkpointpvcconfig)_ | PVC configuration for legacy pvc-based settings. |  |  |
| `s3` _[CheckpointS3Config](#checkpoints3config)_ | S3 configuration for legacy s3-based settings. |  |  |
| `oci` _[CheckpointOCIConfig](#checkpointociconfig)_ | OCI configuration for legacy oci-based settings. |  |  |


#### DRAConfiguration



DRAConfiguration holds Dynamic Resource Allocation (resource.k8s.io) settings.

NOTE: auto-detection here only verifies that the resource.k8s.io API group is
registered on the apiserver (Kubernetes 1.32+). It does NOT verify that a
GPU-specific DRA resource driver (e.g. nvidia/k8s-dra-driver-gpu) is
installed, that its DeviceClass exists, or that node-level GPU drivers are
compatible. An admin can set `enabled: false` to force DRA integration off
on clusters where the API is present but the GPU driver stack is not wired
up. The operator then rejects GMS / inter-pod failover admissions early
with a clear error instead of leaving pods Pending with a confusing
"resourceclaim not found" at schedule time.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled overrides auto-detection of the resource.k8s.io API group.<br />nil = auto-detect. Setting true requires detection to also succeed (the<br />operator will exit at startup otherwise). |  |  |

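For example, an operator configuration that forces DRA integration off could look like this. The field names come from this reference; how the file is delivered to the operator (for instance through the Helm chart) is not covered here.

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
dra:
  # false: disable DRA integration even if resource.k8s.io is registered.
  # Omit the key entirely to keep auto-detection.
  enabled: false
```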

#### DiscoveryBackend

_Underlying type:_ _string_

DiscoveryBackend is the type for the discovery backend.



_Appears in:_
- [DiscoveryConfiguration](#discoveryconfiguration)

| Field | Description |
| --- | --- |
| `kubernetes` | DiscoveryBackendKubernetes is the Kubernetes discovery backend<br /> |
| `etcd` | DiscoveryBackendEtcd is the etcd discovery backend<br /> |


#### DiscoveryConfiguration



DiscoveryConfiguration holds discovery backend settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `backend` _[DiscoveryBackend](#discoverybackend)_ | Backend is the discovery backend: "kubernetes" or "etcd" | kubernetes |  |


#### GPUConfiguration



GPUConfiguration holds GPU discovery settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `discoveryEnabled` _boolean_ | DiscoveryEnabled indicates whether GPU discovery is enabled | true |  |


#### GroveConfiguration



GroveConfiguration holds Grove orchestrator settings.



_Appears in:_
- [OrchestratorConfiguration](#orchestratorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled overrides auto-detection. nil = auto-detect. |  |  |
| `terminationDelay` _[Duration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta)_ | TerminationDelay configures the termination delay for Grove PodCliqueSets | 15m |  |


#### InfrastructureConfiguration



InfrastructureConfiguration holds service mesh and backend addresses.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `natsAddress` _string_ | NATSAddress is the address of the NATS server |  |  |
| `etcdAddress` _string_ | ETCDAddress is the address of the etcd server |  |  |
| `modelExpressURL` _string_ | ModelExpressURL is the URL of the Model Express server to inject into all pods |  |  |
| `prometheusEndpoint` _string_ | PrometheusEndpoint is the URL of the Prometheus endpoint to use for metrics |  |  |


#### IngressConfiguration



IngressConfiguration holds ingress settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `virtualServiceGateway` _string_ | VirtualServiceGateway is the name of the Istio virtual service gateway |  |  |
| `controllerClassName` _string_ | ControllerClassName is the ingress controller class name |  |  |
| `controllerTLSSecretName` _string_ | ControllerTLSSecretName is the TLS secret for the ingress controller |  |  |
| `hostSuffix` _string_ | HostSuffix is the suffix for ingress hostnames |  |  |


#### KaiSchedulerConfiguration



KaiSchedulerConfiguration holds Kai-scheduler settings.



_Appears in:_
- [OrchestratorConfiguration](#orchestratorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled overrides auto-detection. nil = auto-detect. |  |  |




#### LWSConfiguration



LWSConfiguration holds LWS orchestrator settings.



_Appears in:_
- [OrchestratorConfiguration](#orchestratorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled overrides auto-detection. nil = auto-detect. |  |  |


#### LeaderElectionConfiguration



LeaderElectionConfiguration holds leader election settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enabled` _boolean_ | Enabled enables leader election for controller manager | false |  |
| `id` _string_ | ID is the leader election resource identity |  |  |
| `namespace` _string_ | Namespace is the namespace for the leader election resource |  |  |


#### LoggingConfiguration



LoggingConfiguration holds logging settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `level` _string_ | Level is the log level (e.g., "info", "debug") | info |  |
| `format` _string_ | Format is the log format (e.g., "json", "text") | json |  |


#### MPIConfiguration



MPIConfiguration holds MPI SSH secret settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `sshSecretName` _string_ | SSHSecretName is the name of the secret containing the SSH key for MPI |  |  |
| `sshSecretNamespace` _string_ | SSHSecretNamespace is the namespace where the MPI SSH secret is located |  |  |


#### MetricsServer



MetricsServer extends Server with secure serving option.



_Appears in:_
- [ServerConfiguration](#serverconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `bindAddress` _string_ | BindAddress is the address the server binds to |  |  |
| `port` _integer_ | Port is the port the server listens on |  |  |
| `secure` _boolean_ | Secure enables secure serving for the metrics endpoint.<br />nil = default to true (secure by default). |  |  |


#### NamespaceConfiguration



NamespaceConfiguration determines operator namespace mode.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `restricted` _string_ | Deprecated: Namespace-restricted mode is deprecated and will be removed in a future release.<br />Use cluster-wide mode (leave Restricted empty) instead. |  |  |
| `scope` _[NamespaceScopeConfiguration](#namespacescopeconfiguration)_ | Deprecated: Scope is only used in namespace-restricted mode, which is deprecated. |  |  |


#### NamespaceScopeConfiguration



Deprecated: NamespaceScopeConfiguration is used only by the deprecated namespace-restricted
mode and will be removed in a future release.



_Appears in:_
- [NamespaceConfiguration](#namespaceconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `leaseDuration` _[Duration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta)_ | LeaseDuration is the duration of namespace scope marker lease before expiration | 30s |  |
| `leaseRenewInterval` _[Duration](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#duration-v1-meta)_ | LeaseRenewInterval is the interval for renewing namespace scope marker lease | 10s |  |


#### OperatorConfiguration



OperatorConfiguration is the Schema for the operator configuration.





| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `operator.config.dynamo.nvidia.com/v1alpha1` | | |
| `kind` _string_ | `OperatorConfiguration` | | |
| `server` _[ServerConfiguration](#serverconfiguration)_ | Server configuration (metrics, health probes, webhooks) |  |  |
| `leaderElection` _[LeaderElectionConfiguration](#leaderelectionconfiguration)_ | Leader election configuration |  |  |
| `namespace` _[NamespaceConfiguration](#namespaceconfiguration)_ | Namespace configuration (restricted vs cluster-wide) |  |  |
| `orchestrators` _[OrchestratorConfiguration](#orchestratorconfiguration)_ | Orchestrator configuration with optional overrides |  |  |
| `dra` _[DRAConfiguration](#draconfiguration)_ | DRA (Dynamic Resource Allocation) settings with optional override |  |  |
| `infrastructure` _[InfrastructureConfiguration](#infrastructureconfiguration)_ | Service mesh and infrastructure addresses |  |  |
| `ingress` _[IngressConfiguration](#ingressconfiguration)_ | Ingress configuration |  |  |
| `rbac` _[RBACConfiguration](#rbacconfiguration)_ | RBAC configuration for cross-namespace resource management (cluster-wide mode) |  |  |
| `mpi` _[MPIConfiguration](#mpiconfiguration)_ | MPI SSH secret configuration |  |  |
| `checkpoint` _[CheckpointConfiguration](#checkpointconfiguration)_ | Checkpoint/restore configuration |  |  |
| `discovery` _[DiscoveryConfiguration](#discoveryconfiguration)_ | Discovery backend configuration |  |  |
| `gpu` _[GPUConfiguration](#gpuconfiguration)_ | GPU discovery configuration |  |  |
| `logging` _[LoggingConfiguration](#loggingconfiguration)_ | Logging configuration |  |  |
| `security` _[SecurityConfiguration](#securityconfiguration)_ | HTTP/2 and TLS settings |  |  |

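Putting several of these sections together, a small `OperatorConfiguration` might be sketched as below. All keys are taken from the tables in this package; the leader election `id` and `namespace` values are hypothetical placeholders, and sections left out keep their defaults.

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
leaderElection:
  enabled: true                  # default is false
  id: dynamo-operator-leader     # hypothetical identity
  namespace: dynamo-system       # hypothetical namespace
discovery:
  backend: kubernetes            # or "etcd"
gpu:
  discoveryEnabled: true
logging:
  level: debug                   # default is "info"
  format: json
security:
  enableHTTP2: false
```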

#### OrchestratorConfiguration



OrchestratorConfiguration holds orchestrator override settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `grove` _[GroveConfiguration](#groveconfiguration)_ | Grove orchestrator configuration |  |  |
| `lws` _[LWSConfiguration](#lwsconfiguration)_ | LWS orchestrator configuration |  |  |
| `kaiScheduler` _[KaiSchedulerConfiguration](#kaischedulerconfiguration)_ | KaiScheduler configuration |  |  |

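Because each orchestrator's `enabled` field is an override on top of auto-detection, a configuration usually only needs to list the orchestrators it wants to pin. A sketch, with illustrative choices:

```yaml
# Fragment of an OperatorConfiguration.
orchestrators:
  grove:
    enabled: true          # require Grove instead of relying on auto-detection
    terminationDelay: 15m  # documented default, written out explicitly
  lws:
    enabled: false         # force LWS off even if it is installed
  # kaiScheduler is omitted, so it stays on auto-detect
```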

#### RBACConfiguration



RBACConfiguration holds RBAC settings for cluster-wide mode.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `plannerClusterRoleName` _string_ | PlannerClusterRoleName is the ClusterRole for planner |  |  |
| `dgdrProfilingClusterRoleName` _string_ | DGDRProfilingClusterRoleName is the ClusterRole for DGDR profiling jobs |  |  |
| `eppClusterRoleName` _string_ | EPPClusterRoleName is the ClusterRole for EPP |  |  |


#### SecurityConfiguration



SecurityConfiguration holds HTTP/2 and TLS settings.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `enableHTTP2` _boolean_ | EnableHTTP2 enables HTTP/2 for metrics and webhook servers | false |  |


#### Server



Server holds a bind address and port.



_Appears in:_
- [MetricsServer](#metricsserver)
- [ServerConfiguration](#serverconfiguration)
- [WebhookServer](#webhookserver)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `bindAddress` _string_ | BindAddress is the address the server binds to |  |  |
| `port` _integer_ | Port is the port the server listens on |  |  |


#### ServerConfiguration



ServerConfiguration holds server bind addresses and ports.



_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `metrics` _[MetricsServer](#metricsserver)_ | Metrics server configuration | \{ bindAddress:0.0.0.0 port:8080 secure:true \} |  |
| `healthProbe` _[Server](#server)_ | Health probe server configuration | \{ bindAddress:0.0.0.0 port:8081 \} |  |
| `webhook` _[WebhookServer](#webhookserver)_ | Webhook server configuration | \{ certDir:/tmp/k8s-webhook-server/serving-certs host:0.0.0.0 port:9443 \} |  |

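Written out as YAML, the defaults listed in the table above correspond to the following `server` section:

```yaml
# Fragment of an OperatorConfiguration showing the documented defaults.
server:
  metrics:
    bindAddress: 0.0.0.0
    port: 8080
    secure: true
  healthProbe:
    bindAddress: 0.0.0.0
    port: 8081
  webhook:
    host: 0.0.0.0
    port: 9443
    certDir: /tmp/k8s-webhook-server/serving-certs
```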

#### WebhookServer



WebhookServer extends Server with host and certificate directory.



_Appears in:_
- [ServerConfiguration](#serverconfiguration)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `bindAddress` _string_ | BindAddress is the address the server binds to |  |  |
| `port` _integer_ | Port is the port the server listens on |  |  |
| `host` _string_ | Host is the address the webhook server binds to |  |  |
| `certDir` _string_ | CertDir is the directory containing TLS certificates |  |  |
| `certProvisionMode` _[CertProvisionMode](#certprovisionmode)_ | CertProvisionMode controls certificate management: "auto" (built-in cert-controller) or "manual" (external) | auto |  |
| `secretName` _string_ | SecretName is the name of the Kubernetes Secret holding webhook TLS certificates | webhook-server-cert |  |
| `serviceName` _string_ | ServiceName is the name of the Kubernetes Service fronting the webhook server.<br />Used to generate certificate SANs. Set by the Helm chart. |  |  |

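When certificates are issued by an external tool such as cert-manager, the webhook section might be set up as in this sketch. It assumes the external tool writes the certificate into the Secret named by `secretName`; that wiring is not described in this reference.

```yaml
# Fragment of an OperatorConfiguration.
server:
  webhook:
    certProvisionMode: manual        # certificates are provided externally
    secretName: webhook-server-cert  # documented default Secret name
    certDir: /tmp/k8s-webhook-server/serving-certs
```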

# Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

- **Health Probes**: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.

- **Security Context**: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.

- **Shared Memory**: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).

- **Environment Variables**: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.

- **Pod Configuration**: Default `terminationGracePeriodSeconds` of 60 seconds and `restartPolicy: Always`.

- **Autoscaling**: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.

- **Backend-Specific Behavior**: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).

## Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

- **`terminationGracePeriodSeconds`**: `60` seconds
- **`restartPolicy`**: `Always`

## Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

- **`fsGroup`**: `1000` - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The `fsGroup` setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments

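Combined with the pod specification defaults, the resulting pod spec would contain roughly the following fields when nothing is overridden. This is a sketch of the effective values, not a copy of the operator's generated manifest.

```yaml
# Pod spec fragment with the operator defaults applied.
spec:
  terminationGracePeriodSeconds: 60
  restartPolicy: Always
  securityContext:
    fsGroup: 1000   # group ownership applied to mounted volumes
```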
### Overriding Security Context

To override the default security context, specify your own `securityContext` in the `extraPodSpec` of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

**Important**: When you provide *any* `securityContext` object in `extraPodSpec`, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting `runAsNonRoot` or setting it to `false`).

### OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift's admission controllers to assign them dynamically:

```yaml
services:
  YourWorker:
    extraPodSpec:
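      # An explicit empty object (a bare key would parse as null) so that the
      # operator treats the security context as user-provided and skips its defaults.
      # fsGroup is omitted so that OpenShift can assign it, along with the
      # UID range, based on the SCC.
      securityContext: {}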
```

Alternatively, if you want to keep the default `fsGroup: 1000` behavior and are certain your cluster allows it, you don't need to specify anything - the operator defaults will work.

## Shared Memory Configuration

Shared memory is enabled by default for all components:

- **Enabled**: `true` (unless explicitly disabled via `sharedMemory.disabled`)
- **Size**: `8Gi`
- **Mount Path**: `/dev/shm`
- **Volume Type**: `emptyDir` with `memory` medium

To disable shared memory or customize the size, use the `sharedMemory` field in your component specification.

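In plain Kubernetes terms, the default described above is equivalent to a memory-backed `emptyDir` volume along these lines. The volume and container names are hypothetical; the operator's generated names may differ.

```yaml
# Pod spec fragment equivalent to the default shared memory setup.
spec:
  volumes:
    - name: shared-memory        # hypothetical volume name
      emptyDir:
        medium: Memory           # backed by tmpfs
        sizeLimit: 8Gi
  containers:
    - name: main                 # hypothetical container name
      volumeMounts:
        - name: shared-memory
          mountPath: /dev/shm
```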
## Health Probes by Component Type

The operator applies different default health probes based on the component type.

### Frontend Components

Frontend components receive the following probe configurations:

**Liveness Probe:**
- **Type**: HTTP GET
- **Path**: `/health`
- **Port**: `http` (8000)
- **Initial Delay**: 60 seconds
- **Period**: 60 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 10

**Readiness Probe:**
- **Type**: Exec command
- **Command**: `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`
- **Initial Delay**: 60 seconds
- **Period**: 60 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 10

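Expressed as Kubernetes probe definitions, the frontend defaults would look roughly like this. Wrapping the readiness command in `/bin/sh -c` is an assumption: exec probes do not run through a shell by themselves, so the pipe to `jq` needs one.

```yaml
# Container spec fragment (sketch of the frontend defaults).
livenessProbe:
  httpGet:
    path: /health
    port: http            # named container port (8000)
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
readinessProbe:
  exec:
    command:
      - /bin/sh           # assumed shell wrapper for the pipeline
      - -c
      - 'curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""'
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
```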
### Worker Components

Worker components receive the following probe configurations:

**Liveness Probe:**
- **Type**: HTTP GET
- **Path**: `/live`
- **Port**: `system` (9090)
- **Period**: 5 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 1

**Readiness Probe:**
- **Type**: HTTP GET
- **Path**: `/health`
- **Port**: `system` (9090)
- **Period**: 10 seconds
- **Timeout**: 30 seconds
- **Failure Threshold**: 60

**Startup Probe:**
- **Type**: HTTP GET
- **Path**: `/live`
- **Port**: `system` (9090)
- **Period**: 10 seconds
- **Timeout**: 5 seconds
- **Failure Threshold**: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the `failureThreshold` to allow more time for model loading. Calculate the required threshold from your expected startup time, rounding up: `failureThreshold = ceil(expected_startup_seconds / periodSeconds)`. Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::

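As a worked example, a model that may take up to 4 hours to load needs 14400 s / 10 s = 1440 allowed failures. The probe below is a sketch using standard Kubernetes probe fields; where exactly it is placed in the component specification is not shown in this section and should be checked against the component spec reference.

```yaml
# Worker startup probe with a 4-hour window instead of the default 2 hours.
startupProbe:
  httpGet:
    path: /live
    port: system           # named container port (9090)
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 1440   # 14400 s / 10 s per attempt (default is 720)
```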
### Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

#### VLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

**Tensor/Pipeline Parallel Mode** (when `world_size > GPUs_per_node`):
- Uses Ray for distributed execution (`--distributed-executor-backend ray`)
- **Leader nodes**: Start the Ray head and run vLLM; all probes remain active
- **Worker nodes**: Run Ray agents only; all probes (liveness, readiness, startup) are removed

**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
- **Leader nodes**: All probes remain active

#### SGLang Backend
- **Worker nodes**: All probes (liveness, readiness, startup) are removed

#### TensorRT-LLM Backend
- **Leader nodes**: All probes remain unchanged
- **Worker nodes**:
  - Liveness and startup probes are removed
  - Readiness probe is replaced with a TCP socket check on SSH port (2222):
    - **Initial Delay**: 20 seconds
    - **Period**: 20 seconds
    - **Timeout**: 5 seconds
    - **Failure Threshold**: 10
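
As a reading aid, the multinode rules above can be summarized as a lookup table. This is a sketch only: the keys (`"vllm"`, `"leader"`, and so on) and the function name are illustrative and are not identifiers from the operator. Combinations this section does not describe are deliberately left out.

```python
ALL_PROBES = frozenset({"liveness", "readiness", "startup"})

# (backend, role) -> probes that remain on the container in a multinode deployment.
# For vLLM the same result applies in both tensor/pipeline parallel and data parallel modes.
PROBES_KEPT = {
    ("vllm", "leader"): ALL_PROBES,
    ("vllm", "worker"): frozenset(),
    ("sglang", "worker"): frozenset(),
    ("trtllm", "leader"): ALL_PROBES,
    # The readiness probe stays but is replaced by a TCP socket check on port 2222.
    ("trtllm", "worker"): frozenset({"readiness"}),
}


def probes_kept(backend: str, role: str) -> frozenset:
    """Look up which probes remain for a backend and node role."""
    try:
        return PROBES_KEPT[(backend, role)]
    except KeyError:
        raise LookupError(f"behavior for {backend}/{role} is not described in this section")
```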

## Environment Variables

The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided `envs` values always take precedence over operator defaults.
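
The precedence rule can be pictured as a simple dictionary merge in which user values are applied last. This is a minimal sketch of the stated behavior, not the operator's Go implementation; `merge_env` and the override value `9191` are hypothetical.

```python
def merge_env(operator_defaults: dict[str, str], user_envs: dict[str, str]) -> dict[str, str]:
    """Combine injected defaults with user-provided values; user values win on conflict."""
    merged = dict(operator_defaults)  # copy so the defaults are not mutated
    merged.update(user_envs)          # user-provided entries overwrite defaults
    return merged


defaults = {"DYN_SYSTEM_ENABLED": "true", "DYN_SYSTEM_PORT": "9090"}
user = {"DYN_SYSTEM_PORT": "9191"}  # hypothetical user override

effective = merge_env(defaults, user)
assert effective["DYN_SYSTEM_PORT"] == "9191"     # user value takes precedence
assert effective["DYN_SYSTEM_ENABLED"] == "true"  # untouched default is kept
```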

### All Components

These environment variables are injected into every component container regardless of type.

| Variable | Purpose | Default | Type | Source |
| --- | --- | --- | --- | --- |
| `DYN_NAMESPACE` | Dynamo service namespace used for service discovery and routing | Derived from DGD spec | `string` | Downward API annotation on checkpoint-restored pods |
| `DYN_COMPONENT` | Identifies the component type for runtime behavior | One of: `frontend`, `worker`, `prefill`, `decode`, `planner`, `epp` | `string` | Set from component spec |
| `DYN_PARENT_DGD_K8S_NAME` | Kubernetes name of the parent DynamoGraphDeployment resource | — | `string` | Set from DGD metadata |
| `DYN_PARENT_DGD_K8S_NAMESPACE` | Kubernetes namespace of the parent DynamoGraphDeployment resource | — | `string` | Set from DGD metadata |
| `POD_NAME` | Current pod name | — | `string` | Downward API (`metadata.name`) |
| `POD_NAMESPACE` | Current pod namespace | — | `string` | Downward API (`metadata.namespace`) |
| `POD_UID` | Current pod UID | — | `string` | Downward API (`metadata.uid`) |
| `DYN_DISCOVERY_BACKEND` | Service discovery backend for inter-component communication | `kubernetes` | `string` | Options: `kubernetes`, `etcd` |

### Infrastructure (Conditional)

These are injected into all components when the corresponding infrastructure service is configured in the operator's `OperatorConfiguration`.

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `NATS_SERVER` | NATS messaging server address | — | `string` | Set when `infrastructure.natsAddress` is configured |
| `ETCD_ENDPOINTS` | etcd endpoint addresses for distributed state | — | `string` | Set when `infrastructure.etcdAddress` is configured |
| `MODEL_EXPRESS_URL` | Model Express service URL for model management | — | `string` | Set when `infrastructure.modelExpressURL` is configured |
| `PROMETHEUS_ENDPOINT` | Prometheus endpoint for metrics collection | — | `string` | Set when `infrastructure.prometheusEndpoint` is configured |

### Frontend Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `DYNAMO_PORT` | HTTP port the frontend listens on | `8000` | `int` |
| `DYN_HTTP_PORT` | HTTP port for the frontend service (alias) | `8000` | `int` |
| `DYN_NAMESPACE_PREFIX` | Namespace prefix used for frontend request routing | Same as `DYN_NAMESPACE` | `string` |

### Worker Components

| Variable | Purpose | Default | Type | Notes |
| --- | --- | --- | --- | --- |
| `DYN_SYSTEM_ENABLED` | Enables the system HTTP server for health checks and metrics | `true` | `string` (boolean) |  |
| `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS` | Endpoints whose health status is used for readiness | `["generate"]` | `string` (JSON array) |  |
| `DYN_SYSTEM_PORT` | Port for the system HTTP server (health, metrics) | `9090` | `int` |  |
| `DYN_HEALTH_CHECK_ENABLED` | Disables the legacy health check mechanism in favor of the system server | `false` | `string` (boolean) |  |
| `NIXL_TELEMETRY_ENABLE` | Enables or disables NIXL telemetry collection | `n` | `string` | Options: `y`, `n` |
| `NIXL_TELEMETRY_EXPORTER` | Telemetry exporter format for NIXL metrics | `prometheus` | `string` |  |
| `NIXL_TELEMETRY_PROMETHEUS_PORT` | Port for NIXL Prometheus metrics endpoint | `19090` | `int` |  |
| `DYN_NAMESPACE_WORKER_SUFFIX` | Hash suffix appended to worker namespace for rolling updates | — | `string` | Only set during rolling update transitions |
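
Several of these values are strings that encode another type (a boolean or a JSON array). The sketch below shows one way a process could decode them; it is an assumption for illustration, not how the Dynamo runtime parses them. In particular, treating only the literal `true` (case-insensitive) as enabled is a choice made here, and `read_worker_settings` is a hypothetical name.

```python
import json
import os


def read_worker_settings(env=None) -> dict:
    """Decode the string-typed worker variables into native Python values."""
    if env is None:
        env = os.environ
    return {
        # Boolean carried as a string; only "true" counts as enabled in this sketch.
        "system_enabled": env.get("DYN_SYSTEM_ENABLED", "true").strip().lower() == "true",
        "system_port": int(env.get("DYN_SYSTEM_PORT", "9090")),
        # JSON array carried as a string, for example '["generate"]'.
        "health_endpoints": json.loads(
            env.get("DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS", '["generate"]')
        ),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.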

### Planner Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `PLANNER_PROMETHEUS_PORT` | Port for the planner's Prometheus metrics endpoint | `9085` | `int` |

### EPP (Endpoint Picker Plugin) Components

| Variable | Purpose | Default | Type |
| --- | --- | --- | --- |
| `USE_STREAMING` | Enables streaming mode for inference request proxying | `true` | `string` (boolean) |
| `RUST_LOG` | Rust log level and filter configuration | `debug,dynamo_llm::kv_router=trace` | `string` |

### VLLM Backend

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `VLLM_CACHE_ROOT` | Directory for vLLM compilation cache artifacts | — | `string` | Set when a volume mount has `useAsCompilationCache: true` |
| `VLLM_NIXL_SIDE_CHANNEL_HOST` | Host IP for the NIXL side channel in multiprocessing mode | Pod IP | `string` | Multinode mp backend only (Downward API: `status.podIP`) |

### TensorRT-LLM Backend

| Variable | Purpose | Default | Type | Condition |
| --- | --- | --- | --- | --- |
| `OMPI_MCA_orte_keep_fqdn_hostnames` | Instructs OpenMPI to preserve FQDN hostnames for inter-node communication | `1` | `string` | Multinode deployments only |

## Service Accounts

The following component types automatically receive dedicated service accounts:

- **Planner**: `planner-serviceaccount`
- **EPP**: `epp-serviceaccount`

## Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

1. Scans all Kubernetes secrets of type `kubernetes.io/dockerconfigjson` in the component's namespace
2. Extracts the docker registry server URLs from each secret's authentication configuration
3. Matches the container image's registry host against the discovered registry URLs
4. Automatically injects matching secrets as `imagePullSecrets` in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
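
The matching steps above can be sketched as follows. This is a simplified illustration, not the operator's indexer: it takes already-decoded `.dockerconfigjson` payloads, compares hosts exactly, and assumes Docker Hub when an image reference names no registry. The function names, secret names, and image references are hypothetical.

```python
import json
from urllib.parse import urlparse


def registry_host(server: str) -> str:
    """Host of a registry entry such as 'nvcr.io' or 'https://nvcr.io/v2/'."""
    parsed = urlparse(server if "://" in server else "//" + server)
    return parsed.netloc.lower()


def image_registry(image: str) -> str:
    """Registry host of an image reference; Docker Hub is assumed when none is given."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return "docker.io"


def matching_secrets(image: str, secrets: dict[str, str]) -> list[str]:
    """Names of secrets whose 'auths' section lists the image's registry host."""
    target = image_registry(image)
    matches = []
    for name, config_json in secrets.items():
        auths = json.loads(config_json).get("auths", {})
        if any(registry_host(server) == target for server in auths):
            matches.append(name)
    return sorted(matches)


# Hypothetical secrets in the component's namespace (decoded payloads).
secrets = {
    "ngc-secret": json.dumps({"auths": {"nvcr.io": {"auth": "..."}}}),
    "ghcr-secret": json.dumps({"auths": {"https://ghcr.io": {"auth": "..."}}}),
}
selected = matching_secrets("nvcr.io/example/image:tag", secrets)  # ["ngc-secret"]
```

The returned names correspond to what would be added to `imagePullSecrets` in the pod specification. Registry aliases, such as the different hostnames Docker Hub uses, are not handled in this sketch.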

**To disable automatic image pull secret discovery** for a specific component, add the following annotation:

```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```

## Autoscaling Defaults

When autoscaling is enabled but no metrics are specified, the operator applies:

- **Default Metric**: CPU utilization
- **Target Average Utilization**: `80%`

## Port Configurations

Default container ports are configured based on component type:

### Frontend Components
- **Port**: 8000
- **Protocol**: TCP
- **Name**: `http`

### Worker Components
- **Port**: 9090 (system)
- **Protocol**: TCP
- **Name**: `system`
- **Port**: 19090 (NIXL)
- **Protocol**: TCP
- **Name**: `nixl`

### Planner Components
- **Port**: 9085
- **Protocol**: TCP
- **Name**: `metrics`

### EPP Components
- **Port**: 9002 (gRPC)
- **Protocol**: TCP
- **Name**: `grpc`
- **Port**: 9003 (gRPC health)
- **Protocol**: TCP
- **Name**: `grpc-health`
- **Port**: 9090 (metrics)
- **Protocol**: TCP
- **Name**: `metrics`

## Backend-Specific Configurations

### VLLM
- **Ray Head Port**: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
- **Data Parallel RPC Port**: 13445 (for data parallel multinode deployments)

### SGLang
- **Distribution Init Port**: 29500 (for multinode deployments)

### TensorRT-LLM
- **SSH Port**: 2222 (for multinode MPI communication)
- **OpenMPI Environment**: `OMPI_MCA_orte_keep_fqdn_hostnames=1`

## Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

- **Health Probes, Security Context & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- **Component-Specific Defaults**:
  - [`internal/dynamo/component_common.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_common.go) - Base container and pod spec shared by all component types
  - [`internal/dynamo/component_frontend.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_frontend.go)
  - [`internal/dynamo/component_worker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_worker.go)
  - [`internal/dynamo/component_planner.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_planner.go)
  - [`internal/dynamo/component_epp.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/component_epp.go)
- **Image Pull Secrets**: [`internal/secrets/docker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/secrets/docker.go) - Implements the docker secret indexer and automatic discovery
- **Backend-Specific Behavior**:
  - [`internal/dynamo/backend_vllm.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_vllm.go)
  - [`internal/dynamo/backend_sglang.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_sglang.go)
  - [`internal/dynamo/backend_trtllm.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/dynamo/backend_trtllm.go)
- **Checkpoint / Restore**:
  - [`internal/checkpoint/podspec.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/checkpoint/podspec.go) - Checkpoint env var injection and volume setup
  - [`internal/checkpoint/resolve.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/checkpoint/resolve.go) - Checkpoint resolution logic
  - [`internal/checkpoint/resource.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/checkpoint/resource.go) - Checkpoint resource management
- **Constants & Annotations**: [`internal/consts/consts.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/operator/internal/consts/consts.go) - Defines annotation keys and other constants

## Notes

- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide *any* `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator