| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br/>It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br/>that allows overriding the main container configuration. | | |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br/>When scalingAdapter is enabled (default), this field is managed by the<br/>DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br/> |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br/>When scalingAdapter is enabled, this field is managed by the<br/>DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br/> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br/>When enabled (default), replicas are managed via DGDSA and external autoscalers can scale<br/>the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br/>When enabled, replicas are managed via DGDSA and external autoscalers can scale<br/>the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
#### DynamoComponentDeploymentSpec
...
...
@@ -237,9 +237,9 @@ _Appears in:_
| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows overriding the main pod spec configuration.<br/>It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br/>that allows overriding the main container configuration. | | |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br/>When scalingAdapter is enabled (default), this field is managed by the<br/>DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br/> |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br/>When scalingAdapter is enabled, this field is managed by the<br/>DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br/> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br/>When enabled (default), replicas are managed via DGDSA and external autoscalers can scale<br/>the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br/>When enabled, replicas are managed via DGDSA and external autoscalers can scale<br/>the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
#### DynamoGraphDeployment
...
...
@@ -747,7 +747,7 @@ _Appears in:_
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter
for replica management. When enabled (default), the DGDSA owns the replicas field and
for replica management. When enabled, the DGDSA owns the replicas field and
external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
...
...
@@ -758,7 +758,7 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `disable` _boolean_ | Disable indicates whether the ScalingAdapter should be disabled for this service.<br/>When false (default), a DGDSA is created and owns the replicas field.<br/>When true, no DGDSA is created and replicas can be modified directly in the DGD. | false | |
| `enabled` _boolean_ | Enabled indicates whether the ScalingAdapter should be enabled for this service.<br/>When true, a DGDSA is created and owns the replicas field.<br/>When false (default), no DGDSA is created and replicas can be modified directly in the DGD. | false | |
# use 'kubectl scale dgdsa/sglang-agg-decode --replicas=3' or update the DynamoGraphDeploymentScalingAdapter instead
```
## Disabling DGDSA for a Service
## Enabling DGDSA for a Service
If you want to manage replicas directly in the DGD (without autoscaling), you can disable the scaling adapter per service:
By default, no DGDSA is created for services, allowing direct replica management via the DGD. To enable autoscaling via HPA, KEDA, or Planner, explicitly enable the scaling adapter:
```yaml
apiVersion: nvidia.com/v1alpha1
...
...
@@ -127,24 +127,24 @@ metadata:
spec:
  services:
    Frontend:
      replicas: 2
      scalingAdapter:
        disable: true  # ← No DGDSA created, direct edits allowed
      replicas: 2  # ← No DGDSA by default, direct edits allowed
    decode:
      replicas: 1  # ← DGDSA created by default, managed via adapter
      replicas: 1
      scalingAdapter:
        enabled: true  # ← DGDSA created, managed via adapter
```
**When to disable DGDSA:**
- You want simple, manual replica management
- You don't need autoscaling for that service
- You prefer direct DGD edits over adapter-based scaling
**When to keep DGDSA enabled (default):**
**When to enable DGDSA:**
- You want to use HPA, KEDA, or Planner for autoscaling
- You want a clear separation between "desired scale" (adapter) and "deployment config" (DGD)
- You want protection against accidental direct replica edits
**When to keep DGDSA disabled (default):**
- You want simple, manual replica management
- You don't need autoscaling for that service
- You prefer direct DGD edits over adapter-based scaling
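
Once the adapter is enabled for a service, an external autoscaler targets the DGDSA through its Scale subresource rather than the DGD itself. The following is a minimal sketch of a standard Kubernetes `HorizontalPodAutoscaler` doing so. The adapter name `sglang-agg-decode` is taken from the `kubectl scale` example, while the adapter's `apiVersion`, the HPA name, the replica bounds, and the CPU target are illustrative assumptions, not values confirmed by this document.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sglang-agg-decode-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: nvidia.com/v1alpha1  # assumed to match the DGD API group/version
    kind: DynamoGraphDeploymentScalingAdapter
    name: sglang-agg-decode          # adapter created for the `decode` service
  minReplicas: 1                     # illustrative bounds
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # illustrative threshold
```

With a setup like this, the HPA writes the desired replica count to the adapter, and the DGD's `replicas` field for that service should not be edited directly.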
## Autoscaling with Dynamo Planner
The Dynamo Planner is an LLM-aware autoscaler that optimizes scaling decisions based on inference-specific metrics like Time To First Token (TTFT), Inter-Token Latency (ITL), and KV cache utilization.
...
...
@@ -612,15 +612,14 @@ If you've disabled the scaling adapter for a service, edit the DGD directly: