Unverified commit 110cd568, authored by Thomas Montfort, committed by GitHub

feat(operator): have Scaling Adapter disabled by default (#5180)

parent a8213787
......@@ -10191,7 +10191,7 @@ spec:
replicas:
description: |-
Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled (default), this field is managed by the
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
format: int32
minimum: 0
......@@ -10276,15 +10276,15 @@ spec:
scalingAdapter:
description: |-
ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled (default), replicas are managed via DGDSA and external autoscalers can scale
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
properties:
disable:
enabled:
default: false
description: |-
Disable indicates whether the ScalingAdapter should be disabled for this service.
When false (default), a DGDSA is created and owns the replicas field.
When true, no DGDSA is created and replicas can be modified directly in the DGD.
Enabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
type: boolean
type: object
serviceName:
......
......@@ -10326,7 +10326,7 @@ spec:
replicas:
description: |-
Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled (default), this field is managed by the
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
format: int32
minimum: 0
......@@ -10411,15 +10411,15 @@ spec:
scalingAdapter:
description: |-
ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled (default), replicas are managed via DGDSA and external autoscalers can scale
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
properties:
disable:
enabled:
default: false
description: |-
Disable indicates whether the ScalingAdapter should be disabled for this service.
When false (default), a DGDSA is created and owns the replicas field.
When true, no DGDSA is created and replicas can be modified directly in the DGD.
Enabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
type: boolean
type: object
serviceName:
......
......@@ -125,13 +125,13 @@ type ExtraPodSpec struct {
}
// ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter
// for replica management. When enabled (default), the DGDSA owns the replicas field and
// for replica management. When enabled, the DGDSA owns the replicas field and
// external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
type ScalingAdapter struct {
// Disable indicates whether the ScalingAdapter should be disabled for this service.
// When false (default), a DGDSA is created and owns the replicas field.
// When true, no DGDSA is created and replicas can be modified directly in the DGD.
// Enabled indicates whether the ScalingAdapter should be enabled for this service.
// When true, a DGDSA is created and owns the replicas field.
// When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
// +optional
// +kubebuilder:default=false
Disable bool `json:"disable,omitempty"`
Enabled bool `json:"enabled,omitempty"`
}
......@@ -111,14 +111,14 @@ type DynamoComponentDeploymentSharedSpec struct {
// ReadinessProbe to signal when the container is ready to receive traffic.
ReadinessProbe *corev1.Probe `json:"readinessProbe,omitempty"`
// Replicas is the desired number of Pods for this component.
// When scalingAdapter is enabled (default), this field is managed by the
// When scalingAdapter is enabled, this field is managed by the
// DynamoGraphDeploymentScalingAdapter and should not be modified directly.
// +kubebuilder:validation:Minimum=0
Replicas *int32 `json:"replicas,omitempty"`
// Multinode is the configuration for multinode components.
Multinode *MultinodeSpec `json:"multinode,omitempty"`
// ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
// When enabled (default), replicas are managed via DGDSA and external autoscalers can scale
// When enabled, replicas are managed via DGDSA and external autoscalers can scale
// the service using the Scale subresource. When disabled, replicas can be modified directly.
// +optional
ScalingAdapter *ScalingAdapter `json:"scalingAdapter,omitempty"`
......
......@@ -10191,7 +10191,7 @@ spec:
replicas:
description: |-
Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled (default), this field is managed by the
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
format: int32
minimum: 0
......@@ -10276,15 +10276,15 @@ spec:
scalingAdapter:
description: |-
ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled (default), replicas are managed via DGDSA and external autoscalers can scale
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
properties:
disable:
enabled:
default: false
description: |-
Disable indicates whether the ScalingAdapter should be disabled for this service.
When false (default), a DGDSA is created and owns the replicas field.
When true, no DGDSA is created and replicas can be modified directly in the DGD.
Enabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
type: boolean
type: object
serviceName:
......
......@@ -10326,7 +10326,7 @@ spec:
replicas:
description: |-
Replicas is the desired number of Pods for this component.
When scalingAdapter is enabled (default), this field is managed by the
When scalingAdapter is enabled, this field is managed by the
DynamoGraphDeploymentScalingAdapter and should not be modified directly.
format: int32
minimum: 0
......@@ -10411,15 +10411,15 @@ spec:
scalingAdapter:
description: |-
ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.
When enabled (default), replicas are managed via DGDSA and external autoscalers can scale
When enabled, replicas are managed via DGDSA and external autoscalers can scale
the service using the Scale subresource. When disabled, replicas can be modified directly.
properties:
disable:
enabled:
default: false
description: |-
Disable indicates whether the ScalingAdapter should be disabled for this service.
When false (default), a DGDSA is created and owns the replicas field.
When true, no DGDSA is created and replicas can be modified directly in the DGD.
Enabled indicates whether the ScalingAdapter should be enabled for this service.
When true, a DGDSA is created and owns the replicas field.
When false (default), no DGDSA is created and replicas can be modified directly in the DGD.
type: boolean
type: object
serviceName:
......
......@@ -684,15 +684,15 @@ func (r *DynamoGraphDeploymentReconciler) reconcilePVCs(ctx context.Context, dyn
}
// reconcileScalingAdapters ensures a DynamoGraphDeploymentScalingAdapter exists for each service in the DGD
// that has scaling adapter enabled (default). Services with scalingAdapter.disable=true will not have a DGDSA.
// that has scaling adapter explicitly enabled. Services without scalingAdapter.enabled=true will not have a DGDSA.
// This enables pluggable autoscaling via HPA, KEDA, or Planner.
func (r *DynamoGraphDeploymentReconciler) reconcileScalingAdapters(ctx context.Context, dynamoDeployment *nvidiacomv1alpha1.DynamoGraphDeployment) error {
logger := log.FromContext(ctx)
// Process each service - SyncResource handles create, update, and delete via toDelete flag
for serviceName, component := range dynamoDeployment.Spec.Services {
// Check if scaling adapter is disabled for this service
scalingAdapterDisabled := component.ScalingAdapter != nil && component.ScalingAdapter.Disable
// Check if scaling adapter is enabled for this service (disabled by default)
scalingAdapterEnabled := component.ScalingAdapter != nil && component.ScalingAdapter.Enabled
// Get current replicas (default to 1 if not set)
currentReplicas := int32(1)
......@@ -721,8 +721,8 @@ func (r *DynamoGraphDeploymentReconciler) reconcileScalingAdapters(ctx context.C
},
},
}
// Return toDelete=true if scaling adapter is disabled
return adapter, scalingAdapterDisabled, nil
// Return toDelete=true if scaling adapter is not enabled
return adapter, !scalingAdapterEnabled, nil
})
if err != nil {
......
......@@ -54,7 +54,7 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
expectDeleted []string // adapter names that should be deleted
}{
{
name: "creates adapters for all services",
name: "creates adapters for services with scalingAdapter.enabled=true",
dgd: &v1alpha1.DynamoGraphDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-dgd",
......@@ -64,9 +64,15 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"Frontend": {
Replicas: ptr.To(int32(2)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
"decode": {
Replicas: ptr.To(int32(3)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
},
},
......@@ -86,7 +92,11 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
},
Spec: v1alpha1.DynamoGraphDeploymentSpec{
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"worker": {},
"worker": {
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
},
},
},
......@@ -96,7 +106,7 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
},
},
{
name: "skips adapter creation when disabled",
name: "skips adapter creation when not enabled",
dgd: &v1alpha1.DynamoGraphDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-dgd",
......@@ -106,12 +116,13 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"Frontend": {
Replicas: ptr.To(int32(2)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
"decode": {
Replicas: ptr.To(int32(3)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Disable: true,
},
// No ScalingAdapter or Enabled=false means no adapter created
},
},
},
......@@ -133,6 +144,9 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"Frontend": {
Replicas: ptr.To(int32(2)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
},
},
......@@ -194,7 +208,7 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
expectDeleted: []string{"test-dgd-removed"},
},
{
name: "deletes adapter when scalingAdapter.disable is set to true",
name: "deletes adapter when scalingAdapter.enabled is not set",
dgd: &v1alpha1.DynamoGraphDeployment{
ObjectMeta: metav1.ObjectMeta{
Name: "test-dgd",
......@@ -205,9 +219,7 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"Frontend": {
Replicas: ptr.To(int32(2)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Disable: true,
},
// No ScalingAdapter means adapter should be deleted
},
},
},
......@@ -253,6 +265,9 @@ func TestDynamoGraphDeploymentReconciler_reconcileScalingAdapters(t *testing.T)
Services: map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec{
"MyService": {
Replicas: ptr.To(int32(1)),
ScalingAdapter: &v1alpha1.ScalingAdapter{
Enabled: true,
},
},
},
},
......
......@@ -110,14 +110,11 @@ func (v *DynamoGraphDeploymentValidator) validateReplicasChanges(old *nvidiacomv
var errs []error
for serviceName, newService := range v.deployment.Spec.Services {
// Check if scaling adapter is enabled for this service (enabled by default)
scalingAdapterEnabled := true
if newService.ScalingAdapter != nil && newService.ScalingAdapter.Disable {
scalingAdapterEnabled = false
}
// Check if scaling adapter is enabled for this service (disabled by default)
scalingAdapterEnabled := newService.ScalingAdapter != nil && newService.ScalingAdapter.Enabled
if !scalingAdapterEnabled {
// Scaling adapter is disabled, users can modify replicas directly
// Scaling adapter is not enabled, users can modify replicas directly
continue
}
......
......@@ -199,9 +199,9 @@ _Appears in:_
| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows to override the main pod spec configuration.<br />It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br />that allows overriding the main container configuration. | | |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled (default), this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br /> |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br /> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled (default), replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
#### DynamoComponentDeploymentSpec
......@@ -237,9 +237,9 @@ _Appears in:_
| `extraPodSpec` _[ExtraPodSpec](#extrapodspec)_ | ExtraPodSpec allows to override the main pod spec configuration.<br />It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field<br />that allows overriding the main container configuration. | | |
| `livenessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` _[Probe](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#probe-v1-core)_ | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled (default), this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br /> |
| `replicas` _integer_ | Replicas is the desired number of Pods for this component.<br />When scalingAdapter is enabled, this field is managed by the<br />DynamoGraphDeploymentScalingAdapter and should not be modified directly. | | Minimum: 0 <br /> |
| `multinode` _[MultinodeSpec](#multinodespec)_ | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled (default), replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
| `scalingAdapter` _[ScalingAdapter](#scalingadapter)_ | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter.<br />When enabled, replicas are managed via DGDSA and external autoscalers can scale<br />the service using the Scale subresource. When disabled, replicas can be modified directly. | | |
#### DynamoGraphDeployment
......@@ -747,7 +747,7 @@ _Appears in:_
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter
for replica management. When enabled (default), the DGDSA owns the replicas field and
for replica management. When enabled, the DGDSA owns the replicas field and
external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
......@@ -758,7 +758,7 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `disable` _boolean_ | Disable indicates whether the ScalingAdapter should be disabled for this service.<br />When false (default), a DGDSA is created and owns the replicas field.<br />When true, no DGDSA is created and replicas can be modified directly in the DGD. | false | |
| `enabled` _boolean_ | Enabled indicates whether the ScalingAdapter should be enabled for this service.<br />When true, a DGDSA is created and owns the replicas field.<br />When false (default), no DGDSA is created and replicas can be modified directly in the DGD. | false | |
#### ServiceReplicaStatus
......
......@@ -115,9 +115,9 @@ kubectl patch dgd sglang-agg --type=merge -p '{"spec":{"services":{"decode":{"re
# use 'kubectl scale dgdsa/sglang-agg-decode --replicas=3' or update the DynamoGraphDeploymentScalingAdapter instead
```
## Disabling DGDSA for a Service
## Enabling DGDSA for a Service
If you want to manage replicas directly in the DGD (without autoscaling), you can disable the scaling adapter per service:
By default, no DGDSA is created for services, allowing direct replica management via the DGD. To enable autoscaling via HPA, KEDA, or Planner, explicitly enable the scaling adapter:
```yaml
apiVersion: nvidia.com/v1alpha1
......@@ -127,24 +127,24 @@ metadata:
spec:
services:
Frontend:
replicas: 2
scalingAdapter:
disable: true # ← No DGDSA created, direct edits allowed
replicas: 2 # ← No DGDSA by default, direct edits allowed
decode:
replicas: 1 # ← DGDSA created by default, managed via adapter
replicas: 1
scalingAdapter:
enabled: true # ← DGDSA created, managed via adapter
```
**When to disable DGDSA:**
- You want simple, manual replica management
- You don't need autoscaling for that service
- You prefer direct DGD edits over adapter-based scaling
**When to keep DGDSA enabled (default):**
**When to enable DGDSA:**
- You want to use HPA, KEDA, or Planner for autoscaling
- You want a clear separation between "desired scale" (adapter) and "deployment config" (DGD)
- You want protection against accidental direct replica edits
**When to keep DGDSA disabled (default):**
- You want simple, manual replica management
- You don't need autoscaling for that service
- You prefer direct DGD edits over adapter-based scaling
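
The opt-in rule described above can be sketched in Go. This is a minimal illustration rather than the operator's actual code: `ServiceSpec` and `adapterEnabled` are hypothetical names, and the types are trimmed stand-ins that keep only the fields relevant here. The check itself mirrors the expression used in this change (`ScalingAdapter != nil && ScalingAdapter.Enabled`), so an omitted `scalingAdapter` block and an explicit `enabled: false` both mean "no DGDSA".

```go
package main

import "fmt"

// ScalingAdapter is a simplified stand-in for the API type in this change.
// Only the Enabled field is modeled.
type ScalingAdapter struct {
	Enabled bool
}

// ServiceSpec is a hypothetical, trimmed-down service spec used only for
// this illustration; the real spec has many more fields.
type ServiceSpec struct {
	Replicas       *int32
	ScalingAdapter *ScalingAdapter
}

// adapterEnabled reports whether a DGDSA should exist for the service.
// A nil ScalingAdapter and Enabled=false are treated identically: the
// adapter is off unless it is explicitly switched on.
func adapterEnabled(s ServiceSpec) bool {
	return s.ScalingAdapter != nil && s.ScalingAdapter.Enabled
}

func main() {
	// A slice (not a map) keeps the iteration order stable.
	services := []struct {
		name string
		spec ServiceSpec
	}{
		{"Frontend", ServiceSpec{}}, // no scalingAdapter block
		{"prefill", ServiceSpec{ScalingAdapter: &ScalingAdapter{}}},            // enabled omitted
		{"decode", ServiceSpec{ScalingAdapter: &ScalingAdapter{Enabled: true}}}, // explicit opt-in
	}

	for _, svc := range services {
		fmt.Printf("%s: adapter enabled = %v\n", svc.name, adapterEnabled(svc.spec))
	}
}
```

Only the `decode` entry opts in. The other two keep the default, so their replicas would stay directly editable in the DGD.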
## Autoscaling with Dynamo Planner
The Dynamo Planner is an LLM-aware autoscaler that optimizes scaling decisions based on inference-specific metrics like Time To First Token (TTFT), Inter-Token Latency (ITL), and KV cache utilization.
......@@ -612,15 +612,14 @@ If you've disabled the scaling adapter for a service, edit the DGD directly:
kubectl patch dgd sglang-agg --type=merge -p '{"spec":{"services":{"decode":{"replicas":3}}}}'
```
Or edit the YAML:
Or edit the YAML (no `scalingAdapter.enabled: true` means direct edits are allowed):
```yaml
spec:
services:
decode:
replicas: 3
scalingAdapter:
disable: true
# No scalingAdapter.enabled means replicas can be edited directly
```
## Best Practices
......