"...ssh:/git@developer.sourcefind.cn:2222/OpenDAS/dynamo.git" did not exist on "af3d8aa08957bbcf4a07f2a79cce7631cfe25a7e"
Commit eaf11e70, authored by hhzhang16, committed by GitHub

feat(operator): Refactor DGDR to use profiler's native configuration format (#3758)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent 7b2f95e4
...@@ -36,7 +36,7 @@ spec: ...@@ -36,7 +36,7 @@ spec:
- jsonPath: .spec.modelName - jsonPath: .spec.modelName
name: Model name: Model
type: string type: string
- jsonPath: .spec.backend - jsonPath: .status.backend
name: Backend name: Backend
type: string type: string
- jsonPath: .status.state - jsonPath: .status.state
...@@ -94,16 +94,6 @@ spec: ...@@ -94,16 +94,6 @@ spec:
after profiling completes. If false, only the spec is generated and stored in status. after profiling completes. If false, only the spec is generated and stored in status.
Users can then manually create a DGD using the generated spec. Users can then manually create a DGD using the generated spec.
type: boolean type: boolean
backend:
default: trtllm
description: |-
Backend specifies the inference backend framework to use.
Supported values are: "vllm", "sglang", "trtllm".
enum:
- vllm
- sglang
- trtllm
type: string
deploymentOverrides: deploymentOverrides:
description: |- description: |-
DeploymentOverrides allows customizing metadata for the auto-created DGD. DeploymentOverrides allows customizing metadata for the auto-created DGD.
...@@ -132,53 +122,29 @@ spec: ...@@ -132,53 +122,29 @@ spec:
If not specified, defaults to the DGDR namespace. If not specified, defaults to the DGDR namespace.
type: string type: string
type: object type: object
gpu:
description: |-
GPU defines optional GPU type and resource specifications.
These constraints guide the profiler to find configurations within specified bounds.
properties:
maxNumGPUsPerEngine:
default: 8
description: |-
MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
The profiler will not consider configurations with more GPUs than this value.
minimum: 1
type: integer
minNumGPUsPerEngine:
default: 1
description: |-
MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
The profiler will not consider configurations with fewer GPUs than this value.
minimum: 1
type: integer
type:
description: |-
Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
If specified, profiling will focus on configurations optimized for this GPU type.
type: string
type: object
modelName: modelName:
description: |- description: |-
ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
This should be a valid model identifier that the profiler can resolve. This is a high-level identifier for easy reference in kubectl output and logs.
type: string type: string
online:
default: false
description: |-
Online indicates whether to use online profiler (true) or AI Configurator (false).
Online profiling uses real deployments for accurate measurements (2-4 hours).
Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
type: boolean
profilingConfig: profilingConfig:
description: |- description: |-
ProfilingConfig provides custom configuration for the profiling job. ProfilingConfig provides the complete configuration for the profiling job.
Applicable to both online and offline (AIC) profiling modes. This configuration is passed directly to the profiler.
The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
The profiler will validate the configuration and report any errors.
properties: properties:
config:
description: |-
Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
type: object
x-kubernetes-preserve-unknown-fields: true
configMapRef: configMapRef:
description: |- description: |-
ConfigMapRef is a reference to a ConfigMap containing profiling configuration. ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file. base config file (disagg.yaml). This is separate from the profiling config above.
This configuration is used by both online and offline (AIC) profiling modes. The path to this config will be set as engine.config in the profiling config.
properties: properties:
key: key:
default: disagg.yaml default: disagg.yaml
...@@ -191,45 +157,18 @@ spec: ...@@ -191,45 +157,18 @@ spec:
- name - name
type: object type: object
type: object type: object
sla:
description: |-
SLA defines the Service Level Agreement profiling targets.
The profiler uses these targets to find an optimal deployment configuration.
properties:
isl:
default: 3000
description: |-
ISL is the Input Sequence Length for profiling.
Defines the length of input sequences to use during profiling tests.
minimum: 1
type: integer
itl:
default: 10
description: |-
ITL is the target Inter-Token Latency in milliseconds.
This represents the maximum time allowed between consecutive tokens in the output.
type: integer
osl:
default: 500
description: |-
OSL is the Output Sequence Length for profiling.
Defines the expected length of output sequences to generate during profiling tests.
minimum: 1
type: integer
ttft:
default: 50
description: |-
TTFT is the target Time To First Token in milliseconds.
This represents the maximum time allowed from request submission to receiving the first token.
type: integer
type: object
required: required:
- modelName - modelName
- sla - profilingConfig
type: object type: object
status: status:
description: Status reflects the current observed state of this deployment request. description: Status reflects the current observed state of this deployment request.
properties: properties:
backend:
description: |-
Backend is extracted from profilingConfig.config.engine.backend for display purposes.
This field is populated by the controller and shown in kubectl output.
type: string
conditions: conditions:
description: |- description: |-
Conditions contains the latest observed conditions of the deployment request. Conditions contains the latest observed conditions of the deployment request.
......
@@ -24,6 +24,7 @@ a high-level, SLA-driven interface for deploying machine learning models on Dyna
 package v1alpha1
 
 import (
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	runtime "k8s.io/apimachinery/pkg/runtime"
 )
@@ -31,61 +32,6 @@ import (
 // EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
 // NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
 
-// SLASpec defines Service Level Agreement targets for model profiling and deployment.
-// These targets guide the profiling process to find optimal deployment configurations
-// that meet the specified performance requirements.
-type SLASpec struct {
-	// ITL is the target Inter-Token Latency in milliseconds.
-	// This represents the maximum time allowed between consecutive tokens in the output.
-	// +kubebuilder:default=10
-	// +optional
-	ITL int `json:"itl,omitempty"`
-
-	// TTFT is the target Time To First Token in milliseconds.
-	// This represents the maximum time allowed from request submission to receiving the first token.
-	// +kubebuilder:default=50
-	// +optional
-	TTFT int `json:"ttft,omitempty"`
-
-	// ISL is the Input Sequence Length for profiling.
-	// Defines the length of input sequences to use during profiling tests.
-	// +kubebuilder:default=3000
-	// +kubebuilder:validation:Minimum=1
-	// +optional
-	ISL int `json:"isl,omitempty"`
-
-	// OSL is the Output Sequence Length for profiling.
-	// Defines the expected length of output sequences to generate during profiling tests.
-	// +kubebuilder:default=500
-	// +kubebuilder:validation:Minimum=1
-	// +optional
-	OSL int `json:"osl,omitempty"`
-}
-
-// GPUSpec defines optional GPU type and resource specifications for profiling and deployment.
-// These constraints help narrow down the search space during profiling to find configurations
-// that fit within specified hardware bounds.
-type GPUSpec struct {
-	// Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
-	// If specified, profiling will focus on configurations optimized for this GPU type.
-	// +kubebuilder:validation:Optional
-	Type string `json:"type,omitempty"`
-
-	// MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
-	// The profiler will not consider configurations with fewer GPUs than this value.
-	// +kubebuilder:validation:Optional
-	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:default=1
-	MinNumGPUsPerEngine int `json:"minNumGPUsPerEngine,omitempty"`
-
-	// MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
-	// The profiler will not consider configurations with more GPUs than this value.
-	// +kubebuilder:validation:Optional
-	// +kubebuilder:validation:Minimum=1
-	// +kubebuilder:default=8
-	MaxNumGPUsPerEngine int `json:"maxNumGPUsPerEngine,omitempty"`
-}
-
 // ConfigMapKeySelector selects a specific key from a ConfigMap.
 // Used to reference external configuration data stored in ConfigMaps.
 type ConfigMapKeySelector struct {
@@ -99,11 +45,19 @@ type ConfigMapKeySelector struct {
 }
 
 // ProfilingConfigSpec defines configuration for the profiling process.
-// Allows users to provide custom profiling parameters via ConfigMap references.
+// This structure maps directly to the profile_sla.py config format.
+// See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
 type ProfilingConfigSpec struct {
-	// ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
-	// The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
-	// This configuration is used by both online and offline (AIC) profiling modes.
+	// Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
+	// The profiler will validate the configuration and report any errors.
+	// +kubebuilder:validation:Optional
+	// +kubebuilder:pruning:PreserveUnknownFields
+	// +kubebuilder:validation:Type=object
+	Config *apiextensionsv1.JSON `json:"config,omitempty"`
+
+	// ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
+	// base config file (disagg.yaml). This is separate from the profiling config above.
+	// The path to this config will be set as engine.config in the profiling config.
 	// +kubebuilder:validation:Optional
 	ConfigMapRef *ConfigMapKeySelector `json:"configMapRef,omitempty"`
 }
@@ -135,32 +89,17 @@ type DeploymentOverridesSpec struct {
 // This CRD serves as the primary interface for users to request model deployments with
 // specific performance constraints and resource requirements, enabling SLA-driven deployments.
 type DynamoGraphDeploymentRequestSpec struct {
-	// ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
-	// This should be a valid model identifier that the profiler can resolve.
+	// ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
+	// This is a high-level identifier for easy reference in kubectl output and logs.
 	// +kubebuilder:validation:Required
 	ModelName string `json:"modelName"`
 
-	// Backend specifies the inference backend framework to use.
-	// Supported values are: "vllm", "sglang", "trtllm".
-	// +kubebuilder:validation:Enum=vllm;sglang;trtllm
-	// +kubebuilder:default=trtllm
-	Backend string `json:"backend,omitempty"`
-
-	// SLA defines the Service Level Agreement profiling targets.
-	// The profiler uses these targets to find an optimal deployment configuration.
+	// ProfilingConfig provides the complete configuration for the profiling job.
+	// This configuration is passed directly to the profiler.
+	// The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
+	// The profiler will validate the configuration and report any errors.
 	// +kubebuilder:validation:Required
-	SLA SLASpec `json:"sla"`
-
-	// GPU defines optional GPU type and resource specifications.
-	// These constraints guide the profiler to find configurations within specified bounds.
-	// +kubebuilder:validation:Optional
-	GPU *GPUSpec `json:"gpu,omitempty"`
-
-	// Online indicates whether to use online profiler (true) or AI Configurator (false).
-	// Online profiling uses real deployments for accurate measurements (2-4 hours).
-	// Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
-	// +kubebuilder:default=false
-	Online bool `json:"online,omitempty"`
+	ProfilingConfig ProfilingConfigSpec `json:"profilingConfig"`
 
 	// AutoApply indicates whether to automatically create a DynamoGraphDeployment
 	// after profiling completes. If false, only the spec is generated and stored in status.
@@ -172,11 +111,6 @@ type DynamoGraphDeploymentRequestSpec struct {
 	// Only applicable when AutoApply is true.
 	// +kubebuilder:validation:Optional
 	DeploymentOverrides *DeploymentOverridesSpec `json:"deploymentOverrides,omitempty"`
-
-	// ProfilingConfig provides custom configuration for the profiling job.
-	// Applicable to both online and offline (AIC) profiling modes.
-	// +kubebuilder:validation:Optional
-	ProfilingConfig *ProfilingConfigSpec `json:"profilingConfig,omitempty"`
 }
 
 // DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
@@ -205,6 +139,11 @@ type DynamoGraphDeploymentRequestStatus struct {
 	// Empty string ("") represents the initial state before initialization.
 	State string `json:"state,omitempty"`
 
+	// Backend is extracted from profilingConfig.config.engine.backend for display purposes.
+	// This field is populated by the controller and shown in kubectl output.
+	// +kubebuilder:validation:Optional
+	Backend string `json:"backend,omitempty"`
+
 	// ObservedGeneration reflects the generation of the most recently observed spec.
 	// Used to detect spec changes and enforce immutability after profiling starts.
 	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
@@ -253,7 +192,7 @@ type DynamoGraphDeploymentRequestStatus struct {
 // +kubebuilder:subresource:status
 // +kubebuilder:resource:shortName=dgdr
 // +kubebuilder:printcolumn:name="Model",type=string,JSONPath=`.spec.modelName`
-// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.spec.backend`
+// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.status.backend`
 // +kubebuilder:printcolumn:name="State",type=string,JSONPath=`.status.state`
 // +kubebuilder:printcolumn:name="DGD-State",type=string,JSONPath=`.status.deployment.state`
 // +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
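
To make the field move concrete, here is a minimal sketch of how the removed top-level fields could be folded into the new `profilingConfig.config` layout. The `legacySpec` type and the `toProfilingConfig` helper are hypothetical and not part of the operator. The key names come from the sample manifest in this commit. Pairing the old `gpu.type` with `sweep.aic_system`, and `online` with the negation of `sweep.use_ai_configurator`, is an assumption based on that sample's values and comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacySpec mirrors the removed top-level DGDR fields. Hypothetical type,
// for illustration only.
type legacySpec struct {
	Backend             string
	Online              bool
	GPUType             string
	MinGPUs, MaxGPUs    int
	ISL, OSL, TTFT, ITL int
}

// toProfilingConfig is a hypothetical migration helper. It builds the nested
// map that would be serialized under spec.profilingConfig.config.
func toProfilingConfig(old legacySpec) map[string]any {
	return map[string]any{
		"engine": map[string]any{"backend": old.Backend},
		"hardware": map[string]any{
			"min_num_gpus_per_engine": old.MinGPUs,
			"max_num_gpus_per_engine": old.MaxGPUs,
		},
		"sweep": map[string]any{
			// Assumption: AI Configurator mode is the opposite of online profiling.
			"use_ai_configurator": !old.Online,
			// Assumption: the old gpu.type value is reused as the AIC system name.
			"aic_system": old.GPUType,
		},
		"sla": map[string]any{
			"isl":  old.ISL,
			"osl":  old.OSL,
			"ttft": float64(old.TTFT), // the sample writes latency targets as floats
			"itl":  float64(old.ITL),
		},
	}
}

func main() {
	// Values taken from the old sample manifest and the old CRD defaults.
	old := legacySpec{
		Backend: "trtllm", Online: false, GPUType: "h200_sxm",
		MinGPUs: 1, MaxGPUs: 8, ISL: 3000, OSL: 500, TTFT: 50, ITL: 10,
	}
	out, err := json.MarshalIndent(toProfilingConfig(old), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The CRD no longer validates or defaults any of these keys, so a migrated manifest has to spell out values that the old schema filled in, such as `isl: 3000` or `ttft: 50`.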
......
@@ -41,6 +41,7 @@ import (
 	"github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/dynamo/common"
 	"k8s.io/api/autoscaling/v2"
 	"k8s.io/api/core/v1"
+	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/runtime"
 )
@@ -499,22 +500,12 @@ func (in *DynamoGraphDeploymentRequestList) DeepCopyObject() runtime.Object {
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *DynamoGraphDeploymentRequestSpec) DeepCopyInto(out *DynamoGraphDeploymentRequestSpec) {
 	*out = *in
-	out.SLA = in.SLA
-	if in.GPU != nil {
-		in, out := &in.GPU, &out.GPU
-		*out = new(GPUSpec)
-		**out = **in
-	}
+	in.ProfilingConfig.DeepCopyInto(&out.ProfilingConfig)
 	if in.DeploymentOverrides != nil {
 		in, out := &in.DeploymentOverrides, &out.DeploymentOverrides
 		*out = new(DeploymentOverridesSpec)
 		(*in).DeepCopyInto(*out)
 	}
-	if in.ProfilingConfig != nil {
-		in, out := &in.ProfilingConfig, &out.ProfilingConfig
-		*out = new(ProfilingConfigSpec)
-		(*in).DeepCopyInto(*out)
-	}
 }
 
 // DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DynamoGraphDeploymentRequestSpec.
@@ -626,21 +617,6 @@ func (in *DynamoGraphDeploymentStatus) DeepCopy() *DynamoGraphDeploymentStatus {
 	return out
 }
 
-// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
-func (in *GPUSpec) DeepCopyInto(out *GPUSpec) {
-	*out = *in
-}
-
-// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GPUSpec.
-func (in *GPUSpec) DeepCopy() *GPUSpec {
-	if in == nil {
-		return nil
-	}
-	out := new(GPUSpec)
-	in.DeepCopyInto(out)
-	return out
-}
-
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *IngressSpec) DeepCopyInto(out *IngressSpec) {
 	*out = *in
@@ -754,6 +730,11 @@ func (in *PVC) DeepCopy() *PVC {
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *ProfilingConfigSpec) DeepCopyInto(out *ProfilingConfigSpec) {
 	*out = *in
+	if in.Config != nil {
+		in, out := &in.Config, &out.Config
+		*out = new(apiextensionsv1.JSON)
+		(*in).DeepCopyInto(*out)
+	}
 	if in.ConfigMapRef != nil {
 		in, out := &in.ConfigMapRef, &out.ConfigMapRef
 		*out = new(ConfigMapKeySelector)
@@ -771,21 +752,6 @@ func (in *ProfilingConfigSpec) DeepCopy() *ProfilingConfigSpec {
 	return out
 }
 
-// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
-func (in *SLASpec) DeepCopyInto(out *SLASpec) {
-	*out = *in
-}
-
-// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SLASpec.
-func (in *SLASpec) DeepCopy() *SLASpec {
-	if in == nil {
-		return nil
-	}
-	out := new(SLASpec)
-	in.DeepCopyInto(out)
-	return out
-}
-
 // DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
 func (in *SharedMemorySpec) DeepCopyInto(out *SharedMemorySpec) {
 	*out = *in
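
The generated code now deep-copies `Config` because it is a pointer to a type wrapping a byte slice, and a plain struct assignment would leave both copies sharing the same bytes. A minimal sketch of the difference follows. `rawJSON` and `spec` are simplified stand-ins for `apiextensionsv1.JSON` and `ProfilingConfigSpec`, and `deepCopy` is hand-written for illustration rather than generated.

```go
package main

import "fmt"

// rawJSON is a stand-in for apiextensionsv1.JSON, which wraps a byte slice.
type rawJSON struct{ Raw []byte }

// spec is a hypothetical, trimmed-down analogue of ProfilingConfigSpec.
type spec struct{ Config *rawJSON }

// deepCopy allocates a new rawJSON and a new byte slice, so the copy shares
// no memory with the original.
func (in *spec) deepCopy() *spec {
	if in == nil {
		return nil
	}
	out := &spec{}
	if in.Config != nil {
		out.Config = &rawJSON{Raw: append([]byte(nil), in.Config.Raw...)}
	}
	return out
}

func main() {
	orig := &spec{Config: &rawJSON{Raw: []byte(`{"a":1}`)}}
	shallow := *orig        // copies only the pointer; Config is shared
	deep := orig.deepCopy() // independent copy

	orig.Config.Raw[5] = '2' // mutate the original in place

	fmt.Println(string(shallow.Config.Raw)) // prints {"a":2}
	fmt.Println(string(deep.Config.Raw))    // prints {"a":1}
}
```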
......
...@@ -36,7 +36,7 @@ spec: ...@@ -36,7 +36,7 @@ spec:
- jsonPath: .spec.modelName - jsonPath: .spec.modelName
name: Model name: Model
type: string type: string
- jsonPath: .spec.backend - jsonPath: .status.backend
name: Backend name: Backend
type: string type: string
- jsonPath: .status.state - jsonPath: .status.state
...@@ -94,16 +94,6 @@ spec: ...@@ -94,16 +94,6 @@ spec:
after profiling completes. If false, only the spec is generated and stored in status. after profiling completes. If false, only the spec is generated and stored in status.
Users can then manually create a DGD using the generated spec. Users can then manually create a DGD using the generated spec.
type: boolean type: boolean
backend:
default: trtllm
description: |-
Backend specifies the inference backend framework to use.
Supported values are: "vllm", "sglang", "trtllm".
enum:
- vllm
- sglang
- trtllm
type: string
deploymentOverrides: deploymentOverrides:
description: |- description: |-
DeploymentOverrides allows customizing metadata for the auto-created DGD. DeploymentOverrides allows customizing metadata for the auto-created DGD.
...@@ -132,53 +122,29 @@ spec: ...@@ -132,53 +122,29 @@ spec:
If not specified, defaults to the DGDR namespace. If not specified, defaults to the DGDR namespace.
type: string type: string
type: object type: object
gpu:
description: |-
GPU defines optional GPU type and resource specifications.
These constraints guide the profiler to find configurations within specified bounds.
properties:
maxNumGPUsPerEngine:
default: 8
description: |-
MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
The profiler will not consider configurations with more GPUs than this value.
minimum: 1
type: integer
minNumGPUsPerEngine:
default: 1
description: |-
MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
The profiler will not consider configurations with fewer GPUs than this value.
minimum: 1
type: integer
type:
description: |-
Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
If specified, profiling will focus on configurations optimized for this GPU type.
type: string
type: object
modelName: modelName:
description: |- description: |-
ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
This should be a valid model identifier that the profiler can resolve. This is a high-level identifier for easy reference in kubectl output and logs.
type: string type: string
online:
default: false
description: |-
Online indicates whether to use online profiler (true) or AI Configurator (false).
Online profiling uses real deployments for accurate measurements (2-4 hours).
Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
type: boolean
profilingConfig: profilingConfig:
description: |- description: |-
ProfilingConfig provides custom configuration for the profiling job. ProfilingConfig provides the complete configuration for the profiling job.
Applicable to both online and offline (AIC) profiling modes. This configuration is passed directly to the profiler.
The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
The profiler will validate the configuration and report any errors.
properties: properties:
config:
description: |-
Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
type: object
x-kubernetes-preserve-unknown-fields: true
configMapRef: configMapRef:
description: |- description: |-
ConfigMapRef is a reference to a ConfigMap containing profiling configuration. ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file. base config file (disagg.yaml). This is separate from the profiling config above.
This configuration is used by both online and offline (AIC) profiling modes. The path to this config will be set as engine.config in the profiling config.
properties: properties:
key: key:
default: disagg.yaml default: disagg.yaml
...@@ -191,45 +157,18 @@ spec: ...@@ -191,45 +157,18 @@ spec:
- name - name
type: object type: object
type: object type: object
sla:
description: |-
SLA defines the Service Level Agreement profiling targets.
The profiler uses these targets to find an optimal deployment configuration.
properties:
isl:
default: 3000
description: |-
ISL is the Input Sequence Length for profiling.
Defines the length of input sequences to use during profiling tests.
minimum: 1
type: integer
itl:
default: 10
description: |-
ITL is the target Inter-Token Latency in milliseconds.
This represents the maximum time allowed between consecutive tokens in the output.
type: integer
osl:
default: 500
description: |-
OSL is the Output Sequence Length for profiling.
Defines the expected length of output sequences to generate during profiling tests.
minimum: 1
type: integer
ttft:
default: 50
description: |-
TTFT is the target Time To First Token in milliseconds.
This represents the maximum time allowed from request submission to receiving the first token.
type: integer
type: object
required: required:
- modelName - modelName
- sla - profilingConfig
type: object type: object
status: status:
description: Status reflects the current observed state of this deployment request. description: Status reflects the current observed state of this deployment request.
properties: properties:
backend:
description: |-
Backend is extracted from profilingConfig.config.engine.backend for display purposes.
This field is populated by the controller and shown in kubectl output.
type: string
conditions: conditions:
description: |- description: |-
Conditions contains the latest observed conditions of the deployment request. Conditions contains the latest observed conditions of the deployment request.
......
@@ -18,18 +18,57 @@ kind: DynamoGraphDeploymentRequest
 metadata:
   name: example-llm-sla
 spec:
-  modelName: "meta/llama3-70b"
-  backend: trtllm  # enum: [vllm, sglang, trtllm]; default is trtllm
-  sla:  # SLA profiling targets (all fields optional with defaults)
-    itl: 10  # Inter-Token Latency target in milliseconds (default: 10)
-    ttft: 50  # Time To First Token target in milliseconds (default: 50)
-    isl: 3000  # Input Sequence Length (default: 3000)
-    osl: 500  # Output Sequence Length (default: 500)
-  gpu:  # optional
-    type: h200_sxm
-    minNumGPUsPerEngine: 1  # default is 1
-    maxNumGPUsPerEngine: 8  # default is 8
-  online: false  # true for online profiler, false for AIC profiler
+  # ModelName is a high-level identifier for the model being deployed
+  modelName: Qwen/Qwen3-0.6B
+
+  # ProfilingConfig maps directly to the profile_sla.py config format
+  # See benchmarks/profiler/utils/profiler_argparse.py for complete schema
+  profilingConfig:
+    config:
+      # Optional: Output directory for profiling results (defaults to /data in the Job)
+      # output_dir: "profiling_results"
+
+      # Engine configuration
+      engine:
+        backend: trtllm  # Inference backend: vllm, sglang, or trtllm
+        max_context_length: 16384  # Maximum context length supported by the model
+        is_moe_model: false  # Enable MoE model support (uses TEP/DEP instead of TP)
+
+      # Hardware configuration
+      hardware:
+        min_num_gpus_per_engine: 1  # Minimum GPUs to test
+        max_num_gpus_per_engine: 4  # Maximum GPUs to test (limited by model's num_heads/4)
+        num_gpus_per_node: 8  # GPUs per node (for MoE models)
+
+      # Sweep/profiling configuration
+      sweep:
+        skip_existing_results: true  # Skip configurations that already have results
+        prefill_interpolation_granularity: 16  # Samples for TTFT interpolation
+        decode_interpolation_granularity: 6  # Samples for ITL interpolation
+
+        # AI Configurator mode (fast simulation-based profiling, 20-30 seconds)
+        use_ai_configurator: false  # Set to false for online profiling (2-4 hours)
+        aic_system: h200_sxm  # Target GPU system for AI Configurator
+        aic_model_name: QWEN3_0.6B  # Model name for AI Configurator
+        aic_backend_version: "0.20.0"  # Backend version for AI Configurator
+
+      # SLA targets for profiling
+      sla:
+        isl: 3000  # Input sequence length
+        osl: 500  # Output sequence length
+        ttft: 50.0  # Time To First Token target (milliseconds)
+        itl: 10.0  # Inter-Token Latency target (milliseconds)
+
+      # Optional: Planner-specific arguments
+      # planner:
+      #   planner_min_endpoint: 2
+      #   # Add any other planner args here (use hyphens or underscores)
+
+    # Reference to ConfigMap containing the DGD base config (disagg.yaml)
+    # The path to this file will be automatically set as engine.config
+    configMapRef:
+      name: my-profiling-config
+      key: disagg.yaml  # defaults to "disagg.yaml"
 
   # Optional: Automatically create DynamoGraphDeployment after profiling
   autoApply: true  # default is false
@@ -42,9 +81,3 @@ spec:
   #     team: ml-platform
   #   annotations:
   #     description: "Auto-generated from DGDR"
-
-  # Currently required for both online and offline/AIC profiling, but will be removed in the future
-  profilingConfig:
-    configMapRef:
-      name: my-profiling-config
-      key: disagg.yaml  # default is "disagg.yaml"
@@ -320,6 +320,9 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleInitialState(ctx context.
 	// Set observedGeneration to track the spec we're processing
 	dgdr.Status.ObservedGeneration = dgdr.Generation
 
+	// Extract and populate backend from config for display in kubectl output
+	dgdr.Status.Backend = getBackendFromConfig(dgdr)
+
 	// Initialize status
 	r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonInitialized, MessageInitialized)
 	return r.updateStateAndRequeue(ctx, dgdr, StatePending, MessageInitialized)
...@@ -337,7 +340,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) handlePendingState(ctx context. ...@@ -337,7 +340,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) handlePendingState(ctx context.
} }
// Record event with appropriate message // Record event with appropriate message
if dgdr.Spec.Online { if isOnlineProfiling(dgdr) {
r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageProfilingJobCreated) r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageProfilingJobCreated)
} else { } else {
r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageAICProfilingJobCreated) r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageAICProfilingJobCreated)
...@@ -670,15 +673,10 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleFailedState(ctx context.C ...@@ -670,15 +673,10 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleFailedState(ctx context.C
return ctrl.Result{}, nil return ctrl.Result{}, nil
} }
// getProfilingJobName returns the job name for a DGDR based on profiling mode // getProfilingJobName returns the job name for a DGDR
func getProfilingJobName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string { func getProfilingJobName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
var jobNamePrefix string // Use "profile-" prefix for all profiling jobs
if dgdr.Spec.Online { return fmt.Sprintf("profile-%s", dgdr.Name)
jobNamePrefix = JobNamePrefixOnline
} else {
jobNamePrefix = JobNamePrefixAIC
}
return fmt.Sprintf("%s%s", jobNamePrefix, dgdr.Name)
} }
// getOutputConfigMapName returns the ConfigMap name for profiling output // getOutputConfigMapName returns the ConfigMap name for profiling output
...@@ -686,32 +684,55 @@ func getOutputConfigMapName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest ...@@ -686,32 +684,55 @@ func getOutputConfigMapName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest
return fmt.Sprintf("%s%s", ConfigMapOutputPrefix, dgdr.Name) return fmt.Sprintf("%s%s", ConfigMapOutputPrefix, dgdr.Name)
} }
// validateSpec validates the DGDR spec // isOnlineProfiling determines whether online profiling or AI Configurator is being used
func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error { // based on the sweep.use_ai_configurator config value
if dgdr.Spec.ModelName == "" { func isOnlineProfiling(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) bool {
return errors.New(ValidationErrorModelNameRequired) if dgdr.Spec.ProfilingConfig.Config == nil {
return true
} }
if dgdr.Spec.SLA.ITL <= 0 { var config map[string]interface{}
return errors.New(ValidationErrorITLPositive) if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return true // Default to online on parse error
} }
if dgdr.Spec.SLA.TTFT <= 0 { if sweep, ok := config["sweep"].(map[string]interface{}); ok {
return errors.New(ValidationErrorTTFTPositive) if useAIC, exists := sweep["use_ai_configurator"].(bool); exists {
return !useAIC
}
} }
// Default to online profiling if not specified
return true
}
// Validate backend // getBackendFromConfig extracts the backend value from profilingConfig.config.engine.backend
validBackends := map[string]bool{ func getBackendFromConfig(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
BackendVLLM: true, if dgdr.Spec.ProfilingConfig.Config == nil {
BackendSGLang: true, return ""
BackendTRTLLM: true,
} }
if dgdr.Spec.Backend != "" && !validBackends[dgdr.Spec.Backend] {
return fmt.Errorf(ValidationErrorInvalidBackend, dgdr.Spec.Backend) var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return ""
} }
// Validate ConfigMap if provided (for both online and offline/AIC profiling) if engine, ok := config["engine"].(map[string]interface{}); ok {
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil { if backend, ok := engine["backend"].(string); ok {
return backend
}
}
return ""
}
// validateSpec validates the DGDR spec
func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
// Basic validation - check that profilingConfig.config is provided
if dgdr.Spec.ProfilingConfig.Config == nil || len(dgdr.Spec.ProfilingConfig.Config.Raw) == 0 {
return errors.New("profilingConfig.config is required and must not be empty")
}
// Validate ConfigMap if provided (for the DGD base config)
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
cm := &corev1.ConfigMap{} cm := &corev1.ConfigMap{}
err := r.Get(ctx, types.NamespacedName{ err := r.Get(ctx, types.NamespacedName{
Name: dgdr.Spec.ProfilingConfig.ConfigMapRef.Name, Name: dgdr.Spec.ProfilingConfig.ConfigMapRef.Name,
...@@ -737,6 +758,24 @@ func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Contex ...@@ -737,6 +758,24 @@ func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Contex
} }
} }
// Parse config to validate structure
var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return fmt.Errorf("failed to parse profilingConfig.config: %w", err)
}
// Additional validation: Ensure engine.config is set (either as path or will be set from ConfigMapRef)
engineConfig, hasEngine := config["engine"].(map[string]interface{})
if hasEngine {
_, hasConfig := engineConfig["config"]
if !hasConfig && dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
return errors.New("either profilingConfig.config.engine.config must be set, or profilingConfig.configMapRef must be provided")
}
} else if dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
return errors.New("profilingConfig.config must contain 'engine' section, or profilingConfig.configMapRef must be provided")
}
// The profiler will validate the rest of the configuration
return nil return nil
} }
...@@ -757,33 +796,48 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -757,33 +796,48 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
} }
} }
// Use ProfilerImage for both online and offline (AIC) profiling
imageName := r.ProfilerImage
if imageName == "" {
return fmt.Errorf("profiler image not configured: the operator's profilerImage must be set in the Helm chart values (dynamo-operator.dynamo.dgdr.profilerImage). The image must contain the ai-dynamo profiler (python -m benchmarks.profiler.profile_sla entrypoint). For development, build from the ai-dynamo repository Dockerfile and push to your registry. A public image will be available in release 0.6.1")
}
logger.Info("Using profiler image", "image", imageName, "online", dgdr.Spec.Online)
// Determine label based on profiling mode
var labelValue string
if dgdr.Spec.Online {
labelValue = LabelValueDynamoProfiler
} else {
labelValue = LabelValueAICProfiler
}
// Use SyncResource to create/update the job // Use SyncResource to create/update the job
modified, job, err := commonController.SyncResource(ctx, r, dgdr, func(ctx context.Context) (*batchv1.Job, bool, error) { modified, job, err := commonController.SyncResource(ctx, r, dgdr, func(ctx context.Context) (*batchv1.Job, bool, error) {
jobName := getProfilingJobName(dgdr) jobName := getProfilingJobName(dgdr)
outputConfigMapName := getOutputConfigMapName(dgdr) outputConfigMapName := getOutputConfigMapName(dgdr)
// Build profiler container based on online vs offline (AIC) mode // Parse the profiling config from JSON
var profilerArgs []string var config map[string]interface{}
var profilerEnv []corev1.EnvVar if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return nil, false, fmt.Errorf("failed to parse profiling config: %w", err)
}
// Set deployment.namespace if not already set
if _, hasDeployment := config["deployment"]; !hasDeployment {
config["deployment"] = make(map[string]interface{})
}
deploymentConfig := config["deployment"].(map[string]interface{})
if _, hasNamespace := deploymentConfig["namespace"]; !hasNamespace {
deploymentConfig["namespace"] = dgdr.Namespace
}
// Set output_dir if not already set
if _, hasOutputDir := config["output_dir"]; !hasOutputDir {
config["output_dir"] = ProfilingOutputPath
}
// If ConfigMapRef is provided, set engine.config path
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
if _, hasEngine := config["engine"]; !hasEngine {
config["engine"] = make(map[string]interface{})
}
engineConfig := config["engine"].(map[string]interface{})
engineConfig["config"] = fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile)
}
// Serialize config to YAML for passing to profiler
configYAML, err := yaml.Marshal(config)
if err != nil {
return nil, false, fmt.Errorf("failed to marshal profiling config to YAML: %w", err)
}
// Common environment variables // Common environment variables
profilerEnv = []corev1.EnvVar{ profilerEnv := []corev1.EnvVar{
{ {
Name: "HUGGING_FACE_HUB_TOKEN", Name: "HUGGING_FACE_HUB_TOKEN",
ValueFrom: &corev1.EnvVarSource{ ValueFrom: &corev1.EnvVarSource{
...@@ -805,7 +859,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -805,7 +859,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}, },
} }
// Build container with volume mounts // Build volume mounts
volumeMounts := []corev1.VolumeMount{ volumeMounts := []corev1.VolumeMount{
{ {
Name: VolumeNameProfilingOutput, Name: VolumeNameProfilingOutput,
...@@ -813,49 +867,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -813,49 +867,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}, },
} }
// Determine GPU range for profiling // Add ConfigMap volume mount if provided
minGPUs := 1 if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
maxGPUs := 8
if dgdr.Spec.GPU != nil {
if dgdr.Spec.GPU.MinNumGPUsPerEngine > 0 {
minGPUs = dgdr.Spec.GPU.MinNumGPUsPerEngine
}
if dgdr.Spec.GPU.MaxNumGPUsPerEngine > 0 {
maxGPUs = dgdr.Spec.GPU.MaxNumGPUsPerEngine
}
}
// Build common profiler args (shared by both online and offline modes)
profilerArgs = []string{
"--namespace", dgdr.Namespace,
"--backend", dgdr.Spec.Backend,
"--ttft", fmt.Sprintf("%d", dgdr.Spec.SLA.TTFT),
"--itl", fmt.Sprintf("%d", dgdr.Spec.SLA.ITL),
"--isl", fmt.Sprintf("%d", dgdr.Spec.SLA.ISL),
"--osl", fmt.Sprintf("%d", dgdr.Spec.SLA.OSL),
"--output-dir", ProfilingOutputPath,
"--min-num-gpus-per-engine", fmt.Sprintf("%d", minGPUs),
"--max-num-gpus-per-engine", fmt.Sprintf("%d", maxGPUs),
}
// Add mode-specific args
if !dgdr.Spec.Online {
// Offline (AIC) profiling: add AI Configurator args
profilerArgs = append(profilerArgs,
"--use-ai-configurator",
"--aic-model-name", dgdr.Spec.ModelName,
"--aic-backend-version", "0.20.0", // TODO: don't hardcode this
)
// Add AIC-specific GPU system type
if dgdr.Spec.GPU != nil && dgdr.Spec.GPU.Type != "" {
profilerArgs = append(profilerArgs, "--aic-system", dgdr.Spec.GPU.Type)
}
}
// Add config if provided (for both online and offline modes)
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
profilerArgs = append(profilerArgs, "--config", fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile))
volumeMounts = append(volumeMounts, corev1.VolumeMount{ volumeMounts = append(volumeMounts, corev1.VolumeMount{
Name: VolumeNameProfilingConfig, Name: VolumeNameProfilingConfig,
MountPath: ProfilingConfigPath, MountPath: ProfilingConfigPath,
...@@ -863,6 +876,18 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -863,6 +876,18 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}) })
} }
// Profiler args: pass the config as an inline YAML string via --profile-config
profilerArgs := []string{
"--profile-config", string(configYAML),
}
// Determine profiler image
imageName := r.ProfilerImage
if imageName == "" {
return nil, false, fmt.Errorf("profiler image not configured: configure dynamo-operator.dynamo.dgdr.profilerImage in Helm values")
}
logger.Info("Using profiler image", "image", imageName)
profilerContainer := corev1.Container{ profilerContainer := corev1.Container{
Name: ContainerNameProfiler, Name: ContainerNameProfiler,
Image: imageName, Image: imageName,
...@@ -918,8 +943,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -918,8 +943,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}, },
}} }}
// Add ConfigMap volume if provided (for both online and offline/AIC) // Add ConfigMap volume if provided
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil { if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
key := dgdr.Spec.ProfilingConfig.ConfigMapRef.Key key := dgdr.Spec.ProfilingConfig.ConfigMapRef.Key
if key == "" { if key == "" {
key = ProfilingConfigFile key = ProfilingConfigFile
...@@ -944,6 +969,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -944,6 +969,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
// Limit retries to prevent infinite loop // Limit retries to prevent infinite loop
backoffLimit := int32(3) backoffLimit := int32(3)
// Determine label based on whether AI Configurator is used
labelValue := LabelValueDynamoProfiler
if !isOnlineProfiling(dgdr) {
labelValue = LabelValueAICProfiler
}
job := &batchv1.Job{ job := &batchv1.Job{
ObjectMeta: metav1.ObjectMeta{ ObjectMeta: metav1.ObjectMeta{
Name: jobName, Name: jobName,
...@@ -978,11 +1009,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context. ...@@ -978,11 +1009,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
} }
if modified { if modified {
if dgdr.Spec.Online { logger.Info("Profiling job created/updated", "job", job.Name)
logger.Info("Online profiling job created/updated", "job", job.Name)
} else {
logger.Info("Offline (AIC) profiling job created/updated", "job", job.Name)
}
} }
return nil return nil
...@@ -1070,7 +1097,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx ...@@ -1070,7 +1097,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx
// generateDGDSpec generates DGD spec from profiling results (online or offline/AIC) // generateDGDSpec generates DGD spec from profiling results (online or offline/AIC)
func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error { func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
logger := log.FromContext(ctx) logger := log.FromContext(ctx)
logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name, "online", dgdr.Spec.Online) logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name)
// Read the generated spec from ConfigMap (created by sidecar) // Read the generated spec from ConfigMap (created by sidecar)
outputConfigMapName := getOutputConfigMapName(dgdr) outputConfigMapName := getOutputConfigMapName(dgdr)
......
...@@ -28,6 +28,11 @@ limitations under the License. ...@@ -28,6 +28,11 @@ limitations under the License.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group. Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
### Resource Types ### Resource Types
- [DynamoComponentDeployment](#dynamocomponentdeployment) - [DynamoComponentDeployment](#dynamocomponentdeployment)
- [DynamoGraphDeployment](#dynamographdeployment) - [DynamoGraphDeployment](#dynamographdeployment)
...@@ -62,7 +67,8 @@ _Appears in:_ ...@@ -62,7 +67,8 @@ _Appears in:_
ConfigMapKeySelector selects a key from a ConfigMap. ConfigMapKeySelector selects a specific key from a ConfigMap.
Used to reference external configuration data stored in ConfigMaps.
...@@ -71,15 +77,16 @@ _Appears in:_ ...@@ -71,15 +77,16 @@ _Appears in:_
| Field | Description | Default | Validation | | Field | Description | Default | Validation |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap. | | Required: {} <br /> | | `name` _string_ | Name of the ConfigMap containing the desired data. | | Required: {} <br /> |
| `key` _string_ | Key in the ConfigMap to select. | disagg.yaml | | | `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml | |
#### DeploymentOverridesSpec #### DeploymentOverridesSpec
DeploymentOverridesSpec defines metadata overrides for the auto-created DGD. DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments.
When autoApply is enabled, these overrides are applied to the generated DGD resource.
...@@ -88,17 +95,18 @@ _Appears in:_ ...@@ -88,17 +95,18 @@ _Appears in:_
| Field | Description | Default | Validation | | Field | Description | Default | Validation |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `name` _string_ | Name is the name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. | | Optional: {} <br /> | | `name` _string_ | Name is the desired name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. | | Optional: {} <br /> |
| `namespace` _string_ | Namespace is the namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: {} <br /> | | `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: {} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment.<br />These are merged with auto-generated labels. | | Optional: {} <br /> | | `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. | | Optional: {} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment. | | Optional: {} <br /> | | `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: {} <br /> |
#### DeploymentStatus #### DeploymentStatus
DeploymentStatus tracks the auto-created DGD status. DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
This status is populated when autoApply is enabled and a DGD is created.
...@@ -109,8 +117,8 @@ _Appears in:_ ...@@ -109,8 +117,8 @@ _Appears in:_
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. | | | | `name` _string_ | Name is the name of the created DynamoGraphDeployment. | | |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. | | | | `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This is mirrored from the DGD's status.state field. | | | | `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This value is mirrored from the DGD's status.state field. | | |
| `created` _boolean_ | Created indicates whether the DGD has been created.<br />Used to prevent recreation if DGD is deleted by user. | | | | `created` _boolean_ | Created indicates whether the DGD has been successfully created.<br />Used to prevent recreation if the DGD is manually deleted by users. | | |
#### DynamoComponentDeployment #### DynamoComponentDeployment
...@@ -229,6 +237,19 @@ It serves as the primary interface for users to request model deployments with ...@@ -229,6 +237,19 @@ It serves as the primary interface for users to request model deployments with
specific performance and resource constraints, enabling SLA-driven deployments. specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
1. Initial → Pending: Validates spec and prepares for profiling
2. Pending → Profiling: Creates and runs profiling job (online or AIC)
3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
5. Ready: Terminal state when DGD is operational or spec is available
6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
The spec becomes immutable once profiling starts. Users must delete and recreate
the DGDR to modify configuration after this point.
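
The lifecycle above can be read as a small state machine. The following is a minimal sketch, not the controller's code: the `State` type, `next`, and `specLocked` are hypothetical names, the state strings are taken from the status field description, and treating every state other than the initial one and `Pending` as locked (including `Failed`) is an assumption.

```go
package main

import "fmt"

// State is a hypothetical type for the DGDR status.state values.
type State string

const (
	StateInitial           State = "" // empty string: before initialization
	StatePending           State = "Pending"
	StateProfiling         State = "Profiling"
	StateDeploying         State = "Deploying"
	StateReady             State = "Ready"
	StateDeploymentDeleted State = "DeploymentDeleted"
	StateFailed            State = "Failed"
)

// next returns the state that follows a successful step.
// Terminal and failure states are returned unchanged.
func next(s State, autoApply bool) State {
	switch s {
	case StateInitial:
		return StatePending
	case StatePending:
		return StateProfiling
	case StateProfiling:
		if autoApply {
			return StateDeploying
		}
		return StateReady
	case StateDeploying:
		return StateReady
	default:
		return s
	}
}

// specLocked reports whether the spec should be treated as immutable.
// Assumption: every state after Pending counts as "profiling has started".
func specLocked(s State) bool {
	return s != StateInitial && s != StatePending
}

func main() {
	s := StateInitial
	for i := 0; i < 4; i++ {
		s = next(s, true)
		fmt.Printf("state=%q locked=%v\n", s, specLocked(s))
	}
}
```
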
...@@ -245,9 +266,9 @@ specific performance and resource constraints, enabling SLA-driven deployments. ...@@ -245,9 +266,9 @@ specific performance and resource constraints, enabling SLA-driven deployments.
DynamoGraphDeploymentRequestSpec defines the desired state of DynamoGraphDeploymentRequest. DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
This CRD serves as the primary interface for users to request model deployments This CRD serves as the primary interface for users to request model deployments with
with specific performance and resource constraints for SLA-driven deployments. specific performance constraints and resource requirements, enabling SLA-driven deployments.
...@@ -256,21 +277,18 @@ _Appears in:_ ...@@ -256,21 +277,18 @@ _Appears in:_
| Field | Description | Default | Validation | | Field | Description | Default | Validation |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). | | Required: {} <br /> | | `modelName` _string_ | ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs. | | Required: {} <br /> |
| `backend` _string_ | Backend specifies the backend framework to use. | trtllm | Enum: [vllm sglang trtllm] <br /> | | `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />The profiler will validate the configuration and report any errors. | | Required: {} <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines the Service Level Agreement profiling targets. | | Required: {} <br /> | | `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false | |
| `gpu` _[GPUSpec](#gpuspec)_ | GPU defines optional GPU type specification. | | Optional: {} <br /> | | `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows customizing metadata for the auto-created DGD.<br />Only applicable when AutoApply is true. | | Optional: {} <br /> |
| `online` _boolean_ | Online indicates whether to use online profiler (true) or AI Configurator (false).<br />When true, uses real deployment for profiling (2-4 hours).<br />When false, uses AI Configurator for fast profiling (20-30 seconds). | false | |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated in status. | false | |
| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows overriding metadata for the auto-created DGD.<br />Only used when AutoApply is true. | | Optional: {} <br /> |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides configuration for the profiling job.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
#### DynamoGraphDeploymentRequestStatus #### DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus defines the observed state of DynamoGraphDeploymentRequest. DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
The controller updates this status as the DGDR progresses through its lifecycle.
...@@ -279,12 +297,13 @@ _Appears in:_ ...@@ -279,12 +297,13 @@ _Appears in:_
| Field | Description | Default | Validation | | Field | Description | Default | Validation |
| --- | --- | --- | --- | | --- | --- | --- | --- |
| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed" | | | | `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "", "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed"<br />Empty string ("") represents the initial state before initialization. | | |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability. | | | | `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes.<br />This field is populated by the controller and shown in kubectl output. | | Optional: {} <br /> |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />The slice is merged by type on patch updates. | | | | `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. | | |
| `profilingResults` _string_ | ProfilingResults contains references to the profiling data and results. | | Optional: {} <br /> | | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. | | |
| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment (including metadata)<br />based on profiling results. This can be used to create a DynamoGraphDeployment resource.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: {} <br />Optional: {} <br /> | | `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/<name>" | | Optional: {} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD if AutoApply is true. | | Optional: {} <br /> | | `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: {} <br />Optional: {} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. | | Optional: {} <br /> |
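
As a rough illustration of how `status.backend` and the profiling mode are derived from `profilingConfig.config`, the sketch below mirrors the `getBackendFromConfig` and `isOnlineProfiling` helpers added in this commit. It uses the standard library's `encoding/json` instead of the YAML package the controller uses, and the function names `backendFromConfig` and `isOnline` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backendFromConfig returns engine.backend from a raw profiling config,
// or "" when the config is empty, malformed, or has no string backend.
func backendFromConfig(raw []byte) string {
	var config map[string]interface{}
	if err := json.Unmarshal(raw, &config); err != nil {
		return ""
	}
	engine, ok := config["engine"].(map[string]interface{})
	if !ok {
		return ""
	}
	backend, _ := engine["backend"].(string)
	return backend
}

// isOnline reports whether online profiling is selected. It defaults to
// true unless sweep.use_ai_configurator is present and set to true.
func isOnline(raw []byte) bool {
	var config map[string]interface{}
	if err := json.Unmarshal(raw, &config); err != nil {
		return true // default to online on parse error
	}
	if sweep, ok := config["sweep"].(map[string]interface{}); ok {
		if useAIC, ok := sweep["use_ai_configurator"].(bool); ok {
			return !useAIC
		}
	}
	return true
}

func main() {
	raw := []byte(`{"engine":{"backend":"vllm"},"sweep":{"use_ai_configurator":true}}`)
	fmt.Println("backend:", backendFromConfig(raw))
	fmt.Println("online:", isOnline(raw))
}
```

Both helpers fall back to a default rather than returning an error, which matches the display-only purpose of the `backend` column.
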
#### DynamoGraphDeploymentSpec #### DynamoGraphDeploymentSpec
...@@ -323,24 +342,6 @@ _Appears in:_ ...@@ -323,24 +342,6 @@ _Appears in:_
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. | | | | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. | | |
#### GPUSpec
GPUSpec defines optional GPU type specification.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _string_ | Type specifies the GPU type (e.g., "h200", "h100", "a100"). | | Optional: {} <br /> |
| `minNumGPUsPerEngine` _integer_ | MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling. | 1 | Minimum: 1 <br />Optional: {} <br /> |
| `maxNumGPUsPerEngine` _integer_ | MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling. | 8 | Minimum: 1 <br />Optional: {} <br /> |
#### IngressSpec #### IngressSpec
...@@ -424,23 +425,9 @@ _Appears in:_ ...@@ -424,23 +425,9 @@ _Appears in:_
ProfilingConfigSpec defines the profiling configuration. ProfilingConfigSpec defines configuration for the profiling process.
This structure maps directly to the profile_sla.py config format.
See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is a reference to a ConfigMap containing the profiling configuration.<br />The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
#### SLASpec
SLASpec defines the Service Level Agreement profiling targets.
...@@ -449,10 +436,8 @@ _Appears in:_ ...@@ -449,10 +436,8 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
-| `itl` _integer_ | ITL is the target Inter-Token Latency in milliseconds. | | Required: {} <br /> |
-| `ttft` _integer_ | TTFT is the target Time To First Token in milliseconds. | | Required: {} <br /> |
-| `isl` _integer_ | ISL is the Input Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
-| `osl` _integer_ | OSL is the Output Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
+| `config` _[JSON](#json)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. | | Optional: {} <br />Type: object <br /> |
+| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. | | Optional: {} <br /> |
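The `configMapRef` description above states that the path to the base config file is set as `engine.config` in the profiling config. A minimal Python sketch of that merge, assuming the profiling config is a nested mapping; the helper name `with_engine_config`, the mount path, and the `sla` keys in the sample are hypothetical and are not taken from the operator or profiler source:

```python
from copy import deepcopy
from typing import Any


def with_engine_config(profiling_config: dict[str, Any], base_config_path: str) -> dict[str, Any]:
    """Return a copy of the profiling config with engine.config set to base_config_path."""
    merged = deepcopy(profiling_config)       # leave the caller's mapping untouched
    engine = merged.setdefault("engine", {})  # create the engine section if it is absent
    engine["config"] = base_config_path
    return merged


# Hypothetical inline profiling config (key names are illustrative only).
inline_config = {"sla": {"ttft": 200, "itl": 20}}

# Hypothetical path where the ConfigMap's disagg.yaml might be mounted.
result = with_engine_config(inline_config, "/mnt/config/disagg.yaml")
print(result)
```

Copying the mapping before setting the key keeps the user-supplied `config` unchanged, which makes it easier to tell what was provided from what was injected.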
#### SharedMemorySpec
@@ -588,7 +573,16 @@ For larger models (typically >70B parameters) or slower storage systems, you may
For multinode deployments, the operator modifies probes based on the backend framework and node role:
#### VLLM Backend
+The operator automatically selects between two deployment modes based on parallelism configuration:
+**Ray-Based Mode** (when `world_size > GPUs_per_node`):
+- **Worker nodes**: All probes (liveness, readiness, startup) are removed
+- **Leader nodes**: All probes remain active
+**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
+- **Leader nodes**: All probes remain active
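The two vLLM mode conditions above can be sketched in Python. This is an illustration under stated assumptions, not the operator's implementation: the names `VllmMode`, `select_vllm_mode`, and `keeps_probes` are hypothetical, the Ray condition is assumed to be checked first (both conditions hold whenever `world_size` alone exceeds a node), and a single-node fallback with unmodified probes is assumed for the case where neither condition holds.

```python
from enum import Enum


class VllmMode(Enum):
    SINGLE_NODE = "single-node"      # assumed fallback, not named in the text above
    RAY = "ray"
    DATA_PARALLEL = "data-parallel"


def select_vllm_mode(world_size: int, data_parallel_size: int, gpus_per_node: int) -> VllmMode:
    """Pick a deployment mode from the parallelism configuration."""
    # Assumption: Ray-based mode takes precedence when both conditions are true.
    if world_size > gpus_per_node:
        return VllmMode.RAY
    if world_size * data_parallel_size > gpus_per_node:
        return VllmMode.DATA_PARALLEL
    return VllmMode.SINGLE_NODE


def keeps_probes(mode: VllmMode, is_leader: bool) -> bool:
    """In both multinode modes, leaders keep all probes and workers lose them."""
    if mode is VllmMode.SINGLE_NODE:
        return True  # assumption: probes are left as configured outside multinode modes
    return is_leader


mode = select_vllm_mode(world_size=4, data_parallel_size=4, gpus_per_node=8)
print(mode.value, keeps_probes(mode, is_leader=False))
```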
#### SGLang Backend
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
@@ -686,7 +680,8 @@ Default container ports are configured based on component type:
## Backend-Specific Configurations
### VLLM
-- **Ray Head Port**: 6379 (for multinode deployments)
+- **Ray Head Port**: 6379 (for Ray-based multinode deployments)
+- **Data Parallel RPC Port**: 13445 (for data parallel multinode deployments)
### SGLang
- **Distribution Init Port**: 29500 (for multinode deployments)