Commit eaf11e70 authored by hhzhang16, committed by GitHub

feat(operator): Refactor DGDR to use profiler's native configuration format (#3758)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent 7b2f95e4
......@@ -36,7 +36,7 @@ spec:
- jsonPath: .spec.modelName
name: Model
type: string
- jsonPath: .spec.backend
- jsonPath: .status.backend
name: Backend
type: string
- jsonPath: .status.state
......@@ -94,16 +94,6 @@ spec:
after profiling completes. If false, only the spec is generated and stored in status.
Users can then manually create a DGD using the generated spec.
type: boolean
backend:
default: trtllm
description: |-
Backend specifies the inference backend framework to use.
Supported values are: "vllm", "sglang", "trtllm".
enum:
- vllm
- sglang
- trtllm
type: string
deploymentOverrides:
description: |-
DeploymentOverrides allows customizing metadata for the auto-created DGD.
......@@ -132,53 +122,29 @@ spec:
If not specified, defaults to the DGDR namespace.
type: string
type: object
gpu:
description: |-
GPU defines optional GPU type and resource specifications.
These constraints guide the profiler to find configurations within specified bounds.
properties:
maxNumGPUsPerEngine:
default: 8
description: |-
MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
The profiler will not consider configurations with more GPUs than this value.
minimum: 1
type: integer
minNumGPUsPerEngine:
default: 1
description: |-
MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
The profiler will not consider configurations with fewer GPUs than this value.
minimum: 1
type: integer
type:
description: |-
Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
If specified, profiling will focus on configurations optimized for this GPU type.
type: string
type: object
modelName:
description: |-
ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
This should be a valid model identifier that the profiler can resolve.
ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
This is a high-level identifier for easy reference in kubectl output and logs.
type: string
online:
default: false
description: |-
Online indicates whether to use online profiler (true) or AI Configurator (false).
Online profiling uses real deployments for accurate measurements (2-4 hours).
Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
type: boolean
profilingConfig:
description: |-
ProfilingConfig provides custom configuration for the profiling job.
Applicable to both online and offline (AIC) profiling modes.
ProfilingConfig provides the complete configuration for the profiling job.
This configuration is passed directly to the profiler.
The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
The profiler will validate the configuration and report any errors.
properties:
config:
description: |-
Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
type: object
x-kubernetes-preserve-unknown-fields: true
configMapRef:
description: |-
ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
This configuration is used by both online and offline (AIC) profiling modes.
ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
base config file (disagg.yaml). This is separate from the profiling config above.
The path to this config will be set as engine.config in the profiling config.
properties:
key:
default: disagg.yaml
......@@ -191,45 +157,18 @@ spec:
- name
type: object
type: object
sla:
description: |-
SLA defines the Service Level Agreement profiling targets.
The profiler uses these targets to find an optimal deployment configuration.
properties:
isl:
default: 3000
description: |-
ISL is the Input Sequence Length for profiling.
Defines the length of input sequences to use during profiling tests.
minimum: 1
type: integer
itl:
default: 10
description: |-
ITL is the target Inter-Token Latency in milliseconds.
This represents the maximum time allowed between consecutive tokens in the output.
type: integer
osl:
default: 500
description: |-
OSL is the Output Sequence Length for profiling.
Defines the expected length of output sequences to generate during profiling tests.
minimum: 1
type: integer
ttft:
default: 50
description: |-
TTFT is the target Time To First Token in milliseconds.
This represents the maximum time allowed from request submission to receiving the first token.
type: integer
type: object
required:
- modelName
- sla
- profilingConfig
type: object
status:
description: Status reflects the current observed state of this deployment request.
properties:
backend:
description: |-
Backend is extracted from profilingConfig.config.engine.backend for display purposes.
This field is populated by the controller and shown in kubectl output.
type: string
conditions:
description: |-
Conditions contains the latest observed conditions of the deployment request.
......
......@@ -24,6 +24,7 @@ a high-level, SLA-driven interface for deploying machine learning models on Dyna
package v1alpha1
import (
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
runtime "k8s.io/apimachinery/pkg/runtime"
)
......@@ -31,61 +32,6 @@ import (
// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.
// SLASpec defines Service Level Agreement targets for model profiling and deployment.
// These targets guide the profiling process to find optimal deployment configurations
// that meet the specified performance requirements.
type SLASpec struct {
// ITL is the target Inter-Token Latency in milliseconds.
// This represents the maximum time allowed between consecutive tokens in the output.
// +kubebuilder:default=10
// +optional
ITL int `json:"itl,omitempty"`
// TTFT is the target Time To First Token in milliseconds.
// This represents the maximum time allowed from request submission to receiving the first token.
// +kubebuilder:default=50
// +optional
TTFT int `json:"ttft,omitempty"`
// ISL is the Input Sequence Length for profiling.
// Defines the length of input sequences to use during profiling tests.
// +kubebuilder:default=3000
// +kubebuilder:validation:Minimum=1
// +optional
ISL int `json:"isl,omitempty"`
// OSL is the Output Sequence Length for profiling.
// Defines the expected length of output sequences to generate during profiling tests.
// +kubebuilder:default=500
// +kubebuilder:validation:Minimum=1
// +optional
OSL int `json:"osl,omitempty"`
}
// GPUSpec defines optional GPU type and resource specifications for profiling and deployment.
// These constraints help narrow down the search space during profiling to find configurations
// that fit within specified hardware bounds.
type GPUSpec struct {
// Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
// If specified, profiling will focus on configurations optimized for this GPU type.
// +kubebuilder:validation:Optional
Type string `json:"type,omitempty"`
// MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
// The profiler will not consider configurations with fewer GPUs than this value.
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:default=1
MinNumGPUsPerEngine int `json:"minNumGPUsPerEngine,omitempty"`
// MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
// The profiler will not consider configurations with more GPUs than this value.
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:default=8
MaxNumGPUsPerEngine int `json:"maxNumGPUsPerEngine,omitempty"`
}
// ConfigMapKeySelector selects a specific key from a ConfigMap.
// Used to reference external configuration data stored in ConfigMaps.
type ConfigMapKeySelector struct {
......@@ -99,11 +45,19 @@ type ConfigMapKeySelector struct {
}
// ProfilingConfigSpec defines configuration for the profiling process.
// Allows users to provide custom profiling parameters via ConfigMap references.
// This structure maps directly to the profile_sla.py config format.
// See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
type ProfilingConfigSpec struct {
// ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
// The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
// This configuration is used by both online and offline (AIC) profiling modes.
// Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
// The profiler will validate the configuration and report any errors.
// +kubebuilder:validation:Optional
// +kubebuilder:pruning:PreserveUnknownFields
// +kubebuilder:validation:Type=object
Config *apiextensionsv1.JSON `json:"config,omitempty"`
// ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
// base config file (disagg.yaml). This is separate from the profiling config above.
// The path to this config will be set as engine.config in the profiling config.
// +kubebuilder:validation:Optional
ConfigMapRef *ConfigMapKeySelector `json:"configMapRef,omitempty"`
}
......@@ -135,32 +89,17 @@ type DeploymentOverridesSpec struct {
// This CRD serves as the primary interface for users to request model deployments with
// specific performance constraints and resource requirements, enabling SLA-driven deployments.
type DynamoGraphDeploymentRequestSpec struct {
// ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
// This should be a valid model identifier that the profiler can resolve.
// ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
// This is a high-level identifier for easy reference in kubectl output and logs.
// +kubebuilder:validation:Required
ModelName string `json:"modelName"`
// Backend specifies the inference backend framework to use.
// Supported values are: "vllm", "sglang", "trtllm".
// +kubebuilder:validation:Enum=vllm;sglang;trtllm
// +kubebuilder:default=trtllm
Backend string `json:"backend,omitempty"`
// SLA defines the Service Level Agreement profiling targets.
// The profiler uses these targets to find an optimal deployment configuration.
// ProfilingConfig provides the complete configuration for the profiling job.
// This configuration is passed directly to the profiler.
// The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
// The profiler will validate the configuration and report any errors.
// +kubebuilder:validation:Required
SLA SLASpec `json:"sla"`
// GPU defines optional GPU type and resource specifications.
// These constraints guide the profiler to find configurations within specified bounds.
// +kubebuilder:validation:Optional
GPU *GPUSpec `json:"gpu,omitempty"`
// Online indicates whether to use online profiler (true) or AI Configurator (false).
// Online profiling uses real deployments for accurate measurements (2-4 hours).
// Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
// +kubebuilder:default=false
Online bool `json:"online,omitempty"`
ProfilingConfig ProfilingConfigSpec `json:"profilingConfig"`
// AutoApply indicates whether to automatically create a DynamoGraphDeployment
// after profiling completes. If false, only the spec is generated and stored in status.
......@@ -172,11 +111,6 @@ type DynamoGraphDeploymentRequestSpec struct {
// Only applicable when AutoApply is true.
// +kubebuilder:validation:Optional
DeploymentOverrides *DeploymentOverridesSpec `json:"deploymentOverrides,omitempty"`
// ProfilingConfig provides custom configuration for the profiling job.
// Applicable to both online and offline (AIC) profiling modes.
// +kubebuilder:validation:Optional
ProfilingConfig *ProfilingConfigSpec `json:"profilingConfig,omitempty"`
}
// DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
......@@ -205,6 +139,11 @@ type DynamoGraphDeploymentRequestStatus struct {
// Empty string ("") represents the initial state before initialization.
State string `json:"state,omitempty"`
// Backend is extracted from profilingConfig.config.engine.backend for display purposes.
// This field is populated by the controller and shown in kubectl output.
// +kubebuilder:validation:Optional
Backend string `json:"backend,omitempty"`
// ObservedGeneration reflects the generation of the most recently observed spec.
// Used to detect spec changes and enforce immutability after profiling starts.
ObservedGeneration int64 `json:"observedGeneration,omitempty"`
......@@ -253,7 +192,7 @@ type DynamoGraphDeploymentRequestStatus struct {
// +kubebuilder:subresource:status
// +kubebuilder:resource:shortName=dgdr
// +kubebuilder:printcolumn:name="Model",type=string,JSONPath=`.spec.modelName`
// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.spec.backend`
// +kubebuilder:printcolumn:name="Backend",type=string,JSONPath=`.status.backend`
// +kubebuilder:printcolumn:name="State",type=string,JSONPath=`.status.state`
// +kubebuilder:printcolumn:name="DGD-State",type=string,JSONPath=`.status.deployment.state`
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
......
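The type changes above replace the typed `sla`, `gpu`, `backend`, and `online` fields with a single free-form `config` field. The sketch below illustrates what that means in practice: keys the operator has no Go fields for survive a decode/encode round trip. It is a minimal illustration, not the operator's code. `profilingConfigSpec`, `requestSpec`, and `roundTrip` are hypothetical names, and `json.RawMessage` from the standard library stands in for `apiextensionsv1.JSON` so the example runs without Kubernetes dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// profilingConfigSpec is a hypothetical stand-in for ProfilingConfigSpec.
// json.RawMessage plays the role of apiextensionsv1.JSON here: it keeps the
// raw bytes instead of decoding them into typed fields.
type profilingConfigSpec struct {
	Config json.RawMessage `json:"config,omitempty"`
}

// requestSpec is a hypothetical stand-in for DynamoGraphDeploymentRequestSpec,
// reduced to the two required fields.
type requestSpec struct {
	ModelName       string              `json:"modelName"`
	ProfilingConfig profilingConfigSpec `json:"profilingConfig"`
}

// roundTrip decodes a spec and encodes it again.
func roundTrip(in []byte) ([]byte, error) {
	var spec requestSpec
	if err := json.Unmarshal(in, &spec); err != nil {
		return nil, err
	}
	return json.Marshal(spec)
}

func main() {
	in := []byte(`{"modelName":"Qwen/Qwen3-0.6B","profilingConfig":{"config":{"engine":{"backend":"trtllm"},"sla":{"ttft":50.0}}}}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	// The nested engine and sla sections are carried through untouched.
	fmt.Println(string(out))
}
```

The trade-off is that schema checks move out of the API server: the CRD no longer enforces enums or minimums on these values, so errors surface later, in the controller or the profiler.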
......@@ -41,6 +41,7 @@ import (
"github.com/ai-dynamo/dynamo/deploy/cloud/operator/api/dynamo/common"
"k8s.io/api/autoscaling/v2"
"k8s.io/api/core/v1"
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
)
......@@ -499,22 +500,12 @@ func (in *DynamoGraphDeploymentRequestList) DeepCopyObject() runtime.Object {
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *DynamoGraphDeploymentRequestSpec) DeepCopyInto(out *DynamoGraphDeploymentRequestSpec) {
*out = *in
out.SLA = in.SLA
if in.GPU != nil {
in, out := &in.GPU, &out.GPU
*out = new(GPUSpec)
**out = **in
}
in.ProfilingConfig.DeepCopyInto(&out.ProfilingConfig)
if in.DeploymentOverrides != nil {
in, out := &in.DeploymentOverrides, &out.DeploymentOverrides
*out = new(DeploymentOverridesSpec)
(*in).DeepCopyInto(*out)
}
if in.ProfilingConfig != nil {
in, out := &in.ProfilingConfig, &out.ProfilingConfig
*out = new(ProfilingConfigSpec)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new DynamoGraphDeploymentRequestSpec.
......@@ -626,21 +617,6 @@ func (in *DynamoGraphDeploymentStatus) DeepCopy() *DynamoGraphDeploymentStatus {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *GPUSpec) DeepCopyInto(out *GPUSpec) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new GPUSpec.
func (in *GPUSpec) DeepCopy() *GPUSpec {
if in == nil {
return nil
}
out := new(GPUSpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *IngressSpec) DeepCopyInto(out *IngressSpec) {
*out = *in
......@@ -754,6 +730,11 @@ func (in *PVC) DeepCopy() *PVC {
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ProfilingConfigSpec) DeepCopyInto(out *ProfilingConfigSpec) {
*out = *in
if in.Config != nil {
in, out := &in.Config, &out.Config
*out = new(apiextensionsv1.JSON)
(*in).DeepCopyInto(*out)
}
if in.ConfigMapRef != nil {
in, out := &in.ConfigMapRef, &out.ConfigMapRef
*out = new(ConfigMapKeySelector)
......@@ -771,21 +752,6 @@ func (in *ProfilingConfigSpec) DeepCopy() *ProfilingConfigSpec {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SLASpec) DeepCopyInto(out *SLASpec) {
*out = *in
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SLASpec.
func (in *SLASpec) DeepCopy() *SLASpec {
if in == nil {
return nil
}
out := new(SLASpec)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *SharedMemorySpec) DeepCopyInto(out *SharedMemorySpec) {
*out = *in
......
......@@ -36,7 +36,7 @@ spec:
- jsonPath: .spec.modelName
name: Model
type: string
- jsonPath: .spec.backend
- jsonPath: .status.backend
name: Backend
type: string
- jsonPath: .status.state
......@@ -94,16 +94,6 @@ spec:
after profiling completes. If false, only the spec is generated and stored in status.
Users can then manually create a DGD using the generated spec.
type: boolean
backend:
default: trtllm
description: |-
Backend specifies the inference backend framework to use.
Supported values are: "vllm", "sglang", "trtllm".
enum:
- vllm
- sglang
- trtllm
type: string
deploymentOverrides:
description: |-
DeploymentOverrides allows customizing metadata for the auto-created DGD.
......@@ -132,53 +122,29 @@ spec:
If not specified, defaults to the DGDR namespace.
type: string
type: object
gpu:
description: |-
GPU defines optional GPU type and resource specifications.
These constraints guide the profiler to find configurations within specified bounds.
properties:
maxNumGPUsPerEngine:
default: 8
description: |-
MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling.
The profiler will not consider configurations with more GPUs than this value.
minimum: 1
type: integer
minNumGPUsPerEngine:
default: 1
description: |-
MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling.
The profiler will not consider configurations with fewer GPUs than this value.
minimum: 1
type: integer
type:
description: |-
Type specifies the GPU type to target (e.g., "h200", "h100", "a100").
If specified, profiling will focus on configurations optimized for this GPU type.
type: string
type: object
modelName:
description: |-
ModelName specifies the model to deploy (e.g., "meta/llama3-70b").
This should be a valid model identifier that the profiler can resolve.
ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").
This is a high-level identifier for easy reference in kubectl output and logs.
type: string
online:
default: false
description: |-
Online indicates whether to use online profiler (true) or AI Configurator (false).
Online profiling uses real deployments for accurate measurements (2-4 hours).
Offline profiling uses AI Configurator for fast simulation-based profiling (20-30 seconds).
type: boolean
profilingConfig:
description: |-
ProfilingConfig provides custom configuration for the profiling job.
Applicable to both online and offline (AIC) profiling modes.
ProfilingConfig provides the complete configuration for the profiling job.
This configuration is passed directly to the profiler.
The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).
The profiler will validate the configuration and report any errors.
properties:
config:
description: |-
Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.
The profiler will validate the configuration and report any errors.
type: object
x-kubernetes-preserve-unknown-fields: true
configMapRef:
description: |-
ConfigMapRef is a reference to a ConfigMap containing profiling configuration.
The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.
This configuration is used by both online and offline (AIC) profiling modes.
ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment
base config file (disagg.yaml). This is separate from the profiling config above.
The path to this config will be set as engine.config in the profiling config.
properties:
key:
default: disagg.yaml
......@@ -191,45 +157,18 @@ spec:
- name
type: object
type: object
sla:
description: |-
SLA defines the Service Level Agreement profiling targets.
The profiler uses these targets to find an optimal deployment configuration.
properties:
isl:
default: 3000
description: |-
ISL is the Input Sequence Length for profiling.
Defines the length of input sequences to use during profiling tests.
minimum: 1
type: integer
itl:
default: 10
description: |-
ITL is the target Inter-Token Latency in milliseconds.
This represents the maximum time allowed between consecutive tokens in the output.
type: integer
osl:
default: 500
description: |-
OSL is the Output Sequence Length for profiling.
Defines the expected length of output sequences to generate during profiling tests.
minimum: 1
type: integer
ttft:
default: 50
description: |-
TTFT is the target Time To First Token in milliseconds.
This represents the maximum time allowed from request submission to receiving the first token.
type: integer
type: object
required:
- modelName
- sla
- profilingConfig
type: object
status:
description: Status reflects the current observed state of this deployment request.
properties:
backend:
description: |-
Backend is extracted from profilingConfig.config.engine.backend for display purposes.
This field is populated by the controller and shown in kubectl output.
type: string
conditions:
description: |-
Conditions contains the latest observed conditions of the deployment request.
......
......@@ -18,18 +18,57 @@ kind: DynamoGraphDeploymentRequest
metadata:
name: example-llm-sla
spec:
modelName: "meta/llama3-70b"
backend: trtllm # enum: [vllm, sglang, trtllm]; default is trtllm
sla: # SLA profiling targets (all fields optional with defaults)
itl: 10 # Inter-Token Latency target in milliseconds (default: 10)
ttft: 50 # Time To First Token target in milliseconds (default: 50)
isl: 3000 # Input Sequence Length (default: 3000)
osl: 500 # Output Sequence Length (default: 500)
gpu: # optional
type: h200_sxm
minNumGPUsPerEngine: 1 # default is 1
maxNumGPUsPerEngine: 8 # default is 8
online: false # true for online profiler, false for AIC profiler
# ModelName is a high-level identifier for the model being deployed
modelName: Qwen/Qwen3-0.6B
# ProfilingConfig maps directly to the profile_sla.py config format
# See benchmarks/profiler/utils/profiler_argparse.py for complete schema
profilingConfig:
config:
# Optional: Output directory for profiling results (defaults to /data in the Job)
# output_dir: "profiling_results"
# Engine configuration
engine:
backend: trtllm # Inference backend: vllm, sglang, or trtllm
max_context_length: 16384 # Maximum context length supported by the model
is_moe_model: false # Enable MoE model support (uses TEP/DEP instead of TP)
# Hardware configuration
hardware:
min_num_gpus_per_engine: 1 # Minimum GPUs to test
max_num_gpus_per_engine: 4 # Maximum GPUs to test (limited by model's num_heads/4)
num_gpus_per_node: 8 # GPUs per node (for MoE models)
# Sweep/profiling configuration
sweep:
skip_existing_results: true # Skip configurations that already have results
prefill_interpolation_granularity: 16 # Samples for TTFT interpolation
decode_interpolation_granularity: 6 # Samples for ITL interpolation
# AI Configurator mode (fast simulation-based profiling, 20-30 seconds)
use_ai_configurator: false # Set to false for online profiling (2-4 hours)
aic_system: h200_sxm # Target GPU system for AI Configurator
aic_model_name: QWEN3_0.6B # Model name for AI Configurator
aic_backend_version: "0.20.0" # Backend version for AI Configurator
# SLA targets for profiling
sla:
isl: 3000 # Input sequence length
osl: 500 # Output sequence length
ttft: 50.0 # Time To First Token target (milliseconds)
itl: 10.0 # Inter-Token Latency target (milliseconds)
# Optional: Planner-specific arguments
# planner:
# planner_min_endpoint: 2
# # Add any other planner args here (use hyphens or underscores)
# Reference to ConfigMap containing the DGD base config (disagg.yaml)
# The path to this file will be automatically set as engine.config
configMapRef:
name: my-profiling-config
key: disagg.yaml # defaults to "disagg.yaml"
# Optional: Automatically create DynamoGraphDeployment after profiling
autoApply: true # default is false
......@@ -42,9 +81,3 @@ spec:
# team: ml-platform
# annotations:
# description: "Auto-generated from DGDR"
# Currently required for both online and offline/AIC profiling, but will be removed in the future
profilingConfig:
configMapRef:
name: my-profiling-config
key: disagg.yaml # default is "disagg.yaml"
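The controller hunks further down fill in a few values when a request such as the sample above omits them: `deployment.namespace`, `output_dir`, and, when `configMapRef` is set, `engine.config`. The following sketch shows that defaulting step in isolation, under stated assumptions. `subMap` and `applyDefaults` are hypothetical helpers, and the paths in `main` are illustrative rather than the operator's constants. Unlike the unchecked type assertions in the diff, this version returns an error when `deployment` or `engine` is present but is not a mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// subMap returns cfg[key] as a map, creating it when absent. It returns an
// error instead of panicking when the key holds a non-map value.
func subMap(cfg map[string]interface{}, key string) (map[string]interface{}, error) {
	v, ok := cfg[key]
	if !ok {
		m := map[string]interface{}{}
		cfg[key] = m
		return m, nil
	}
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("%s must be a mapping", key)
	}
	return m, nil
}

// applyDefaults fills in values the user left out. configPath is empty when
// no configMapRef is set; when it is non-empty it overwrites engine.config.
func applyDefaults(cfg map[string]interface{}, namespace, outputDir, configPath string) error {
	deployment, err := subMap(cfg, "deployment")
	if err != nil {
		return err
	}
	if _, ok := deployment["namespace"]; !ok {
		deployment["namespace"] = namespace
	}
	if _, ok := cfg["output_dir"]; !ok {
		cfg["output_dir"] = outputDir
	}
	if configPath != "" {
		engine, err := subMap(cfg, "engine")
		if err != nil {
			return err
		}
		engine["config"] = configPath
	}
	return nil
}

func main() {
	cfg := map[string]interface{}{
		"engine": map[string]interface{}{"backend": "trtllm"},
	}
	// Illustrative values only; the real paths come from operator constants.
	if err := applyDefaults(cfg, "default", "/data", "/config/disagg.yaml"); err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(cfg, "", "  ")
	fmt.Println(string(out))
}
```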
......@@ -320,6 +320,9 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleInitialState(ctx context.
// Set observedGeneration to track the spec we're processing
dgdr.Status.ObservedGeneration = dgdr.Generation
// Extract and populate backend from config for display in kubectl output
dgdr.Status.Backend = getBackendFromConfig(dgdr)
// Initialize status
r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonInitialized, MessageInitialized)
return r.updateStateAndRequeue(ctx, dgdr, StatePending, MessageInitialized)
......@@ -337,7 +340,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) handlePendingState(ctx context.
}
// Record event with appropriate message
if dgdr.Spec.Online {
if isOnlineProfiling(dgdr) {
r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageProfilingJobCreated)
} else {
r.Recorder.Event(dgdr, corev1.EventTypeNormal, EventReasonProfilingJobCreated, MessageAICProfilingJobCreated)
......@@ -670,15 +673,10 @@ func (r *DynamoGraphDeploymentRequestReconciler) handleFailedState(ctx context.C
return ctrl.Result{}, nil
}
// getProfilingJobName returns the job name for a DGDR based on profiling mode
// getProfilingJobName returns the job name for a DGDR
func getProfilingJobName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
var jobNamePrefix string
if dgdr.Spec.Online {
jobNamePrefix = JobNamePrefixOnline
} else {
jobNamePrefix = JobNamePrefixAIC
}
return fmt.Sprintf("%s%s", jobNamePrefix, dgdr.Name)
// Use "profile-" prefix for all profiling jobs
return fmt.Sprintf("profile-%s", dgdr.Name)
}
// getOutputConfigMapName returns the ConfigMap name for profiling output
......@@ -686,32 +684,55 @@ func getOutputConfigMapName(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest
return fmt.Sprintf("%s%s", ConfigMapOutputPrefix, dgdr.Name)
}
// validateSpec validates the DGDR spec
func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
if dgdr.Spec.ModelName == "" {
return errors.New(ValidationErrorModelNameRequired)
// isOnlineProfiling determines whether online profiling or AI Configurator is being used
// based on the sweep.use_ai_configurator config value
func isOnlineProfiling(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) bool {
if dgdr.Spec.ProfilingConfig.Config == nil {
return true
}
var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return true // Default to online on parse error
}
if sweep, ok := config["sweep"].(map[string]interface{}); ok {
if useAIC, exists := sweep["use_ai_configurator"].(bool); exists {
return !useAIC
}
}
// Default to online profiling if not specified
return true
}
if dgdr.Spec.SLA.ITL <= 0 {
return errors.New(ValidationErrorITLPositive)
// getBackendFromConfig extracts the backend value from profilingConfig.config.engine.backend
func getBackendFromConfig(dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) string {
if dgdr.Spec.ProfilingConfig.Config == nil {
return ""
}
if dgdr.Spec.SLA.TTFT <= 0 {
return errors.New(ValidationErrorTTFTPositive)
var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return ""
}
// Validate backend
validBackends := map[string]bool{
BackendVLLM: true,
BackendSGLang: true,
BackendTRTLLM: true,
if engine, ok := config["engine"].(map[string]interface{}); ok {
if backend, ok := engine["backend"].(string); ok {
return backend
}
}
if dgdr.Spec.Backend != "" && !validBackends[dgdr.Spec.Backend] {
return fmt.Errorf(ValidationErrorInvalidBackend, dgdr.Spec.Backend)
return ""
}
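The two helpers above share one pattern: parse the raw config, walk to a nested key, and fall back to a default when anything is missing or has the wrong type. The sketch below reproduces that behavior as a standalone program so the fallbacks are easy to try out. It is an approximation under assumptions: `parse`, `isOnline`, and `backendOf` are hypothetical names, and `encoding/json` replaces the `yaml` package used in the diff (the raw field holds JSON either way).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parse decodes raw JSON into a generic map and returns nil on empty input
// or on a parse error. Indexing a nil map is safe in Go, so callers need no
// extra nil checks.
func parse(raw []byte) map[string]interface{} {
	var cfg map[string]interface{}
	if len(raw) == 0 || json.Unmarshal(raw, &cfg) != nil {
		return nil
	}
	return cfg
}

// isOnline reports false only when sweep.use_ai_configurator is the boolean
// true; every other case (missing key, wrong type, bad input) means online.
func isOnline(raw []byte) bool {
	sweep, _ := parse(raw)["sweep"].(map[string]interface{})
	if useAIC, ok := sweep["use_ai_configurator"].(bool); ok {
		return !useAIC
	}
	return true
}

// backendOf returns engine.backend, or "" when it is absent or not a string.
func backendOf(raw []byte) string {
	engine, _ := parse(raw)["engine"].(map[string]interface{})
	backend, _ := engine["backend"].(string)
	return backend
}

func main() {
	raw := []byte(`{"engine":{"backend":"trtllm"},"sweep":{"use_ai_configurator":true}}`)
	fmt.Println(isOnline(raw), backendOf(raw))
}
```

One consequence of the type assertion is worth noting: a quoted value such as `use_ai_configurator: "true"` is a string, not a boolean, so it falls through to the online default.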
// validateSpec validates the DGDR spec
func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
// Basic validation - check that profilingConfig.config is provided
if dgdr.Spec.ProfilingConfig.Config == nil || len(dgdr.Spec.ProfilingConfig.Config.Raw) == 0 {
return errors.New("profilingConfig.config is required and must not be empty")
}
// Validate ConfigMap if provided (for both online and offline/AIC profiling)
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
// Validate ConfigMap if provided (for the DGD base config)
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
cm := &corev1.ConfigMap{}
err := r.Get(ctx, types.NamespacedName{
Name: dgdr.Spec.ProfilingConfig.ConfigMapRef.Name,
......@@ -737,6 +758,24 @@ func (r *DynamoGraphDeploymentRequestReconciler) validateSpec(ctx context.Contex
}
}
// Parse config to validate structure
var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return fmt.Errorf("failed to parse profilingConfig.config: %w", err)
}
// Additional validation: Ensure engine.config is set (either as path or will be set from ConfigMapRef)
engineConfig, hasEngine := config["engine"].(map[string]interface{})
if hasEngine {
_, hasConfig := engineConfig["config"]
if !hasConfig && dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
return errors.New("either profilingConfig.config.engine.config must be set, or profilingConfig.configMapRef must be provided")
}
} else if dgdr.Spec.ProfilingConfig.ConfigMapRef == nil {
return errors.New("profilingConfig.config must contain 'engine' section, or profilingConfig.configMapRef must be provided")
}
// The profiler will validate the rest of the configuration
return nil
}
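The rule enforced by the new `validateSpec` reduces to: the config must be non-empty and parseable, and the DGD base config must come from somewhere, either `engine.config` in the config or a `configMapRef`. The sketch below restates that rule as a standalone function. It is a simplified illustration: `validate` is a hypothetical name, the ConfigMap existence lookup is left out, and `encoding/json` stands in for the `yaml` package used in the diff.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validate mirrors the structural checks only. hasConfigMapRef stands in for
// "spec.profilingConfig.configMapRef is set".
func validate(raw []byte, hasConfigMapRef bool) error {
	if len(raw) == 0 {
		return errors.New("profilingConfig.config is required and must not be empty")
	}
	var cfg map[string]interface{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("failed to parse profilingConfig.config: %w", err)
	}
	// A configMapRef satisfies the base-config requirement on its own.
	if hasConfigMapRef {
		return nil
	}
	engine, ok := cfg["engine"].(map[string]interface{})
	if !ok {
		return errors.New("config must contain an 'engine' section, or configMapRef must be provided")
	}
	if _, ok := engine["config"]; !ok {
		return errors.New("either engine.config must be set, or configMapRef must be provided")
	}
	return nil
}

func main() {
	raw := []byte(`{"engine":{"backend":"vllm"}}`)
	fmt.Println("without configMapRef:", validate(raw, false))
	fmt.Println("with configMapRef:   ", validate(raw, true))
}
```

Everything beyond these checks, including SLA values and backend names, is left to the profiler, which is the point of passing the config through unchanged.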
......@@ -757,33 +796,48 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}
}
// Use ProfilerImage for both online and offline (AIC) profiling
imageName := r.ProfilerImage
if imageName == "" {
return fmt.Errorf("profiler image not configured: the operator's profilerImage must be set in the Helm chart values (dynamo-operator.dynamo.dgdr.profilerImage). The image must contain the ai-dynamo profiler (python -m benchmarks.profiler.profile_sla entrypoint). For development, build from the ai-dynamo repository Dockerfile and push to your registry. A public image will be available in release 0.6.1")
// Use SyncResource to create/update the job
modified, job, err := commonController.SyncResource(ctx, r, dgdr, func(ctx context.Context) (*batchv1.Job, bool, error) {
jobName := getProfilingJobName(dgdr)
outputConfigMapName := getOutputConfigMapName(dgdr)
// Parse the profiling config from JSON
var config map[string]interface{}
if err := yaml.Unmarshal(dgdr.Spec.ProfilingConfig.Config.Raw, &config); err != nil {
return nil, false, fmt.Errorf("failed to parse profiling config: %w", err)
}
logger.Info("Using profiler image", "image", imageName, "online", dgdr.Spec.Online)
// Set deployment.namespace if not already set
if _, hasDeployment := config["deployment"]; !hasDeployment {
config["deployment"] = make(map[string]interface{})
}
deploymentConfig := config["deployment"].(map[string]interface{})
if _, hasNamespace := deploymentConfig["namespace"]; !hasNamespace {
deploymentConfig["namespace"] = dgdr.Namespace
}
// Determine label based on profiling mode
var labelValue string
if dgdr.Spec.Online {
labelValue = LabelValueDynamoProfiler
} else {
labelValue = LabelValueAICProfiler
// Set output_dir if not already set
if _, hasOutputDir := config["output_dir"]; !hasOutputDir {
config["output_dir"] = ProfilingOutputPath
}
// Use SyncResource to create/update the job
modified, job, err := commonController.SyncResource(ctx, r, dgdr, func(ctx context.Context) (*batchv1.Job, bool, error) {
jobName := getProfilingJobName(dgdr)
outputConfigMapName := getOutputConfigMapName(dgdr)
// If ConfigMapRef is provided, set engine.config path
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
if _, hasEngine := config["engine"]; !hasEngine {
config["engine"] = make(map[string]interface{})
}
engineConfig := config["engine"].(map[string]interface{})
engineConfig["config"] = fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile)
}
// Build profiler container based on online vs offline (AIC) mode
var profilerArgs []string
var profilerEnv []corev1.EnvVar
// Serialize config to YAML for passing to profiler
configYAML, err := yaml.Marshal(config)
if err != nil {
return nil, false, fmt.Errorf("failed to marshal profiling config to YAML: %w", err)
}
// Common environment variables
profilerEnv = []corev1.EnvVar{
profilerEnv := []corev1.EnvVar{
{
Name: "HUGGING_FACE_HUB_TOKEN",
ValueFrom: &corev1.EnvVarSource{
......@@ -805,7 +859,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
},
}
// Build container with volume mounts
// Build volume mounts
volumeMounts := []corev1.VolumeMount{
{
Name: VolumeNameProfilingOutput,
......@@ -813,49 +867,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
},
}
// Determine GPU range for profiling
minGPUs := 1
maxGPUs := 8
if dgdr.Spec.GPU != nil {
if dgdr.Spec.GPU.MinNumGPUsPerEngine > 0 {
minGPUs = dgdr.Spec.GPU.MinNumGPUsPerEngine
}
if dgdr.Spec.GPU.MaxNumGPUsPerEngine > 0 {
maxGPUs = dgdr.Spec.GPU.MaxNumGPUsPerEngine
}
}
// Build common profiler args (shared by both online and offline modes)
profilerArgs = []string{
"--namespace", dgdr.Namespace,
"--backend", dgdr.Spec.Backend,
"--ttft", fmt.Sprintf("%d", dgdr.Spec.SLA.TTFT),
"--itl", fmt.Sprintf("%d", dgdr.Spec.SLA.ITL),
"--isl", fmt.Sprintf("%d", dgdr.Spec.SLA.ISL),
"--osl", fmt.Sprintf("%d", dgdr.Spec.SLA.OSL),
"--output-dir", ProfilingOutputPath,
"--min-num-gpus-per-engine", fmt.Sprintf("%d", minGPUs),
"--max-num-gpus-per-engine", fmt.Sprintf("%d", maxGPUs),
}
// Add mode-specific args
if !dgdr.Spec.Online {
// Offline (AIC) profiling: add AI Configurator args
profilerArgs = append(profilerArgs,
"--use-ai-configurator",
"--aic-model-name", dgdr.Spec.ModelName,
"--aic-backend-version", "0.20.0", // TODO: don't hardcode this
)
// Add AIC-specific GPU system type
if dgdr.Spec.GPU != nil && dgdr.Spec.GPU.Type != "" {
profilerArgs = append(profilerArgs, "--aic-system", dgdr.Spec.GPU.Type)
}
}
// Add config if provided (for both online and offline modes)
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
profilerArgs = append(profilerArgs, "--config", fmt.Sprintf("%s/%s", ProfilingConfigPath, ProfilingConfigFile))
// Add ConfigMap volume mount if provided
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
volumeMounts = append(volumeMounts, corev1.VolumeMount{
Name: VolumeNameProfilingConfig,
MountPath: ProfilingConfigPath,
......@@ -863,6 +876,18 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
})
}
// Profiler args: pass the config as an inline YAML string via --profile-config
profilerArgs := []string{
"--profile-config", string(configYAML),
}
// Determine profiler image
imageName := r.ProfilerImage
if imageName == "" {
return nil, false, fmt.Errorf("profiler image not configured: configure dynamo-operator.dynamo.dgdr.profilerImage in Helm values")
}
logger.Info("Using profiler image", "image", imageName)
profilerContainer := corev1.Container{
Name: ContainerNameProfiler,
Image: imageName,
......@@ -918,8 +943,8 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
},
}}
// Add ConfigMap volume if provided (for both online and offline/AIC)
if dgdr.Spec.ProfilingConfig != nil && dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
// Add ConfigMap volume if provided
if dgdr.Spec.ProfilingConfig.ConfigMapRef != nil {
key := dgdr.Spec.ProfilingConfig.ConfigMapRef.Key
if key == "" {
key = ProfilingConfigFile
......@@ -944,6 +969,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
// Limit retries to prevent infinite loop
backoffLimit := int32(3)
// Determine label based on whether AI Configurator is used
labelValue := LabelValueDynamoProfiler
if !isOnlineProfiling(dgdr) {
labelValue = LabelValueAICProfiler
}
job := &batchv1.Job{
ObjectMeta: metav1.ObjectMeta{
Name: jobName,
......@@ -978,11 +1009,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.
}
if modified {
if dgdr.Spec.Online {
logger.Info("Online profiling job created/updated", "job", job.Name)
} else {
logger.Info("Offline (AIC) profiling job created/updated", "job", job.Name)
}
logger.Info("Profiling job created/updated", "job", job.Name)
}
return nil
......@@ -1070,7 +1097,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx
// generateDGDSpec generates DGD spec from profiling results (online or offline/AIC)
func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
logger := log.FromContext(ctx)
logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name, "online", dgdr.Spec.Online)
logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name)
// Read the generated spec from ConfigMap (created by sidecar)
outputConfigMapName := getOutputConfigMapName(dgdr)
......
......@@ -28,6 +28,11 @@ limitations under the License.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
### Resource Types
- [DynamoComponentDeployment](#dynamocomponentdeployment)
- [DynamoGraphDeployment](#dynamographdeployment)
......@@ -62,7 +67,8 @@ _Appears in:_
ConfigMapKeySelector selects a key from a ConfigMap.
ConfigMapKeySelector selects a specific key from a ConfigMap.
Used to reference external configuration data stored in ConfigMaps.
......@@ -71,15 +77,16 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap. | | Required: {} <br /> |
| `key` _string_ | Key in the ConfigMap to select. | disagg.yaml | |
| `name` _string_ | Name of the ConfigMap containing the desired data. | | Required: {} <br /> |
| `key` _string_ | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml | |
#### DeploymentOverridesSpec
DeploymentOverridesSpec defines metadata overrides for the auto-created DGD.
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments.
When autoApply is enabled, these overrides are applied to the generated DGD resource.
......@@ -88,17 +95,18 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. | | Optional: {} <br /> |
| `namespace` _string_ | Namespace is the namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: {} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment.<br />These are merged with auto-generated labels. | | Optional: {} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment. | | Optional: {} <br /> |
| `name` _string_ | Name is the desired name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. | | Optional: {} <br /> |
| `namespace` _string_ | Namespace is the desired namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: {} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment metadata.<br />These are merged with auto-generated labels from the profiling process. | | Optional: {} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: {} <br /> |
#### DeploymentStatus
DeploymentStatus tracks the auto-created DGD status.
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment.
This status is populated when autoApply is enabled and a DGD is created.
......@@ -109,8 +117,8 @@ _Appears in:_
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. | | |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This is mirrored from the DGD's status.state field. | | |
| `created` _boolean_ | Created indicates whether the DGD has been created.<br />Used to prevent recreation if DGD is deleted by user. | | |
| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This value is mirrored from the DGD's status.state field. | | |
| `created` _boolean_ | Created indicates whether the DGD has been successfully created.<br />Used to prevent recreation if the DGD is manually deleted by users. | | |
#### DynamoComponentDeployment
......@@ -229,6 +237,19 @@ It serves as the primary interface for users to request model deployments with
specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
1. Initial → Pending: Validates spec and prepares for profiling
2. Pending → Profiling: Creates and runs profiling job (online or AIC)
3. Profiling → Ready/Deploying: Generates DGD spec after profiling completes
4. Deploying → Ready: When autoApply=true, monitors DGD until Ready
5. Ready: Terminal state when DGD is operational or spec is available
6. DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
The spec becomes immutable once profiling starts. Users must delete and recreate
the DGDR to modify configuration after this point.
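
As a rough illustration of the lifecycle above, the listed transitions can be written as a small lookup table. This is a sketch under stated assumptions, not the operator's implementation: the `State` type, the constant names, `canTransition`, and `specIsMutable` are hypothetical. Only the edges spelled out in the list are encoded. The transitions into `DeploymentDeleted` and `Failed` are left out because the text does not say which states precede them.

```go
package main

import "fmt"

// State mirrors the documented status.state values. The type and the
// constant names are illustrative, not taken from the operator source.
type State string

const (
	StateInitial           State = "" // before initialization
	StatePending           State = "Pending"
	StateProfiling         State = "Profiling"
	StateDeploying         State = "Deploying"
	StateReady             State = "Ready"
	StateDeploymentDeleted State = "DeploymentDeleted"
)

// transitions holds only the edges named in the lifecycle description.
// Ready and DeploymentDeleted have no outgoing edges (terminal states).
var transitions = map[State][]State{
	StateInitial:   {StatePending},
	StatePending:   {StateProfiling},
	StateProfiling: {StateReady, StateDeploying},
	StateDeploying: {StateReady},
}

// canTransition reports whether the documented lifecycle lists from -> to.
func canTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// specIsMutable encodes "the spec becomes immutable once profiling starts".
// Assumption: every state other than the initial one and Pending counts as
// "profiling has started".
func specIsMutable(s State) bool {
	return s == StateInitial || s == StatePending
}

func main() {
	fmt.Println(canTransition(StatePending, StateProfiling)) // true
	fmt.Println(canTransition(StateReady, StatePending))     // false
	fmt.Println(specIsMutable(StateProfiling))               // false
}
```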
......@@ -245,9 +266,9 @@ specific performance and resource constraints, enabling SLA-driven deployments.
DynamoGraphDeploymentRequestSpec defines the desired state of DynamoGraphDeploymentRequest.
This CRD serves as the primary interface for users to request model deployments
with specific performance and resource constraints for SLA-driven deployments.
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest.
This CRD serves as the primary interface for users to request model deployments with
specific performance constraints and resource requirements, enabling SLA-driven deployments.
......@@ -256,21 +277,18 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). | | Required: {} <br /> |
| `backend` _string_ | Backend specifies the backend framework to use. | trtllm | Enum: [vllm sglang trtllm] <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines the Service Level Agreement profiling targets. | | Required: {} <br /> |
| `gpu` _[GPUSpec](#gpuspec)_ | GPU defines optional GPU type specification. | | Optional: {} <br /> |
| `online` _boolean_ | Online indicates whether to use online profiler (true) or AI Configurator (false).<br />When true, uses real deployment for profiling (2-4 hours).<br />When false, uses AI Configurator for fast profiling (20-30 seconds). | false | |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated in status. | false | |
| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows overriding metadata for the auto-created DGD.<br />Only used when AutoApply is true. | | Optional: {} <br /> |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides configuration for the profiling job.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs. | | Required: {} <br /> |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />The profiler will validate the configuration and report any errors. | | Required: {} <br /> |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false | |
| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows customizing metadata for the auto-created DGD.<br />Only applicable when AutoApply is true. | | Optional: {} <br /> |
#### DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus defines the observed state of DynamoGraphDeploymentRequest.
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
The controller updates this status as the DGDR progresses through its lifecycle.
......@@ -279,12 +297,13 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed" | | |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability. | | |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />The slice is merged by type on patch updates. | | |
| `profilingResults` _string_ | ProfilingResults contains references to the profiling data and results. | | Optional: {} <br /> |
| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment (including metadata)<br />based on profiling results. This can be used to create a DynamoGraphDeployment resource.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: {} <br />Optional: {} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD if AutoApply is true. | | Optional: {} <br /> |
| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "", "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed"<br />Empty string ("") represents the initial state before initialization. | | |
| `backend` _string_ | Backend is extracted from profilingConfig.config.engine.backend for display purposes.<br />This field is populated by the controller and shown in kubectl output. | | Optional: {} <br /> |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. | | |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. | | |
| `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/<name>" | | Optional: {} <br /> |
| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: {} <br />Optional: {} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. | | Optional: {} <br /> |
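
To make two of the status fields above concrete, the following sketch shows one way `backend` could be read from `profilingConfig.config.engine.backend` and how a `profilingResults` value in the `configmap/<name>` format could be split. The helper names `backendFromConfig` and `configMapNameFromResults` are hypothetical, and the sketch uses the standard library's `encoding/json` rather than whatever the controller actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// backendFromConfig returns engine.backend from a raw JSON profiling config,
// or "" when the field is absent or the document cannot be decoded.
// Hypothetical helper, shown for illustration only.
func backendFromConfig(raw []byte) string {
	var cfg struct {
		Engine struct {
			Backend string `json:"backend"`
		} `json:"engine"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return ""
	}
	return cfg.Engine.Backend
}

// configMapNameFromResults extracts <name> from "configmap/<name>".
// The second return value is false when the reference does not match
// that format or the name is empty.
func configMapNameFromResults(ref string) (string, bool) {
	const prefix = "configmap/"
	if !strings.HasPrefix(ref, prefix) {
		return "", false
	}
	name := strings.TrimPrefix(ref, prefix)
	return name, name != ""
}

func main() {
	fmt.Println(backendFromConfig([]byte(`{"engine": {"backend": "vllm"}}`)))
	// "my-dgdr-output" is a made-up ConfigMap name.
	fmt.Println(configMapNameFromResults("configmap/my-dgdr-output"))
}
```

Returning an empty string on decode errors fits a display-only field: a malformed config is expected to be reported by validation or by the profiler rather than by the column shown in kubectl output.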
#### DynamoGraphDeploymentSpec
......@@ -323,24 +342,6 @@ _Appears in:_
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. | | |
#### GPUSpec
GPUSpec defines optional GPU type specification.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _string_ | Type specifies the GPU type (e.g., "h200", "h100", "a100"). | | Optional: {} <br /> |
| `minNumGPUsPerEngine` _integer_ | MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling. | 1 | Minimum: 1 <br />Optional: {} <br /> |
| `maxNumGPUsPerEngine` _integer_ | MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling. | 8 | Minimum: 1 <br />Optional: {} <br /> |
#### IngressSpec
......@@ -424,23 +425,9 @@ _Appears in:_
ProfilingConfigSpec defines the profiling configuration.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is a reference to a ConfigMap containing the profiling configuration.<br />The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
#### SLASpec
SLASpec defines the Service Level Agreement profiling targets.
ProfilingConfigSpec defines configuration for the profiling process.
This structure maps directly to the profile_sla.py config format.
See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
......@@ -449,10 +436,8 @@ _Appears in:_
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `itl` _integer_ | ITL is the target Inter-Token Latency in milliseconds. | | Required: {} <br /> |
| `ttft` _integer_ | TTFT is the target Time To First Token in milliseconds. | | Required: {} <br /> |
| `isl` _integer_ | ISL is the Input Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
| `config` _[JSON](#json)_ | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler.<br />The profiler will validate the configuration and report any errors. | | Optional: {} <br />Type: object <br /> |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment<br />base config file (disagg.yaml). This is separate from the profiling config above.<br />The path to this config will be set as engine.config in the profiling config. | | Optional: {} <br /> |
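
The controller change in this commit fills in a few values before handing `config` to the profiler: `deployment.namespace` and `output_dir` when they are missing, and `engine.config` when a `configMapRef` is given. The sketch below restates that merge in isolation. It is an approximation, not the operator's code: `applyDefaults` and `section` are hypothetical names, the paths in `main` are placeholders for the operator's `ProfilingOutputPath` and `ProfilingConfigPath` constants (whose values are not shown here), and `encoding/json` stands in for the YAML library used by the controller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// section returns cfg[key] as a nested map, creating it when absent.
// Unlike an unchecked type assertion, it returns an error instead of
// panicking when the user supplied a non-object value.
func section(cfg map[string]interface{}, key string) (map[string]interface{}, error) {
	v, ok := cfg[key]
	if !ok {
		m := map[string]interface{}{}
		cfg[key] = m
		return m, nil
	}
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("%q must be an object, got %T", key, v)
	}
	return m, nil
}

// applyDefaults parses the raw config and fills in operator-managed values.
// An empty engineConfigPath means no configMapRef was provided.
func applyDefaults(raw []byte, namespace, outputDir, engineConfigPath string) (map[string]interface{}, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse profiling config: %w", err)
	}
	if cfg == nil { // raw was the JSON literal null
		cfg = map[string]interface{}{}
	}

	deployment, err := section(cfg, "deployment")
	if err != nil {
		return nil, err
	}
	if _, ok := deployment["namespace"]; !ok {
		deployment["namespace"] = namespace
	}

	if _, ok := cfg["output_dir"]; !ok {
		cfg["output_dir"] = outputDir
	}

	if engineConfigPath != "" {
		engine, err := section(cfg, "engine")
		if err != nil {
			return nil, err
		}
		engine["config"] = engineConfigPath
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"engine": {"backend": "vllm"}}`)
	// "/data" and "/config/disagg.yaml" are placeholder paths.
	cfg, err := applyDefaults(raw, "my-namespace", "/data", "/config/disagg.yaml")
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(cfg, "", "  ")
	fmt.Println(string(out))
}
```

User-supplied values win for `deployment.namespace` and `output_dir`, while `engine.config` is always overwritten when a ConfigMap path is passed, matching the behavior described for `configMapRef`.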
#### SharedMemorySpec
......@@ -588,7 +573,16 @@ For larger models (typically >70B parameters) or slower storage systems, you may
For multinode deployments, the operator modifies probes based on the backend framework and node role:
#### VLLM Backend
The operator automatically selects between two deployment modes based on parallelism configuration:
**Ray-Based Mode** (when `world_size > GPUs_per_node`):
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
- **Leader nodes**: All probes remain active
**Data Parallel Mode** (when `world_size × data_parallel_size > GPUs_per_node`):
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
- **Leader nodes**: All probes remain active
#### SGLang Backend
- **Worker nodes**: All probes (liveness, readiness, startup) are removed
......@@ -686,7 +680,8 @@ Default container ports are configured based on component type:
## Backend-Specific Configurations
### VLLM
- **Ray Head Port**: 6379 (for multinode deployments)
- **Ray Head Port**: 6379 (for Ray-based multinode deployments)
- **Data Parallel RPC Port**: 13445 (for data parallel multinode deployments)
### SGLang
- **Distribution Init Port**: 29500 (for multinode deployments)
......