Commit 4e6493ae authored by hhzhang16, committed by GitHub

feat: add useMocker field to DGDR (#4813)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
parent c341ee8d
@@ -96,8 +96,9 @@ spec:
       type: boolean
     backend:
       description: |-
-        Backend specifies the inference backend to use.
+        Backend specifies the inference backend for profiling.
         The controller automatically sets this value in profilingConfig.config.engine.backend.
+        Profiling runs on real GPUs or via AIC simulation to collect performance data.
       enum:
       - vllm
       - sglang
@@ -304,6 +305,15 @@ spec:
       required:
       - profilerImage
       type: object
+    useMocker:
+      default: false
+      description: |-
+        UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+        a real backend deployment. When true, the deployment uses simulated engines that
+        don't require GPUs, using the profiling data to simulate realistic timing behavior.
+        Mocker is available in all backend images and useful for large-scale experiments.
+        Profiling still runs against the real backend (specified above) to collect performance data.
+      type: boolean
   required:
   - backend
   - model
@@ -404,6 +414,7 @@ spec:
         including metadata, based on profiling results. Users can extract this to create
         a DGD manually, or it's used automatically when autoApply is true.
         Stored as RawExtension to preserve all fields including metadata.
+        For mocker backends, this contains the mocker DGD spec.
       type: object
       x-kubernetes-embedded-resource: true
       x-kubernetes-preserve-unknown-fields: true
......
@@ -129,12 +129,21 @@ type DynamoGraphDeploymentRequestSpec struct {
 	// +kubebuilder:validation:Required
 	Model string `json:"model"`

-	// Backend specifies the inference backend to use.
+	// Backend specifies the inference backend for profiling.
 	// The controller automatically sets this value in profilingConfig.config.engine.backend.
+	// Profiling runs on real GPUs or via AIC simulation to collect performance data.
 	// +kubebuilder:validation:Required
 	// +kubebuilder:validation:Enum=vllm;sglang;trtllm
 	Backend string `json:"backend"`

+	// UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+	// a real backend deployment. When true, the deployment uses simulated engines that
+	// don't require GPUs, using the profiling data to simulate realistic timing behavior.
+	// Mocker is available in all backend images and useful for large-scale experiments.
+	// Profiling still runs against the real backend (specified above) to collect performance data.
+	// +kubebuilder:default=false
+	UseMocker bool `json:"useMocker,omitempty"`
+
 	// EnableGpuDiscovery controls whether the profiler should automatically discover GPU
 	// resources from the Kubernetes cluster nodes. When enabled, the profiler will override
 	// any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine,
@@ -213,6 +222,7 @@ type DynamoGraphDeploymentRequestStatus struct {
 	// including metadata, based on profiling results. Users can extract this to create
 	// a DGD manually, or it's used automatically when autoApply is true.
 	// Stored as RawExtension to preserve all fields including metadata.
+	// For mocker backends, this contains the mocker DGD spec.
 	// +kubebuilder:validation:Optional
 	// +kubebuilder:pruning:PreserveUnknownFields
 	// +kubebuilder:validation:EmbeddedResource
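The `omitempty` tag on `UseMocker` means a `false` value is dropped when the spec is serialized to JSON. The following is a minimal sketch of that behavior, assuming a trimmed-down stand-in type: `requestSpec` and `marshalSpec` are hypothetical names for illustration and are not part of this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requestSpec is a hypothetical, trimmed-down stand-in for
// DynamoGraphDeploymentRequestSpec; only three fields are reproduced here.
type requestSpec struct {
	Model     string `json:"model"`
	Backend   string `json:"backend"`
	UseMocker bool   `json:"useMocker,omitempty"`
}

// marshalSpec serializes a spec to a JSON string and panics on failure,
// which cannot happen for this simple struct.
func marshalSpec(s requestSpec) string {
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// useMocker is false (the zero value), so omitempty drops the key.
	fmt.Println(marshalSpec(requestSpec{Model: "Qwen/Qwen3-0.6B", Backend: "vllm"}))
	// useMocker is true, so the key is present.
	fmt.Println(marshalSpec(requestSpec{Model: "Qwen/Qwen3-0.6B", Backend: "vllm", UseMocker: true}))
}
```

In this sketch, an object serialized with the flag unset is indistinguishable from one that never set it, which is consistent with the `+kubebuilder:default=false` marker on the field.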
......
@@ -96,8 +96,9 @@ spec:
       type: boolean
     backend:
       description: |-
-        Backend specifies the inference backend to use.
+        Backend specifies the inference backend for profiling.
         The controller automatically sets this value in profilingConfig.config.engine.backend.
+        Profiling runs on real GPUs or via AIC simulation to collect performance data.
       enum:
       - vllm
       - sglang
@@ -304,6 +305,15 @@ spec:
       required:
       - profilerImage
       type: object
+    useMocker:
+      default: false
+      description: |-
+        UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of
+        a real backend deployment. When true, the deployment uses simulated engines that
+        don't require GPUs, using the profiling data to simulate realistic timing behavior.
+        Mocker is available in all backend images and useful for large-scale experiments.
+        Profiling still runs against the real backend (specified above) to collect performance data.
+      type: boolean
   required:
   - backend
   - model
@@ -404,6 +414,7 @@ spec:
         including metadata, based on profiling results. Users can extract this to create
         a DGD manually, or it's used automatically when autoApply is true.
         Stored as RawExtension to preserve all fields including metadata.
+        For mocker backends, this contains the mocker DGD spec.
       type: object
       x-kubernetes-embedded-resource: true
       x-kubernetes-preserve-unknown-fields: true
......
@@ -118,10 +118,11 @@ const (
 	VolumeNameProfilingOutput = "profiling-output"

 	// Volume paths
-	ProfilingOutputPath = "/data"
-	ProfilingOutputFile = "config_with_planner.yaml"
-	ProfilingConfigPath = "/config"
-	ProfilingConfigFile = "disagg.yaml"
+	ProfilingOutputPath       = "/data"
+	ProfilingOutputFile       = "config_with_planner.yaml"
+	ProfilingOutputFileMocker = "mocker_config_with_planner.yaml"
+	ProfilingConfigPath       = "/config"
+	ProfilingConfigFile       = "disagg.yaml"

 	// Command line arguments
 	ArgModel = "--model"
@@ -202,6 +203,13 @@ data:
 EOF
 sed 's/^/ /' {{.OutputPath}}/{{.OutputFile}} >> /tmp/cm.yaml

+# Add mocker config (profiler always generates both real and mocker configs)
+if [ -f {{.OutputPath}}/{{.MockerOutputFile}} ]; then
+  echo " {{.MockerOutputFile}}: |" >> /tmp/cm.yaml
+  sed 's/^/ /' {{.OutputPath}}/{{.MockerOutputFile}} >> /tmp/cm.yaml
+  echo "Added mocker config to ConfigMap"
+fi
+
 # Note: Profiling data (raw_data.npz converted to JSON) is included in the
 # generated DGD YAML as a separate ConfigMap by the profiler, no need to add it here
@@ -582,6 +590,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) createDGD(ctx context.Context,
 	dgdName := generatedDGD.Name
 	dgdNamespace := dgdr.Namespace

+	// Apply deployment overrides
 	if dgdr.Spec.DeploymentOverrides != nil {
 		if dgdr.Spec.DeploymentOverrides.Name != "" {
 			dgdName = dgdr.Spec.DeploymentOverrides.Name
@@ -987,11 +996,12 @@ func (r *DynamoGraphDeploymentRequestReconciler) createProfilingJob(ctx context.

 	var scriptBuf bytes.Buffer
 	err = tmpl.Execute(&scriptBuf, map[string]string{
-		"OutputPath":    ProfilingOutputPath,
-		"OutputFile":    ProfilingOutputFile,
-		"ConfigMapName": outputConfigMapName,
-		"Namespace":     dgdr.Namespace,
-		"DGDRName":      dgdr.Name,
+		"OutputPath":       ProfilingOutputPath,
+		"OutputFile":       ProfilingOutputFile,
+		"MockerOutputFile": ProfilingOutputFileMocker,
+		"ConfigMapName":    outputConfigMapName,
+		"Namespace":        dgdr.Namespace,
+		"DGDRName":         dgdr.Name,
 	})
 	if err != nil {
 		return nil, false, fmt.Errorf("failed to execute sidecar script template: %w", err)
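To illustrate how the new `MockerOutputFile` key reaches the sidecar script, here is a minimal sketch that renders one line of the script with Go's `text/template`. The one-line `sidecarExcerpt` and the `renderExcerpt` helper are assumptions made for illustration; the real template and its rendering code are only partly visible in this diff.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// sidecarExcerpt is a one-line excerpt of the sidecar script shown in the
// diff; the full script is not reproduced here.
const sidecarExcerpt = `if [ -f {{.OutputPath}}/{{.MockerOutputFile}} ]; then`

// renderExcerpt parses the excerpt and substitutes the given values.
// Map keys are addressed with the same {{.Key}} syntax as struct fields.
func renderExcerpt(values map[string]string) (string, error) {
	tmpl, err := template.New("sidecar").Parse(sidecarExcerpt)
	if err != nil {
		return "", fmt.Errorf("failed to parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, values); err != nil {
		return "", fmt.Errorf("failed to execute template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	rendered, err := renderExcerpt(map[string]string{
		"OutputPath":       "/data",
		"MockerOutputFile": "mocker_config_with_planner.yaml",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(rendered)
}
```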
@@ -1265,7 +1275,7 @@ func (r *DynamoGraphDeploymentRequestReconciler) getProfilingJobErrorDetails(ctx
 // generateDGDSpec generates DGD spec from profiling results (online or offline/AIC)
 func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Context, dgdr *nvidiacomv1alpha1.DynamoGraphDeploymentRequest) error {
 	logger := log.FromContext(ctx)
-	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name)
+	logger.Info("Generating DGD spec from profiling results", "name", dgdr.Name, "backend", dgdr.Spec.Backend)

 	// Read the generated spec from ConfigMap (created by sidecar)
 	outputConfigMapName := getOutputConfigMapName(dgdr)
@@ -1282,18 +1292,28 @@ func (r *DynamoGraphDeploymentRequestReconciler) generateDGDSpec(ctx context.Con
 		return fmt.Errorf("failed to get output ConfigMap: %w", err)
 	}

+	// Select the right config file based on useMocker flag
+	// Profiler always generates both real and mocker configs
+	var outputFile string
+	if dgdr.Spec.UseMocker {
+		outputFile = ProfilingOutputFileMocker
+		logger.Info("Using mocker deployment config")
+	} else {
+		outputFile = ProfilingOutputFile
+	}
+
 	// Get YAML content from ConfigMap
-	yamlContent, exists := cm.Data[ProfilingOutputFile]
+	yamlContent, exists := cm.Data[outputFile]
 	if !exists {
-		return fmt.Errorf("key %s not found in ConfigMap %s", ProfilingOutputFile, outputConfigMapName)
+		return fmt.Errorf("key %s not found in ConfigMap %s", outputFile, outputConfigMapName)
 	}

-	logger.Info("Found profiling output in ConfigMap", "configMap", outputConfigMapName, "size", len(yamlContent))
+	logger.Info("Found profiling output in ConfigMap", "configMap", outputConfigMapName, "outputFile", outputFile, "size", len(yamlContent))

 	// Extract DGD and any supporting resources from potentially multi-document YAML (ConfigMap + DGD)
 	dgd, additionalResources, err := r.extractResourcesFromYAML([]byte(yamlContent))
 	if err != nil {
-		return fmt.Errorf("failed to extract DGD from %s: %w", ProfilingOutputFile, err)
+		return fmt.Errorf("failed to extract DGD from %s: %w", outputFile, err)
 	}

 	logger.Info("Parsed profiling output", "dgdName", dgd.Name, "additionalResources", len(additionalResources))
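The selection logic in the hunk above can be exercised in isolation. The sketch below extracts it into two standalone functions so that the lookup and the missing-key error path are easy to test without a cluster. `selectOutputFile` and `lookupProfilingOutput` are hypothetical helper names, not functions from this commit, and a plain `map[string]string` stands in for the ConfigMap's `Data` field.

```go
package main

import "fmt"

// File names copied from the constants in this commit.
const (
	ProfilingOutputFile       = "config_with_planner.yaml"
	ProfilingOutputFileMocker = "mocker_config_with_planner.yaml"
)

// selectOutputFile returns the ConfigMap key to read, depending on
// whether a mocker deployment was requested.
func selectOutputFile(useMocker bool) string {
	if useMocker {
		return ProfilingOutputFileMocker
	}
	return ProfilingOutputFile
}

// lookupProfilingOutput fetches the selected YAML document from the
// ConfigMap data and reports a descriptive error when the key is absent.
func lookupProfilingOutput(data map[string]string, useMocker bool, configMapName string) (string, error) {
	key := selectOutputFile(useMocker)
	content, exists := data[key]
	if !exists {
		return "", fmt.Errorf("key %s not found in ConfigMap %s", key, configMapName)
	}
	return content, nil
}

func main() {
	// Placeholder data: only the real config is present.
	data := map[string]string{ProfilingOutputFile: "kind: DynamoGraphDeployment"}

	content, err := lookupProfilingOutput(data, false, "example-output")
	fmt.Println(content, err)

	// Requesting the mocker config fails because its key is missing.
	_, err = lookupProfilingOutput(data, true, "example-output")
	fmt.Println(err)
}
```

One consequence visible in this sketch: if the sidecar's `-f` check skips the mocker file, a request with `useMocker: true` fails at the lookup rather than falling back to the real config.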
......
@@ -309,7 +309,8 @@ _Appears in:_
 | Field | Description | Default | Validation |
 | --- | --- | --- | --- |
 | `model` _string_ | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b").<br />This is a high-level identifier for easy reference in kubectl output and logs.<br />The controller automatically sets this value in profilingConfig.config.deployment.model. | | Required: \{\} <br /> |
-| `backend` _string_ | Backend specifies the inference backend to use.<br />The controller automatically sets this value in profilingConfig.config.engine.backend. | | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
+| `backend` _string_ | Backend specifies the inference backend for profiling.<br />The controller automatically sets this value in profilingConfig.config.engine.backend.<br />Profiling runs on real GPUs or via AIC simulation to collect performance data. | | Enum: [vllm sglang trtllm] <br />Required: \{\} <br /> |
+| `useMocker` _boolean_ | UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of<br />a real backend deployment. When true, the deployment uses simulated engines that<br />don't require GPUs, using the profiling data to simulate realistic timing behavior.<br />Mocker is available in all backend images and useful for large-scale experiments.<br />Profiling still runs against the real backend (specified above) to collect performance data. | false | |
 | `enableGpuDiscovery` _boolean_ | EnableGpuDiscovery controls whether the profiler should automatically discover GPU<br />resources from the Kubernetes cluster nodes. When enabled, the profiler will override<br />any manually specified hardware configuration (min_num_gpus_per_engine, max_num_gpus_per_engine,<br />num_gpus_per_node) with values detected from the cluster.<br />Requires cluster-wide node access permissions - only available with cluster-scoped operators. | false | Optional: \{\} <br /> |
 | `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides the complete configuration for the profiling job.<br />This configuration is passed directly to the profiler.<br />The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema).<br />Note: deployment.model and engine.backend are automatically set from the high-level<br />modelName and backend fields and should not be specified in this config. | | Required: \{\} <br /> |
 | `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated and stored in status.<br />Users can then manually create a DGD using the generated spec. | false | |
@@ -335,7 +336,7 @@ _Appears in:_
 | `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability after profiling starts. | | |
 | `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady.<br />Conditions are merged by type on patch updates. | | |
 | `profilingResults` _string_ | ProfilingResults contains a reference to the ConfigMap holding profiling data.<br />Format: "configmap/<name>" | | Optional: \{\} <br /> |
-| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
+| `generatedDeployment` _[RawExtension](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#rawextension-runtime-pkg)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment specification<br />including metadata, based on profiling results. Users can extract this to create<br />a DGD manually, or it's used automatically when autoApply is true.<br />Stored as RawExtension to preserve all fields including metadata.<br />For mocker backends, this contains the mocker DGD spec. | | EmbeddedResource: \{\} <br />Optional: \{\} <br /> |
 | `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD when AutoApply is true.<br />Contains name, namespace, state, and creation status of the managed DGD. | | Optional: \{\} <br /> |
......
@@ -355,6 +355,29 @@ For details about the profiling process, performance plots, and interpolation da

 ## Advanced Topics

+### Mocker Deployment
+
+Instead of a real DGD that uses GPU resources, you can deploy a mocker deployment that uses simulated engines rather than GPUs. Mocker is available in all backend images and uses profiling data to simulate realistic GPU timing behavior. It is useful for:
+
+- Large-scale experiments without GPU resources
+- Testing Planner behavior and infrastructure
+- Validating deployment configurations
+
+To deploy mocker instead of the real backend, set `useMocker: true`:
+
+```yaml
+spec:
+  model: <model-name>
+  backend: trtllm  # Real backend for profiling (vllm, sglang, or trtllm)
+  useMocker: true  # Deploy mocker instead of real backend
+  profilingConfig:
+    profilerImage: "nvcr.io/nvidia/dynamo/trtllm-runtime:<image-tag>"
+    ...
+  autoApply: true
+```
+
+Profiling still runs against the real backend (via GPUs or AIC) to collect performance data. The mocker deployment then uses this data to simulate realistic timing behavior.
+
 ### DGDR Immutability

 DGDRs are **immutable** - if you need to update SLAs or configuration:
......