# DGDR v1beta1 End-to-End Test Suite

This directory contains the end-to-end test suite for **DynamoGraphDeploymentRequest (DGDR) v1beta1**, the high-level, SLA-driven Kubernetes API for deploying inference models with Dynamo.

## What's tested

| Test group | Marker(s) | GPU req | Mocker OK? | What it covers |
|---|---|---|---|---|
| `TestDGDRValidation` | `gpu_0`, `pre_merge` | None | ✅ | Webhook validation: rejected/accepted specs, value enforcement, storage version, shortname |
| `TestDGDRVersionConversion` | `gpu_0`, `pre_merge` | None | ✅ | v1alpha1 → v1beta1 conversion webhook |
| `TestDGDRMinimalDeployment` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | Full Pending → Profiling → Ready → Deploying → Deployed lifecycle |
| `TestDGDRBackendSelection` | `gpu_1`, `nightly`, `e2e` | 1+ | ⚠️ vllm+trtllm only | vllm and trtllm pass; sglang **skipped** (no AIC silicon data for sglang on the mocker GPU SKU) |
| `TestDGDRSearchStrategies` | `gpu_1`/`gpu_8`, `e2e` | 1 or 8 | ⚠️ rapid only | `rapid` uses AIC and works; `thorough` requires real GPU sweeps |
| `TestDGDRSLATargets` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | ttft+itl, e2eLatency, optimizationType (latency/throughput) |
| `TestDGDRWorkloadPickingModes` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | requestRate, concurrency, isl/osl |
| `TestDGDRFeatures` | `gpu_1`, `nightly`, `e2e` | 1+ | ⚠️ see note | planner (rapid/none sweep), mocker |
| `TestDGDRModelCache` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | PVC-backed model cache, cache propagated to DGD |
| `TestDGDRHardwareOverride` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ✅ | Manual gpuSku/numGpusPerNode/totalGpus/vramMb |
| `TestDGDRAutoApply` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | autoApply=true **skipped** in mocker (operator race); autoApply=false keeps Ready |
| `TestDGDROverrides` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | Profiling job tolerations; DGD metadata label merging **xfail** (operator gap) |
| `TestDGDRStatusAndConditions` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | All conditions set correctly, sub-phases tracked, Pareto configs; all-conditions **xfail** in mocker; pareto **skipped** in mocker |
| `TestDGDRImmutability` | mixed | 0–1 | ⚠️ see note | Spec rejected in Profiling/Deployed, metadata always allowed |
| `TestDGDRCleanup` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | Job deleted with DGDR; DGD preserved; ConfigMap cleanup **xfail** (operator gap); DGD-persistence test **skipped** in mocker |
| `TestDGDRMoEModels` | `gpu_8`, `nightly`, `e2e` | 8 | ❌ | DeepSeek-R1 MoE on SGLang; requires a real 8-GPU node |

## Prerequisites

1. A running Kubernetes cluster with GPU nodes (or see [GPU-free mode](#gpu-free-mode-default) below)
2. The Dynamo operator installed (including CRDs and webhooks)
3. `kubectl` configured and pointing at the cluster
4. Python 3.10+ with `pytest` and `pyyaml` installed:

   ```bash
   pip install pytest pyyaml
   # or, from the repo root:
   pip install -e ".[test]"
   ```

## One-time cluster setup

Before running any tests, ensure the following are in place in your cluster. These are required even for GPU-free (mocker) mode.

### 1. Install the Dynamo operator

```bash
cd deploy/operator
helm install dynamo-operator helm/dynamo-operator -n dynamo-system --create-namespace
```

### 2. Deploy NATS

Mocker workers (and real workers) connect to NATS for inter-component messaging. The operator expects NATS at `nats://dynamo-operator-nats.dynamo-system.svc.cluster.local:4222`.

```bash
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
helm install dynamo-operator-nats nats/nats -n dynamo-system --create-namespace
```

### 3. Create the HuggingFace token secret

The profiling job reads the HF token from a secret named `hf-token-secret` using the key `HF_TOKEN` (not `HUGGING_FACE_HUB_TOKEN`).
```bash
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN= \
  -n default  # If running in a non-default namespace, adjust -n accordingly
```

> **Important:** The key must be `HF_TOKEN`. The secret name must be `hf-token-secret`.
> Using a different key name will cause the profiling job to fail silently.

## Running the tests

There are two main ways to run the suite depending on whether you have GPU hardware.

---

### GPU-free (mocker mode): recommended for local development and CI

No GPU nodes required. Uses AIC simulation for profiling and mock inference workers for deployment. Covers all `gpu_0` and `gpu_1` tests (~45 tests); `gpu_8` tests are excluded because they require a real 8-GPU node even in mocker mode.

```bash
python3 -m pytest tests/dgdr/ -m "gpu_0 or gpu_1" -v \
  --dgdr-namespace=default \
  --dgdr-image=
```

Expect: 37 passed, 6 skipped (2 model-cache PVC; sglang backend; pareto in mocker; DGD-persistence in mocker; auto-apply-true in mocker), 4 xfail (DGD label merging; all-conditions requires Deployed; dry-run immutability requires Deployed; ConfigMap cleanup on deletion). `test_backend[sglang]` is one of the 6 skips (no AIC silicon data for sglang in mocker mode).

---

### Full suite with real GPUs: for production/nightly validation

Requires a Kubernetes cluster with GPU nodes. Set `--dgdr-no-mocker` to disable mocker injection and run against real hardware. `gpu_8` tests additionally require an 8-GPU node.
```bash
# gpu_0 + gpu_1 tests on real GPUs (single-GPU node sufficient)
python3 -m pytest tests/dgdr/ -m "gpu_0 or gpu_1" -v \
  --dgdr-namespace=dynamo-test \
  --dgdr-image= \
  --dgdr-no-mocker \
  --dgdr-profiling-timeout=3600 \
  --dgdr-deploy-timeout=1800

# Full nightly suite including 8-GPU tests
python3 -m pytest tests/dgdr/ -v \
  --dgdr-namespace=dynamo-test \
  --dgdr-image= \
  --dgdr-no-mocker \
  --dgdr-pvc-name=model-cache \
  --dgdr-profiling-timeout=14400 \
  --dgdr-deploy-timeout=3600
```

Expect (gpu_0 + gpu_1, with `--dgdr-pvc-name`): **~43 passed, 0 skipped, 2 xfail** (DGD label-merging operator gap; ConfigMap cleanup operator gap). Without `--dgdr-pvc-name`: 2 additional skips for the model-cache tests.

> **Note:** Two xfails are **permanent operator gaps** that persist in both mocker and GPU mode:
> - `test_dgd_override_injects_custom_labels`: the operator does not yet merge `spec.overrides.dgd.metadata.labels` onto the created DGD.
> - `test_deletion_removes_output_configmap`: the operator's `FinalizeResource` is a no-op and does not delete the output ConfigMap on DGDR deletion.
>
> All other mocker-mode xfails/skips disappear in GPU mode and are expected to pass.
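
The mocker and real-GPU invocations above differ only in a handful of flags. As a rough sketch, a CI script could assemble the command line programmatically as shown below. The helper name `build_dgdr_pytest_cmd`, its parameters, and the placeholder image are hypothetical and not part of the suite; only the flags themselves are taken from this README.

```python
import shlex


def build_dgdr_pytest_cmd(
    namespace,
    image,
    markers="gpu_0 or gpu_1",
    no_mocker=False,
    pvc_name="",
    profiling_timeout=None,
    deploy_timeout=None,
):
    """Return the pytest argv for one DGDR run (hypothetical helper, not part of the suite)."""
    cmd = ["python3", "-m", "pytest", "tests/dgdr/", "-v"]
    if markers:
        # Leaving -m out runs every test, including the gpu_8 ones.
        cmd += ["-m", markers]
    cmd += [f"--dgdr-namespace={namespace}", f"--dgdr-image={image}"]
    if no_mocker:
        cmd.append("--dgdr-no-mocker")
    if pvc_name:
        cmd.append(f"--dgdr-pvc-name={pvc_name}")
    if profiling_timeout is not None:
        cmd.append(f"--dgdr-profiling-timeout={profiling_timeout}")
    if deploy_timeout is not None:
        cmd.append(f"--dgdr-deploy-timeout={deploy_timeout}")
    return cmd


# Example: the gpu_0 + gpu_1 real-GPU run; "example/image:tag" is a placeholder.
argv = build_dgdr_pytest_cmd(
    "dynamo-test",
    "example/image:tag",
    no_mocker=True,
    profiling_timeout=3600,
    deploy_timeout=1800,
)
print(shlex.join(argv))
```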
---

### Other useful invocations

```bash
# Validation + conversion tests only (no cluster setup required beyond CRDs)
python3 -m pytest tests/dgdr/ -m "gpu_0" -v \
  --dgdr-namespace=default \
  --dgdr-image=

# Pre-merge gate (GPU-free)
python3 -m pytest tests/dgdr/ -m "pre_merge" -v \
  --dgdr-namespace=default \
  --dgdr-image=

# Single test class
python3 -m pytest tests/dgdr/test_dgdr_v1beta1.py::TestDGDRAutoApply -v \
  --dgdr-namespace=default \
  --dgdr-image=
```

## CLI options

| Option | Default | Description |
|---|---|---|
| `--dgdr-namespace` | _(required)_ | Kubernetes namespace for test resources |
| `--dgdr-image` | _(required)_ | Container image for profiling and inference workers |
| `--dgdr-model` | `Qwen/Qwen3-0.6B` | HuggingFace model ID used by most tests |
| `--dgdr-backend` | `vllm` | Default backend for DGDR tests |
| `--dgdr-pvc-name` | _(empty)_ | PVC name holding pre-downloaded model weights (PVC tests are skipped if unset) |
| `--dgdr-profiling-timeout` | `3600` | Seconds to wait for profiling to complete |
| `--dgdr-deploy-timeout` | `600` | Seconds to wait for DGD to reach Deployed phase |
| `--dgdr-no-mocker` | `false` | Disable mocker mode (require real GPU nodes) |

## DGDR v1beta1 feature coverage matrix

The following spec and status fields are exercised by at least one test:

| Field | Tests that exercise it |
|---|---|
| `spec.model` | All tests |
| `spec.backend` (auto/vllm/sglang/trtllm) | `TestDGDRBackendSelection`, `TestDGDRValidation` |
| `spec.image` | All tests |
| `spec.searchStrategy` (rapid/thorough) | `TestDGDRSearchStrategies` |
| `spec.sla.ttft` + `spec.sla.itl` | `TestDGDRSLATargets::test_sla_ttft_and_itl` |
| `spec.sla.e2eLatency` | `TestDGDRSLATargets::test_sla_e2e_latency` |
| `spec.sla.optimizationType` | `TestDGDRSLATargets::test_sla_optimization_type_*` |
| `spec.workload.isl` + `spec.workload.osl` | `TestDGDRWorkloadPickingModes` |
| `spec.workload.requestRate` | `TestDGDRWorkloadPickingModes::test_request_rate_picking` |
| `spec.workload.concurrency` | `TestDGDRWorkloadPickingModes::test_concurrency_picking` |
| `spec.features.planner` (opaque config) | `TestDGDRFeatures::test_planner_enabled_*` |
| `spec.features.mocker.enabled` | `TestDGDRFeatures::test_mocker_enabled` |
| `spec.modelCache.pvcName` | `TestDGDRModelCache` |
| `spec.hardware.gpuSku` | `TestDGDRHardwareOverride::test_hardware_manual_override` |
| `spec.hardware.numGpusPerNode` | `TestDGDRHardwareOverride` |
| `spec.hardware.totalGpus` / `spec.hardware.vramMb` | `TestDGDRHardwareOverride::test_hardware_total_gpus_and_vram` |
| `spec.autoApply` | `TestDGDRAutoApply` |
| `spec.overrides.profilingJob` | `TestDGDROverrides::test_profiling_job_toleration_override` |
| `spec.overrides.dgd` | `TestDGDROverrides::test_dgd_override_injects_custom_labels` |
| `status.phase` | All lifecycle tests |
| `status.profilingPhase` | `TestDGDRStatusAndConditions::test_profiling_sub_phase_tracked` |
| `status.profilingJobName` | `TestDGDRStatusAndConditions::test_profiling_job_name_populated` |
| `status.dgdName` | `TestDGDRAutoApply`, `TestDGDRMinimalDeployment` |
| `status.profilingResults.selectedConfig` | Multiple |
| `status.profilingResults.pareto` | `TestDGDRStatusAndConditions::test_pareto_configs_in_profiling_results` |
| `status.deploymentInfo` | `TestDGDRMinimalDeployment` |
| `status.conditions` (all types) | `TestDGDRStatusAndConditions` |
| `status.observedGeneration` | `TestDGDRStatusAndConditions::test_observed_generation_tracks_spec` |

## GPU-free mode (default)

By default, the test suite runs the full DGDR lifecycle **without any GPU nodes** by combining two simulation features:

| Feature | How it's enabled | Which phase it affects |
|---|---|---|
| **AIC (AI Configurator)** | `searchStrategy: rapid` (the default) | **Profiling**: profiler runs CPU-only simulation instead of online GPU sweep |
| **Mocker** | Enabled by default (disable with `--dgdr-no-mocker`) | **Deployment**: DGD uses mock inference workers (no GPU resources requested) |

**How it works:**

- `searchStrategy: rapid` is the default for v1beta1 DGDRs. The profiler automatically uses AI Configurator (AIC) simulation when rapid is set; no additional config is needed.
- Mocker mode is **enabled by default**. The `dgdr_factory` fixture automatically injects `spec.features.mocker.enabled: true` and a default `spec.hardware` config into every DGDR.
- AIC profiling creates a Kubernetes Job that runs CPU-only (job prefix: `profile-aic-`). The profiling pod does not request GPU resources.
- Mocker deployment selects the profiler's `mocker_config_with_planner.yaml` output instead of the real deployment config, resulting in DGD pods that don't request GPUs.
- Pass `--dgdr-no-mocker` to disable mocker mode and run against real GPU hardware.

> **Note:** Some test assertions (e.g., `status.deploymentInfo.gpuCount`, pareto configs)
> may produce different values under mocker than under real GPU profiling.
> The tests are written to validate structure and phase transitions, not exact
> profiling output values, so they work correctly in both modes.

> **Note:** `searchStrategy: thorough` requires online (GPU) profiling even with mocker,
> since thorough performs real benchmark measurements. Use rapid for GPU-free testing.

> **Note:** `TestDGDRFeatures::test_planner_enabled_with_rapid_sweep` runs with
> `auto_apply=False` in mocker mode (same root cause as the note below: the operator
> pre-sets `Status.DGDName` from the profiling output and then immediately fires
> `handleDGDDeleted` when the DGD cannot be found). In mocker mode the test only
> validates that spec generation succeeds (waits for `PHASE_READY` and checks `dgdName`
> + `selectedConfig`). Full deployment with rapid sweeping is verified outside mocker
> mode. `test_planner_enabled_no_pre_deployment_sweep` and `test_mocker_enabled` are
> likewise restricted to `PHASE_READY` in mocker mode.

> **Note:** `auto_apply=True` consistently hits `handleDGDDeleted` in mocker mode.
> The operator's `generateDGDSpec` pre-populates `Status.DGDName` from the profiling output
> (e.g. `mocker-disagg`) _before_ the DGD is actually created. When `handleDeployingPhase`
> then runs it checks `DGDName != ""` and immediately tries to GET that DGD; since it does
> not exist yet it fires `handleDGDDeleted` and the DGDR transitions to Failed.
> All tests that would enter the Deploying phase in mocker mode therefore use
> `auto_apply=False`/`PHASE_READY` instead (minimal lifecycle, backend selection,
> mocker feature, planner-no-sweep, planner-rapid-sweep, DGD label override).
> Tests whose sole purpose is to verify `auto_apply=True` DGD creation are skipped in
> mocker mode (`test_auto_apply_true_creates_dgd_automatically`,
> `test_deletion_does_not_remove_created_dgd`).
> Non-mocker mode (real GPU cluster) is unaffected.

> **Note:** `TestDGDRImmutability::test_spec_immutable_in_deployed_via_dry_run` is **xfail**
> in mocker mode. The test relies on the session `deployed_dgdr` fixture which, in mocker
> mode, stops at `PHASE_READY` instead of `PHASE_DEPLOYED`. The webhook's
> `ValidateUpdate` immutability enforcement only activates when the DGDR is in `Deployed`
> phase, so the server-dry-run mutation is accepted rather than rejected.

> **Note:** `gpu_8` tests cannot be run with mocker and require a real 8-GPU node.
> `TestDGDRSearchStrategies::test_thorough_strategy_completes` uses `searchStrategy: thorough`
> which performs real GPU benchmark sweeps. `TestDGDRMoEModels` (DeepSeek-R1) requires 8 GPUs
> for the real inference workload. Exclude them from GPU-free runs with `-m "gpu_0 or gpu_1"`.

### AIC silicon data availability

AIC operates in **silicon mode**: it looks up pre-recorded per-op performance data files shipped inside the `aiconfigurator` Python package. These files are organised by `{gpu_sku}/{backend}/{backend_version}/`.
The mocker fixture injects `gpuSku: a100_sxm` into every DGDR, but the package ships only vllm and trtllm data for that SKU:

| Backend | a100_sxm data? | Mocker result |
|---|---|---|
| `vllm` | ✅ present | Profiling succeeds |
| `trtllm` | ✅ present | Profiling succeeds |
| `sglang` | ❌ missing | Test **skipped** automatically (no `sglang/0.5.8` perf data for `a100_sxm`) |

To test sglang, run against a real GPU cluster (`--dgdr-no-mocker`) where AIC can use a GPU SKU for which sglang data files are present.

## Cleanup

Tests clean up their own DGDRs via the `dgdr_factory` fixture. If a test is interrupted, resources can be cleaned up manually:

```bash
# Delete all DGDRs created by the test suite (they are labelled automatically)
kubectl delete dgdr -n default -l "test.dynamo/managed=true"

# If you used a custom namespace:
kubectl delete dgdr -n -l "test.dynamo/managed=true"
```

## Architecture notes

- All tests interact with the cluster **exclusively via `kubectl`** subprocess calls, consistent with the rest of the Dynamo test suite.
- The `dgdr_factory` fixture ensures DGDR cleanup via `yield` regardless of test outcome.
- Tests that require an optional PVC (`--dgdr-pvc-name`) skip automatically when the option is not provided.
- Timeout values are configurable to accommodate clusters with varying profiling speeds.
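
The cleanup guarantee described in these notes can be illustrated with a generator that tears down in a `finally` block. The sketch below uses `contextlib.contextmanager` instead of a pytest fixture so that it runs with the standard library alone. The names `dgdr_factory_sketch` and `run_kubectl` and the injectable `runner` argument are hypothetical, and the real fixture may be structured differently.

```python
import subprocess
from contextlib import contextmanager


def run_kubectl(args, runner=subprocess.run):
    """Run `kubectl <args>`; `runner` is injectable so the sketch works without a cluster."""
    return runner(["kubectl", *args], capture_output=True, text=True, check=False)


@contextmanager
def dgdr_factory_sketch(namespace, runner=subprocess.run):
    """Yield a `create(name)` callable and delete every recorded DGDR afterwards."""
    created = []

    def create(name):
        # A real fixture would apply a manifest here; this sketch only records the name.
        created.append(name)
        return name

    try:
        yield create
    finally:
        # Runs whether the body passed, failed, or raised.
        for name in created:
            run_kubectl(
                ["delete", "dgdr", name, "-n", namespace, "--ignore-not-found"],
                runner,
            )


# Usage with a recording fake in place of a real cluster:
calls = []
with dgdr_factory_sketch("default", runner=lambda cmd, **kw: calls.append(cmd)) as create:
    create("demo-dgdr")
print(calls)
```

A pytest fixture written with `yield` behaves the same way: the code after the `yield` runs as teardown once the test finishes, whether it passed or failed.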