README.md 15.6 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
# DGDR v1beta1 End-to-End Test Suite

This directory contains the end-to-end test suite for **DynamoGraphDeploymentRequest
(DGDR) v1beta1** — the high-level, SLA-driven Kubernetes API for deploying
inference models with Dynamo.

## What's tested

| Test group | Marker(s) | GPU req | Mocker OK? | What it covers |
|---|---|---|---|---|
| `TestDGDRValidation` | `gpu_0`, `pre_merge` | None | ✅ | Webhook validation: rejected/accepted specs, value enforcement, storage version, shortname |
| `TestDGDRVersionConversion` | `gpu_0`, `pre_merge` | None | ✅ | v1alpha1 → v1beta1 conversion webhook |
| `TestDGDRMinimalDeployment` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | Full Pending → Profiling → Ready → Deploying → Deployed lifecycle |
| `TestDGDRBackendSelection` | `gpu_1`, `nightly`, `e2e` | 1+ | ⚠️ vllm+trtllm only | vllm and trtllm pass; sglang **skipped** (no AIC silicon data for sglang on the mocker GPU SKU) |
| `TestDGDRSearchStrategies` | `gpu_1`/`gpu_8`, `e2e` | 1 or 8 | ⚠️ rapid only | `rapid` uses AIC and works; `thorough` requires real GPU sweeps |
| `TestDGDRSLATargets` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | ttft+itl, e2eLatency, optimizationType (latency/throughput) |
| `TestDGDRWorkloadPickingModes` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | requestRate, concurrency, isl/osl |
| `TestDGDRFeatures` | `gpu_1`, `nightly`, `e2e` | 1+ | ⚠️ see note | planner (rapid/none sweep), mocker |
| `TestDGDRModelCache` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | PVC-backed model cache, cache propagated to DGD |
| `TestDGDRHardwareOverride` | `gpu_1`, `pre_merge`, `e2e` | ✅ | ✅ | Manual gpuSku/numGpusPerNode/totalGpus/vramMb |
| `TestDGDRAutoApply` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | autoApply=true **skipped** in mocker (operator race); autoApply=false keeps Ready |
| `TestDGDROverrides` | `gpu_1`, `nightly`, `e2e` | 1+ | ✅ | Profiling job tolerations; DGD metadata label merging **xfail** (operator gap) |
| `TestDGDRStatusAndConditions` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | All conditions set correctly, sub-phases tracked, Pareto configs; all-conditions **xfail** in mocker; pareto **skipped** in mocker |
| `TestDGDRImmutability` | mixed | 0–1 | ⚠️ see note | Spec rejected in Profiling/Deployed, metadata always allowed |
| `TestDGDRCleanup` | `gpu_1`, `pre_merge`, `e2e` | 1+ | ⚠️ see note | Job deleted with DGDR; DGD preserved; ConfigMap cleanup **xfail** (operator gap); DGD-persistence test **skipped** in mocker |
| `TestDGDRMoEModels` | `gpu_8`, `nightly`, `e2e` | 8 | ❌ | DeepSeek-R1 MoE on SGLang — requires real 8-GPU node |

## Prerequisites

1. A running Kubernetes cluster with GPU nodes (or see [GPU-free mode](#gpu-free-mocker-mode) below)
2. The Dynamo operator installed (including CRDs and webhooks)
3. `kubectl` configured and pointing at the cluster
4. Python 3.10+ with `pytest` and `pyyaml` installed:
   ```bash
   pip install pytest pyyaml
   # or, from the repo root:
   pip install -e ".[test]"
   ```

## One-time cluster setup

Before running any tests, ensure the following are in place in your cluster.
These are required even for GPU-free (mocker) mode.

### 1. Install the Dynamo operator

```bash
cd deploy/operator
helm install dynamo-operator helm/dynamo-operator -n dynamo-system --create-namespace
```

### 2. Deploy NATS

Mocker workers (and real workers) connect to NATS for inter-component messaging.
The operator expects NATS at `nats://dynamo-operator-nats.dynamo-system.svc.cluster.local:4222`.

```bash
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
helm install dynamo-operator-nats nats/nats -n dynamo-system --create-namespace
```

### 3. Create the HuggingFace token secret

The profiling job reads the HF token from a secret named `hf-token-secret` using the
key `HF_TOKEN` (not `HUGGING_FACE_HUB_TOKEN`).

```bash
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n default
# If running in a non-default namespace, adjust -n accordingly
```

> **Important:** The key must be `HF_TOKEN`.  The secret name must be `hf-token-secret`.
> Using a different key name will cause the profiling job to fail silently.

## Running the tests

There are two main ways to run the suite depending on whether you have GPU hardware.

---

### GPU-free (mocker mode) — recommended for local development and CI

No GPU nodes required.  Uses AIC simulation for profiling and mock inference workers
for deployment.  Covers all `gpu_0` and `gpu_1` tests (~45 tests); `gpu_8` tests are
excluded because they require a real 8-GPU node even in mocker mode.

```bash
python3 -m pytest tests/dgdr/ -m "gpu_0 or gpu_1" -v \
  --dgdr-namespace=default \
  --dgdr-image=<your-image>
```

Expect: 37 passed, 6 skipped (2 model-cache PVC; sglang backend; pareto in mocker; DGD-persistence in mocker; auto-apply-true in mocker), 4 xfail (DGD label merging; all-conditions requires Deployed; dry-run immutability requires Deployed; ConfigMap cleanup on deletion).
`test_backend[sglang]` is one of the 6 skips (no AIC silicon data for sglang in mocker mode).

---

### Full suite with real GPUs — for production/nightly validation

Requires a Kubernetes cluster with GPU nodes.  Set `--dgdr-no-mocker` to disable mocker
injection and run against real hardware.  `gpu_8` tests additionally require an 8-GPU node.

```bash
# gpu_0 + gpu_1 tests on real GPUs (single-GPU node sufficient)
python3 -m pytest tests/dgdr/ -m "gpu_0 or gpu_1" -v \
  --dgdr-namespace=dynamo-test \
  --dgdr-image=<your-image> \
  --dgdr-no-mocker \
  --dgdr-profiling-timeout=3600 \
  --dgdr-deploy-timeout=1800

# Full nightly suite including 8-GPU tests
python3 -m pytest tests/dgdr/ -v \
  --dgdr-namespace=dynamo-test \
  --dgdr-image=<your-image> \
  --dgdr-no-mocker \
  --dgdr-pvc-name=model-cache \
  --dgdr-profiling-timeout=14400 \
  --dgdr-deploy-timeout=3600
```

Expect (gpu_0 + gpu_1, with `--dgdr-pvc-name`): **~43 passed, 0 skipped, 2 xfail** (DGD label-merging operator gap; ConfigMap cleanup operator gap).
Without `--dgdr-pvc-name`: 2 additional skips for the model-cache tests.

> **Note:** Two xfails are **permanent operator gaps** that persist in both mocker and GPU mode:
> - `test_dgd_override_injects_custom_labels` — the operator does not yet merge `spec.overrides.dgd.metadata.labels` onto the created DGD.
> - `test_deletion_removes_output_configmap` — the operator's `FinalizeResource` is a no-op and does not delete the output ConfigMap on DGDR deletion.
> All other mocker-mode xfails/skips disappear in GPU mode and are expected to pass.

---

### Other useful invocations

```bash
# Validation + conversion tests only (no cluster setup required beyond CRDs)
python3 -m pytest tests/dgdr/ -m "gpu_0" -v \
  --dgdr-namespace=default \
  --dgdr-image=<your-image>

# Pre-merge gate (GPU-free)
python3 -m pytest tests/dgdr/ -m "pre_merge" -v \
  --dgdr-namespace=default \
  --dgdr-image=<your-image>

# Single test class
python3 -m pytest tests/dgdr/test_dgdr_v1beta1.py::TestDGDRAutoApply -v \
  --dgdr-namespace=default \
  --dgdr-image=<your-image>
```

## CLI options

| Option | Default | Description |
|---|---|---|
| `--dgdr-namespace` | _(required)_ | Kubernetes namespace for test resources |
| `--dgdr-image` | _(required)_ | Container image for profiling and inference workers |
| `--dgdr-model` | `Qwen/Qwen3-0.6B` | HuggingFace model ID used by most tests |
| `--dgdr-backend` | `vllm` | Default backend for DGDR tests |
| `--dgdr-pvc-name` | _(empty)_ | PVC name holding pre-downloaded model weights (PVC tests are skipped if unset) |
| `--dgdr-profiling-timeout` | `3600` | Seconds to wait for profiling to complete |
| `--dgdr-deploy-timeout` | `600` | Seconds to wait for DGD to reach Deployed phase |
| `--dgdr-no-mocker` | `false` | Disable mocker mode (require real GPU nodes) |

## DGDR v1beta1 feature coverage matrix

The following spec fields are exercised by at least one test:

| Field | Tests that exercise it |
|---|---|
| `spec.model` | All tests |
| `spec.backend` (auto/vllm/sglang/trtllm) | `TestDGDRBackendSelection`, `TestDGDRValidation` |
| `spec.image` | All tests |
| `spec.searchStrategy` (rapid/thorough) | `TestDGDRSearchStrategies` |
| `spec.sla.ttft` + `spec.sla.itl` | `TestDGDRSLATargets::test_sla_ttft_and_itl` |
| `spec.sla.e2eLatency` | `TestDGDRSLATargets::test_sla_e2e_latency` |
| `spec.sla.optimizationType` | `TestDGDRSLATargets::test_sla_optimization_type_*` |
| `spec.workload.isl` + `spec.workload.osl` | `TestDGDRWorkloadPickingModes` |
| `spec.workload.requestRate` | `TestDGDRWorkloadPickingModes::test_request_rate_picking` |
| `spec.workload.concurrency` | `TestDGDRWorkloadPickingModes::test_concurrency_picking` |
| `spec.features.planner` (opaque config) | `TestDGDRFeatures::test_planner_enabled_*` |
| `spec.features.mocker.enabled` | `TestDGDRFeatures::test_mocker_enabled` |
| `spec.modelCache.pvcName` | `TestDGDRModelCache` |
| `spec.hardware.gpuSku` | `TestDGDRHardwareOverride::test_hardware_manual_override` |
| `spec.hardware.numGpusPerNode` | `TestDGDRHardwareOverride` |
| `spec.hardware.totalGpus` / `spec.hardware.vramMb` | `TestDGDRHardwareOverride::test_hardware_total_gpus_and_vram` |
| `spec.autoApply` | `TestDGDRAutoApply` |
| `spec.overrides.profilingJob` | `TestDGDROverrides::test_profiling_job_toleration_override` |
| `spec.overrides.dgd` | `TestDGDROverrides::test_dgd_override_injects_custom_labels` |
| `status.phase` | All lifecycle tests |
| `status.profilingPhase` | `TestDGDRStatusAndConditions::test_profiling_sub_phase_tracked` |
| `status.profilingJobName` | `TestDGDRStatusAndConditions::test_profiling_job_name_populated` |
| `status.dgdName` | `TestDGDRAutoApply`, `TestDGDRMinimalDeployment` |
| `status.profilingResults.selectedConfig` | Multiple |
| `status.profilingResults.pareto` | `TestDGDRStatusAndConditions::test_pareto_configs_in_profiling_results` |
| `status.deploymentInfo` | `TestDGDRMinimalDeployment` |
| `status.conditions` (all types) | `TestDGDRStatusAndConditions` |
| `status.observedGeneration` | `TestDGDRStatusAndConditions::test_observed_generation_tracks_spec` |

## GPU-free mode (default)

By default, the test suite runs the full DGDR lifecycle **without any GPU nodes**
by combining two simulation features:

| Feature | How it's enabled | Which phase it affects |
|---|---|---|
| **AIC (AI Configurator)** | `searchStrategy: rapid` (the default) | **Profiling** — profiler runs CPU-only simulation instead of online GPU sweep |
| **Mocker** | Enabled by default (disable with `--dgdr-no-mocker`) | **Deployment** — DGD uses mock inference workers (no GPU resources requested) |

**How it works:**

- `searchStrategy: rapid` is the default for v1beta1 DGDRs.  The profiler automatically
  uses AI Configurator (AIC) simulation when rapid is set — no additional config needed.
- Mocker mode is **enabled by default**.  The `dgdr_factory` fixture automatically injects
  `spec.features.mocker.enabled: true` and a default `spec.hardware` config into every DGDR.
- AIC profiling creates a Kubernetes Job that runs CPU-only (job prefix: `profile-aic-`).
  The profiling pod does not request GPU resources.
- Mocker deployment selects the profiler's `mocker_config_with_planner.yaml` output
  instead of the real deployment config, resulting in DGD pods that don't request GPUs.
- Pass `--dgdr-no-mocker` to disable mocker mode and run against real GPU hardware.

> **Note:** Some test assertions (e.g., status.deploymentInfo.gpuCount, pareto configs)
> may produce different values under mocker than under real GPU profiling.
> The tests are written to validate structure and phase transitions, not exact
> profiling output values, so they work correctly in both modes.

> **Note:** `searchStrategy: thorough` requires online (GPU) profiling even with mocker,
> since thorough performs real benchmark measurements.  Use rapid for GPU-free testing.

> **Note:** `TestDGDRFeatures::test_planner_enabled_with_rapid_sweep` runs with
> `auto_apply=False` in mocker mode (same root cause as the note below — the operator
> pre-sets `Status.DGDName` from the profiling output and then immediately fires
> `handleDGDDeleted` when the DGD cannot be found).  In mocker mode the test only
> validates that spec generation succeeds (waits for `PHASE_READY` and checks `dgdName`
> + `selectedConfig`).  Full deployment with rapid sweeping is verified outside mocker
> mode.  `test_planner_enabled_no_pre_deployment_sweep` and `test_mocker_enabled` are
> likewise restricted to `PHASE_READY` in mocker mode.

> **Note:** `auto_apply=True` consistently hits `handleDGDDeleted` in mocker mode. The
> operator's `generateDGDSpec` pre-populates `Status.DGDName` from the profiling output
> (e.g. `mocker-disagg`) _before_ the DGD is actually created.  When `handleDeployingPhase`
> then runs it checks `DGDName != ""` and immediately tries to GET that DGD; since it does
> not exist yet it fires `handleDGDDeleted` and the DGDR transitions to Failed.
> All tests that would enter the Deploying phase in mocker mode therefore use
> `auto_apply=False`/`PHASE_READY` instead (minimal lifecycle, backend selection,
> mocker feature, planner-no-sweep, planner-rapid-sweep, DGD label override).
> Tests whose sole purpose is to verify `auto_apply=True` DGD creation are skipped in
> mocker mode (`test_auto_apply_true_creates_dgd_automatically`,
> `test_deletion_does_not_remove_created_dgd`).
> Non-mocker mode (real GPU cluster) is unaffected.

> **Note:** `TestDGDRImmutability::test_spec_immutable_in_deployed_via_dry_run` is **xfail**
> in mocker mode.  The test relies on the session `deployed_dgdr` fixture which, in mocker
> mode, stops at `PHASE_READY` instead of `PHASE_DEPLOYED`.  The webhook's
> `ValidateUpdate` immutability enforcement only activates when the DGDR is in `Deployed`
> phase, so the server-dry-run mutation is accepted rather than rejected.

> **Note:** `gpu_8` tests cannot be run with mocker and require a real 8-GPU node.
> `TestDGDRSearchStrategies::test_thorough_strategy_completes` uses `searchStrategy: thorough`
> which performs real GPU benchmark sweeps.  `TestDGDRMoEModels` (DeepSeek-R1) requires 8 GPUs
> for the real inference workload.  Exclude them from GPU-free runs with `-m "gpu_0 or gpu_1"`.

### AIC silicon data availability

AIC operates in **silicon mode**: it looks up pre-recorded per-op performance data
files shipped inside the `aiconfigurator` Python package.  These files are organised
by `{gpu_sku}/{backend}/{backend_version}/`.  The mocker fixture injects
`gpuSku: a100_sxm` into every DGDR — but the package only ships vllm data for that SKU:

| Backend | a100_sxm data? | Mocker result |
|---|---|---|
| `vllm` | ✅ present | Profiling succeeds |
| `trtllm` | ✅ present | Profiling succeeds |
| `sglang` | ❌ missing | Test **skipped** automatically (no `sglang/0.5.8` perf data for `a100_sxm`) |

To test sglang/trtllm, run against a real GPU cluster (`--dgdr-no-mocker`) where AIC
can use a GPU SKU for which those data files are present.

## Cleanup

Tests clean up their own DGDRs via the `dgdr_factory` fixture.  If a test is
interrupted, resources can be cleaned up manually:

```bash
# Delete all DGDRs created by the test suite (they are labelled automatically)
kubectl delete dgdr -n default -l "test.dynamo/managed=true"

# If you used a custom namespace:
kubectl delete dgdr -n <namespace> -l "test.dynamo/managed=true"
```

## Architecture notes

- All tests interact with the cluster **exclusively via `kubectl`** subprocess calls,
  consistent with the rest of the Dynamo test suite.
- The `dgdr_factory` fixture ensures DGDR cleanup via `yield` regardless of test
  outcome.
- Tests that require an optional PVC (`--dgdr-pvc-name`) skip automatically when the
  option is not provided.
- Timeout values are configurable to accommodate clusters with varying profiling speeds.