# Pytest Guidelines Rules and conventions for Python tests in this repository. ## Running Tests **Always run your tests locally before pushing to CI.** Every failed CI run wastes shared GPU minutes and blocks other PRs. A local run catches most failures in seconds. Always use the venv-aware invocation -- never bare `pytest`: ```bash export HF_HUB_OFFLINE=1 HF_TOKEN="$(cat ~/.cache/huggingface/token)" python3 -m pytest -xvv --basetemp=/tmp/pytest_temp --durations=0 tests/ ``` - `python3 -m pytest` ensures the venv's pytest runs with the correct `sys.path`. The system `pytest` at `/usr/local/bin/pytest` is **outside** the venv and cannot see venv-installed packages (like `dynamo`). - `-xvv` stops at first failure with verbose output. - `--durations=0` shows timing for all tests (helps detect slow/flaky tests). ### Filtering by markers ```bash python3 -m pytest -m "vllm and gpu_1 and pre_merge" -v python3 -m pytest -m "vllm and e2e and gpu_1" -v python3 -m pytest -m "vllm and unit and gpu_0" -v python3 -m pytest tests/serve/ -m "vllm and gpu_1 and pre_merge" -vv --tb=short ``` Use `--durations=10` locally to find the 10 slowest tests. ### Filtering by keyword (`-k`) Use `-k` to select tests by name pattern. Be aware that `-k` matches substrings: ```bash # BAD -- also matches "disaggregated" tests python3 -m pytest tests/serve/ -k "aggregated" -v # GOOD -- excludes disaggregated python3 -m pytest tests/serve/ -k "aggregated and not disagg" -v --tb=short ``` ## Critical Rules These are the most common sources of flaky, non-hermetic tests. Violating any will block your PR. ### DO NOT hardcode ports **Always flag** literal port numbers (`port=8000`, `port=8081`) in test code. Parallel tests sharing any resource (port, file, env var, etc.) will collide. Use `dynamo_dynamic_ports` (allocates `frontend_port` + `system_ports` per test) or `allocate_port()` / `allocate_ports()` from `tests.utils.port_utils`. ```python # BAD resp = requests.get("http://localhost:8000/v1/models") # GOOD def test_example(dynamo_dynamic_ports): port = dynamo_dynamic_ports.frontend_port resp = requests.get(f"http://localhost:{port}/v1/models") ``` ### DO NOT hardcode temp paths **Always flag** fixed paths like `/tmp/my-test.log` or `/tmp/output/` in test code. Parallel workers will clobber each other's files. Use pytest's `tmp_path` fixture or Python's `tempfile` module -- both provide unique paths with auto-cleanup. ```python # BAD -- hardcoded path collides with parallel tests with open("/tmp/test-output.json", "w") as f: json.dump(result, f) # BAD -- "ghost fixture": accepts tmp_path but ignores it and writes to /tmp anyway. # Flag any test that requests tmp_path but still references /tmp/ or hardcoded paths. def test_example(tmp_path): with open("/tmp/test-output.json", "w") as f: json.dump(result, f) # GOOD def test_example(tmp_path): out = tmp_path / "test-output.json" out.write_text(json.dumps(result)) ``` ### DO NOT write output files into the repository tree **Always flag** any test that writes to paths relative to `__file__` or the repo root. This pollutes the working tree and creates untracked noise in `git status`. **Exception:** The autouse `logger` fixture writes to `test_output//` by design -- this is sanctioned shared infra, not ad-hoc test output. Do not flag it. ```python # BAD -- writes into the repo alongside the test file; flag this output = os.path.join(os.path.dirname(__file__), "scratch_output.txt") with open(output, "w") as f: f.write("debug output\n") # GOOD -- use tmp_path; cleaned up automatically def test_example(tmp_path): output = tmp_path / "scratch_output.txt" output.write_text("debug output\n") ``` ### DO NOT write custom engine start/stop logic **Always flag** hand-rolled `subprocess.Popen` / `os.system` / `time.sleep` patterns for engine or infra lifecycle. Homegrown lifecycle code leaks processes, misses cleanup on failure, and races with parallel tests. Use the existing fixtures and context managers: - **Fixtures:** `runtime_services_dynamic_ports`, `start_services_with_http`, `start_services_with_grpc`, `start_services_with_mocker` - **Context managers:** `DynamoFrontendProcess`, `DynamoWorkerProcess`, `ManagedProcess`, `EtcdServer`, `NatsServer` These handle health-checking, port allocation, log capture, and graceful teardown automatically. Extend the shared fixtures if needed -- don't reinvent them. ```python # BAD -- hand-rolled subprocess management proc = subprocess.Popen(["python3", "-m", "dynamo.mocker", ...]) time.sleep(10) # hope it's ready try: run_test() finally: proc.kill() # GOOD -- use the provided fixture def test_example(start_services_with_mocker): frontend_port = start_services_with_mocker # engine is already up, health-checked, and will be cleaned up automatically ``` ### DO NOT copy-paste test infrastructure -- reuse and refactor **Always flag** duplicated setup logic, helpers, or fixture code across test files. Copy-pasted infra means bugs get fixed in one copy but not the others. - Check `tests/conftest.py`, subdirectory `conftest.py` files, and `tests/utils/` before writing anything new. - If two or more tests share setup, extract it into a fixture or `tests/utils/`. - If tests differ only in config, use `@pytest.mark.parametrize` with indirect fixtures instead of separate functions. ```python # BAD -- same setup copy-pasted across three test files def test_vllm_chat(): proc = start_engine("vllm", model="Qwen/Qwen3-0.6B") wait_for_ready(proc) resp = send_chat_request(proc.port) assert resp.status_code == 200 def test_vllm_completion(): # 90% identical to above proc = start_engine("vllm", model="Qwen/Qwen3-0.6B") wait_for_ready(proc) resp = send_completion_request(proc.port) assert resp.status_code == 200 # GOOD -- shared fixture, parametrized payloads @pytest.mark.parametrize("payload_fn", [chat_payload_default, completion_payload_default]) def test_vllm_requests(start_serve_deployment, payload_fn): resp = send_request(start_serve_deployment.port, payload_fn()) assert resp.status_code == 200 ``` Extend shared code rather than forking a private copy. --- ## Markers `--strict-markers` and `--strict-config` are enforced. Using an undefined marker **fails collection**. Register all markers in both `pyproject.toml` and `tests/conftest.py:pytest_configure`. ### Required markers Every test must have **at least**: 1. **A scheduling marker** -- when the test runs in CI: - `pre_merge` -- runs on every PR before merge - `post_merge` -- runs after merge to main - `nightly` -- runs nightly - `weekly` -- runs weekly - `release` -- runs on release pipelines 2. **A GPU marker** -- how many GPUs are needed: - `gpu_0` -- no GPU required - `gpu_1` -- single GPU - `gpu_2`, `gpu_4`, `gpu_8` -- multi-GPU 3. **A type marker** -- what kind of test: - `unit` -- unit test - `integration` -- integration test - `e2e` -- end-to-end test ### Scheduling marker guidance CI compute is finite. Choose placement carefully: - Only use `pre_merge` for tests that are **absolutely critical** -- every pre-merge test slows down every PR for every contributor. - **Tests averaging over 60 seconds should default to `post_merge`** unless they guard a critical path that justifies blocking every PR. If a test must stay `pre_merge` despite being slow, add a comment explaining why. - E2E tests involve more components and tend to be flakier. Prefer `post_merge` for E2E tests unless they guard a critical path. - Consider `nightly` or `weekly` for expensive, GPU-heavy, or stress tests. ### Framework markers Apply when the test depends on a specific inference backend: - `vllm`, `trtllm`, `sglang` ### Timeouts Tests over 30 seconds **must** have `@pytest.mark.timeout()`. Set the timeout to **3x measured average** to absorb variance. **Always flag** any test that runs over 30 seconds or contains `time.sleep()`, polling loops, network calls, or subprocess waits but lacks a `@pytest.mark.timeout(...)` marker. This is a **required change, not a style suggestion** -- a missing timeout can hang CI indefinitely. **Also flag** real `time.sleep()` in `pre_merge` + `unit` tests. Unit tests should not burn wall-clock time. Mock the sleep, use shared fixtures, or reclassify as `integration`/`e2e` with `@pytest.mark.slow`. ```python # BAD -- sleeps and loops with no timeout marker; can hang CI forever @pytest.mark.pre_merge @pytest.mark.gpu_0 @pytest.mark.unit def test_poll_server(): for _ in range(50): time.sleep(0.1) assert True # GOOD -- timeout prevents infinite hangs @pytest.mark.timeout(300) # ~100s average, 3x buffer @pytest.mark.pre_merge @pytest.mark.gpu_0 @pytest.mark.unit def test_vllm_aggregated(...): ... ``` Timing comments let AI/automation understand requirements when shuffling test suites. ### Other commonly used markers - `model("org/model-name")` -- declares the HF model used; the `predownload_models` fixture reads these to download only what's needed. - `slow` -- known slow test. - `parallel` -- safe to run with pytest-xdist. - `h100` -- requires H100 hardware. - `fault_tolerance`, `deploy`, `router`, `planner`, `kvbm` -- component markers. - `k8s` -- requires Kubernetes. ### Example ```python @pytest.mark.pre_merge @pytest.mark.gpu_1 @pytest.mark.e2e @pytest.mark.vllm @pytest.mark.model("Qwen/Qwen3-0.6B") @pytest.mark.timeout(300) def test_vllm_aggregated(start_serve_deployment): ... ``` ## Async Tests `asyncio_mode = "auto"` is configured in `pyproject.toml`, so all `async def test_*` functions are collected automatically. **Always flag** `@pytest.mark.asyncio` -- this is a **required change, not a style suggestion**. The marker is never needed in this repo. Remove it. ```python # BAD -- redundant marker; asyncio_mode = "auto" handles this @pytest.mark.asyncio @pytest.mark.pre_merge @pytest.mark.gpu_0 @pytest.mark.unit async def test_async_endpoint(): await asyncio.sleep(0.01) # GOOD -- no marker needed, pytest collects it automatically @pytest.mark.pre_merge @pytest.mark.gpu_0 @pytest.mark.unit async def test_async_endpoint(): await asyncio.sleep(0.01) ``` ## Hermetic Testing Tests must be isolated. Every test must run in any order, on any machine, and produce deterministic results with no side-effects. Multiple tests must be able to execute in parallel without conflicts. ### Additional anti-patterns - **Module-level mutable state.** **Always flag** any mutable object (`{}`, `[]`, `set()`) at module scope that tests read or write. This makes tests order-dependent and produces phantom xdist failures. ```python # BAD -- module-level dict shared across all tests; flag this _shared_results = {} def test_a(): _shared_results["worker-1"] = "registered" def test_b(): # Passes only if test_a ran first! assert _shared_results["worker-1"] == "registered" # GOOD -- each test gets its own state @pytest.fixture def results(): return {} def test_a(results): results["worker-1"] = "registered" assert results["worker-1"] == "registered" ``` - **Colliding `dyn://` registration paths across tests.** Dynamo workers register under `dyn://{namespace}.{component}.{endpoint}` in etcd/NATS. Hardcoding namespace, component, and endpoint strings is fine on its own -- the problem is when two tests that share an etcd/NATS instance use the **same full path**, causing flaky collisions under parallel execution. **Always flag** tests whose full `dyn://` path can collide with another test's. The simplest fix is to randomize at least one segment (typically namespace). ```python # BAD -- two tests using this identical path will collide namespace = "dynamo" component = "backend" endpoint = f"dyn://{namespace}.{component}.generate" # GOOD -- unique namespace prevents collisions; component/endpoint can stay fixed from tests.router.common import generate_random_suffix namespace = f"dynamo-{generate_random_suffix()}" component = "backend" endpoint = f"dyn://{namespace}.{component}.generate" ``` - **Leaking environment variables.** **Always flag** direct `os.environ[...] = ...` or `os.environ.update(...)` in tests. These mutations persist into subsequent tests and cause order-dependent failures. ```python # BAD -- env var leaks into every test that runs after this one; flag this def test_service_discovery(): os.environ["NATS_SERVER"] = "nats://rogue-server:4222" assert connect() # GOOD -- monkeypatch auto-restores after each test def test_service_discovery(monkeypatch): monkeypatch.setenv("NATS_SERVER", "nats://rogue-server:4222") assert connect() ``` - **Mutable default arguments in test helpers.** **Always flag** any function with a mutable default (`[]`, `{}`, `set()`). Defaults are evaluated once and shared across calls, so mutations accumulate silently between invocations. ```python # BAD -- registry list is shared across all calls; flag this def register_workers(new_worker, registry=[]): registry.append(new_worker) return registry # GOOD -- None sentinel, fresh list each call def register_workers(new_worker, registry=None): if registry is None: registry = [] registry.append(new_worker) return registry ``` See also: `python-guidelines.md` > "Mutable default arguments" for the general rule. ### Optimization tips - Combine multiple assertions in one engine launch/teardown cycle when tests share the same deployment config. - Use the mock engine (`dynamo.mocker`) instead of a real vLLM/SGLang/TRT-LLM engine when the test doesn't need real inference. - Mock external services (APIs, databases, etc.) to keep tests fast and deterministic. ## Fixtures ### Service infrastructure - **`runtime_services_dynamic_ports`** -- preferred for xdist-safe tests. Spins up per-test NATS and etcd on dynamic ports, sets `NATS_SERVER` / `ETCD_ENDPOINTS` env vars, cleans up after. - **`runtime_services`** -- simpler, uses default ports. Not xdist-safe. - **`runtime_services_session`** -- session-scoped, shared across xdist workers via file locks. Good for large test suites where per-test instances are too expensive. ### Port allocation - **`dynamo_dynamic_ports`** -- allocates `frontend_port` + `system_ports` per test. Never hardcode ports (8000, 8081, etc.) in tests. - **`num_system_ports`** -- defaults to 1. Use indirect parametrize for more: `@pytest.mark.parametrize("num_system_ports", [2], indirect=True)` ### Model management - **`predownload_models`** (session-scoped) -- downloads full models. Reads `@pytest.mark.model(...)` from collected tests to download only what's needed. Sets `HF_HUB_OFFLINE=1` after download so workers skip redundant API calls. - **`predownload_tokenizers`** (session-scoped) -- same, but skips weight files. ### Backend-specific parametrize - **`discovery_backend`** -- defaults to `"etcd"`. Parametrize with `["file", "etcd"]`. - **`request_plane`** -- defaults to `"nats"`. Parametrize with `["nats", "tcp"]`. - **`durable_kv_events`** -- defaults to `False`. Set `[True]` for JetStream mode. ### Logging An autouse `logger` fixture writes per-test logs to `test_output//test.log.txt`. Some sub-suites (e.g. `tests/planner/`) override this with a no-op fixture. ## xdist / Parallel Safety - Use `runtime_services_dynamic_ports` + `dynamo_dynamic_ports` for port isolation. - Use `SharedEtcdServer` / `SharedNatsServer` (via `runtime_services_session`) for session-scoped shared services with file-lock coordination. - Never rely on fixed ports or global state across workers. - Each xdist worker is a separate process -- env vars don't leak. ## Warnings `filterwarnings = ["error"]` is set globally, with specific ignores for known third-party deprecations (CUDA, protobuf, pynvml, torchao, etc.). If your test triggers a new warning, either fix the root cause or add a targeted ignore in `pyproject.toml` with a comment explaining why. ## Error Handling in Tests - No blanket `except Exception` -- let failures propagate. - Catch only specific exceptions you can actually handle. - Prefer fixtures for setup/teardown over try/finally in test bodies. ## Linter Suppression (`# noqa`) **Always flag** `# noqa` that suppresses warnings for anti-patterns documented in these guidelines. If the linter caught a real problem (`E711` for `== None`, `E712` for `== True`, `F841` for unused variable), fix the code. ```python # BAD -- noqa hides the very bug the linter caught; flag this assert error == None # noqa: E711 result = compute() # noqa: F841 # GOOD -- fix the code assert error is None result = compute() assert result == expected ``` The only acceptable `# noqa` is for genuine false positives. Always explain: `# noqa: F401 -- imported for side-effects`. ## Test File Organization ``` tests/ conftest.py # Root fixtures: services, ports, model downloads, logging serve/ # Backend serve tests (vllm, trtllm, sglang) conftest.py # Image server, MinIO LoRA fixtures frontend/ # Frontend HTTP/gRPC tests conftest.py # HTTP/gRPC service fixtures, mocker workers grpc/ # gRPC-specific tests planner/ # Planner component tests unit/ # Planner unit tests router/ # Router E2E tests fault_tolerance/ # Fault tolerance tests cancellation/ migration/ etcd_ha/ gpu_memory_service/ deploy/ kvbm_integration/ # KV block manager integration tests deploy/ # Deployment tests basic/ # Basic smoke tests (wheel contents, CUDA version) dependencies/ # Import/dependency tests utils/ # Shared test utilities (NOT test files) constants.py # Model IDs, default ports managed_process.py # ManagedProcess for subprocess lifecycle port_utils.py # Dynamic port allocation test_output.py # Test output path resolution ``` ## Serve Tests Pattern Backend serve tests (`tests/serve/test_vllm.py`, etc.) follow a config-driven pattern: ```python vllm_configs = { "aggregated": VLLMConfig( name="aggregated", directory=vllm_dir, script_name="agg.sh", marks=[pytest.mark.gpu_1, pytest.mark.pre_merge, pytest.mark.timeout(300)], model="Qwen/Qwen3-0.6B", request_payloads=[...], ), } ``` Configs are parametrized into test functions via `params_with_model_mark()`, which auto-applies the `model` marker from the config's model field.