Unverified Commit f0ac86e8 authored by Keiven C's avatar Keiven C Committed by GitHub
Browse files

feat: add extra CodeRabbit review guidelines for Python and pytest (#7144)


Signed-off-by: default avatarKeiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: default avatarKeiven Chang <keivenchang@users.noreply.github.com>
parent 5265787b
# Pytest Guidelines
Rules and conventions for Python tests in this repository.
## Running Tests
Always use the venv-aware invocation -- never bare `pytest`:
```bash
export HF_HUB_OFFLINE=1 HF_TOKEN="$(cat ~/.cache/huggingface/token)"
python3 -m pytest -xvv --basetemp=/tmp/pytest_temp --durations=0 tests/
```
- `python3 -m pytest` ensures the venv's pytest runs with the correct `sys.path`.
The system `pytest` at `/usr/local/bin/pytest` is **outside** the venv and cannot
see venv-installed packages (like `dynamo`).
- `-xvv` stops at first failure with verbose output.
- `--durations=0` shows timing for all tests (helps detect slow/flaky tests).
### Filtering by markers
```bash
python3 -m pytest -m "vllm and gpu_1 and pre_merge" -v
python3 -m pytest -m "vllm and e2e and gpu_1" -v
python3 -m pytest -m "vllm and unit and gpu_0" -v
python3 -m pytest tests/serve/ -m "vllm and gpu_1 and pre_merge" -vv --tb=short
```
Use `--durations=10` locally to find the 10 slowest tests.
### Filtering by keyword (`-k`)
Use `-k` to select tests by name pattern. Be aware that `-k` matches substrings:
```bash
# BAD -- also matches "disaggregated" tests
python3 -m pytest tests/serve/ -k "aggregated" -v
# GOOD -- excludes disaggregated
python3 -m pytest tests/serve/ -k "aggregated and not disagg" -v --tb=short
```
## Critical Rules
These are the most common sources of flaky, non-hermetic tests. Violating any of
these will block your PR.
Hardcoded values in tests are cringy code. They signal that the author didn't
think about parallel execution, reproducibility, or the next person who has to
debug a phantom CI failure at 2 AM. Don't be that person.
### DO NOT hardcode ports
Never use literal port numbers (e.g. `port=8000`, `port=8081`) in test code. Two
tests that share a port will collide when run in parallel, causing mysterious
failures that only reproduce in CI.
**Instead:** Use the `dynamo_dynamic_ports` fixture (allocates `frontend_port` +
`system_ports` per test) or call `allocate_port()` / `allocate_ports()` directly
from `tests.utils.port_utils`.
```python
# BAD
resp = requests.get("http://localhost:8000/v1/models")
# GOOD
def test_example(dynamo_dynamic_ports):
port = dynamo_dynamic_ports.frontend_port
resp = requests.get(f"http://localhost:{port}/v1/models")
```
### DO NOT hardcode temp paths
Never write to fixed paths like `/tmp/my-test.log` or `/tmp/output/`. The next test
(or a parallel worker) will pick up stale files, creating subtle side-effects.
**Instead:** Use Python's `tempfile` module or pytest's `tmp_path` fixture. Both
provide unique paths and auto-cleanup.
```python
# BAD
with open("/tmp/test-output.json", "w") as f:
json.dump(result, f)
# GOOD
def test_example(tmp_path):
out = tmp_path / "test-output.json"
out.write_text(json.dumps(result))
```
Also: never dump output files into the repo working tree. This pollutes the repo and
risks clobbering real files, especially in dev (root user) containers.
### DO NOT write custom engine start/stop logic
Never write your own subprocess management code to launch, health-check, or tear down
engines (vLLM, SGLang, TRT-LLM) or infrastructure (NATS, etcd, frontends). Homegrown
lifecycle code inevitably leaks processes, misses cleanup on failure, or races with
parallel tests.
**Instead:** Use the existing fixtures and context managers:
- **Fixtures:** `runtime_services_dynamic_ports`, `start_services_with_http`,
`start_services_with_grpc`, `start_services_with_mocker`
- **Context managers:** `DynamoFrontendProcess`, `DynamoWorkerProcess`,
`ManagedProcess`, `EtcdServer`, `NatsServer`
These handle health-checking, port allocation, log capture, straggler cleanup,
and graceful teardown automatically. If your test needs something the existing
infrastructure doesn't support, extend the shared fixtures -- don't reinvent them
in your test file.
```python
# BAD -- hand-rolled subprocess management
proc = subprocess.Popen(["python3", "-m", "dynamo.mocker", ...])
time.sleep(10) # hope it's ready
try:
run_test()
finally:
proc.kill()
# GOOD -- use the provided fixture
def test_example(start_services_with_mocker):
frontend_port = start_services_with_mocker
# engine is already up, health-checked, and will be cleaned up automatically
```
### DO NOT copy-paste test infrastructure -- reuse and refactor
Do not duplicate setup logic, helper functions, or fixture code across test files.
Copy-pasted code means the same bug gets fixed in one place but not the others, and
changing a shared pattern requires hunting down every copy.
**Instead:**
- **Reuse existing fixtures and helpers.** Check `tests/conftest.py`,
subdirectory `conftest.py` files, and `tests/utils/` before writing anything new.
- **Extract shared logic into fixtures or utility functions.** If two or more tests
need the same setup, it belongs in a `conftest.py` or `tests/utils/`.
- **Parametrize rather than duplicate.** If tests differ only in config (model,
backend, port count), use `@pytest.mark.parametrize` with indirect fixtures
instead of writing separate test functions.
```python
# BAD -- same setup copy-pasted across three test files
def test_vllm_chat():
proc = start_engine("vllm", model="Qwen/Qwen3-0.6B")
wait_for_ready(proc)
resp = send_chat_request(proc.port)
assert resp.status_code == 200
def test_vllm_completion(): # 90% identical to above
proc = start_engine("vllm", model="Qwen/Qwen3-0.6B")
wait_for_ready(proc)
resp = send_completion_request(proc.port)
assert resp.status_code == 200
# GOOD -- shared fixture, parametrized payloads
@pytest.mark.parametrize("payload_fn", [chat_payload_default, completion_payload_default])
def test_vllm_requests(start_serve_deployment, payload_fn):
resp = send_request(start_serve_deployment.port, payload_fn())
assert resp.status_code == 200
```
When the existing infrastructure doesn't fit your needs, extend the shared code
(add a parameter to a fixture, add a helper to `tests/utils/`) rather than forking
a private copy.
---
## Markers
`--strict-markers` and `--strict-config` are enforced in `pyproject.toml`. Using an
undefined marker **fails collection**. All markers must be registered in both
`pyproject.toml [tool.pytest.ini_options].markers` and `tests/conftest.py:pytest_configure`.
### Required markers
Every test must have **at least**:
1. **A scheduling marker** -- when the test runs in CI:
- `pre_merge` -- runs on every PR before merge
- `post_merge` -- runs after merge to main
- `nightly` -- runs nightly
- `weekly` -- runs weekly
- `release` -- runs on release pipelines
2. **A GPU marker** -- how many GPUs are needed:
- `gpu_0` -- no GPU required
- `gpu_1` -- single GPU
- `gpu_2`, `gpu_4`, `gpu_8` -- multi-GPU
3. **A type marker** -- what kind of test:
- `unit` -- unit test
- `integration` -- integration test
- `e2e` -- end-to-end test
### Scheduling marker guidance
CI compute is finite. Choose placement carefully:
- Only use `pre_merge` for tests that are **absolutely critical** -- every pre-merge
test slows down every PR for every contributor.
- E2E tests involve more components and tend to be flakier. Prefer `post_merge` for
E2E tests unless they guard a critical path.
- Consider `nightly` or `weekly` for expensive, GPU-heavy, or stress tests.
### Framework markers
Apply when the test depends on a specific inference backend:
- `vllm`, `trtllm`, `sglang`
### Timeouts
Tests that run longer than 30 seconds **must** have `@pytest.mark.timeout(<seconds>)`.
Set the timeout to **3x the measured average duration** to absorb variance.
Measure your test 5-10 times, then add a timing comment:
```python
@pytest.mark.timeout(300) # ~100s average, 3x buffer
def test_vllm_aggregated(...):
# on average this test takes about 1.5 minutes
...
```
Timing comments let AI/automation understand requirements when shuffling test suites.
### Other commonly used markers
- `model("org/model-name")` -- declares the HF model used; the `predownload_models`
fixture reads these to download only what's needed.
- `slow` -- known slow test.
- `parallel` -- safe to run with pytest-xdist.
- `h100` -- requires H100 hardware.
- `fault_tolerance`, `deploy`, `router`, `planner`, `kvbm` -- component markers.
- `k8s` -- requires Kubernetes.
### Example
```python
@pytest.mark.pre_merge
@pytest.mark.gpu_1
@pytest.mark.e2e
@pytest.mark.vllm
@pytest.mark.model("Qwen/Qwen3-0.6B")
@pytest.mark.timeout(300)
def test_vllm_aggregated(start_serve_deployment):
...
```
## Async Tests
`asyncio_mode = "auto"` is configured in `pyproject.toml`. Do **not** add
`@pytest.mark.asyncio` manually -- all `async def test_*` functions are collected
automatically.
## Hermetic Testing
Tests must be isolated and must not interfere with each other. Every test should:
- Run reliably in any order, on any machine, at any time.
- Produce deterministic results.
- Not create side-effects for other tests.
- Clean up properly after itself.
- Fail fast when something is wrong.
Given enough resources, multiple tests must be able to execute in parallel without
conflicts or race conditions. See the **Critical Rules** section at the top for the
three most important requirements (dynamic ports, temp paths, shared fixtures).
### Additional anti-patterns
- **Reusing namespace/component/endpoint names** across tests that share a
registration service. Use unique names per test.
- **Leaking environment variables**. Use `monkeypatch.setenv()` or save/restore
patterns so env changes don't persist across tests.
### Optimization tips
- Combine multiple assertions in one engine launch/teardown cycle when tests share
the same deployment config.
- Use the mock engine (`dynamo.mocker`) instead of a real vLLM/SGLang/TRT-LLM engine
when the test doesn't need real inference.
- Mock external services (APIs, databases, etc.) to keep tests fast and deterministic.
## Fixtures
### Service infrastructure
- **`runtime_services_dynamic_ports`** -- preferred for xdist-safe tests. Spins up
per-test NATS and etcd on dynamic ports, sets `NATS_SERVER` / `ETCD_ENDPOINTS`
env vars, cleans up after.
- **`runtime_services`** -- simpler, uses default ports. Not xdist-safe.
- **`runtime_services_session`** -- session-scoped, shared across xdist workers via
file locks. Good for large test suites where per-test instances are too expensive.
### Port allocation
- **`dynamo_dynamic_ports`** -- allocates `frontend_port` + `system_ports` per test.
Never hardcode ports (8000, 8081, etc.) in tests.
- **`num_system_ports`** -- defaults to 1. Use indirect parametrize for more:
`@pytest.mark.parametrize("num_system_ports", [2], indirect=True)`
### Model management
- **`predownload_models`** (session-scoped) -- downloads full models. Reads
`@pytest.mark.model(...)` from collected tests to download only what's needed.
Sets `HF_HUB_OFFLINE=1` after download so workers skip redundant API calls.
- **`predownload_tokenizers`** (session-scoped) -- same, but skips weight files.
### Backend-specific parametrize
- **`discovery_backend`** -- defaults to `"etcd"`. Parametrize with `["file", "etcd"]`.
- **`request_plane`** -- defaults to `"nats"`. Parametrize with `["nats", "tcp"]`.
- **`durable_kv_events`** -- defaults to `False`. Set `[True]` for JetStream mode.
### Logging
An autouse `logger` fixture writes per-test logs to `test_output/<test_name>/test.log.txt`.
Some sub-suites (e.g. `tests/planner/`) override this with a no-op fixture.
## xdist / Parallel Safety
- Use `runtime_services_dynamic_ports` + `dynamo_dynamic_ports` for port isolation.
- Use `SharedEtcdServer` / `SharedNatsServer` (via `runtime_services_session`) for
session-scoped shared services with file-lock coordination.
- Never rely on fixed ports or global state across workers.
- Each xdist worker is a separate process -- env vars don't leak.
## Warnings
`filterwarnings = ["error"]` is set globally, with specific ignores for known
third-party deprecations (CUDA, protobuf, pynvml, torchao, etc.). If your test
triggers a new warning, either fix the root cause or add a targeted ignore in
`pyproject.toml` with a comment explaining why.
## Error Handling in Tests
- No blanket `except Exception` -- let failures propagate.
- Catch only specific exceptions you can actually handle.
- Prefer fixtures for setup/teardown over try/finally in test bodies.
## Test File Organization
```
tests/
conftest.py # Root fixtures: services, ports, model downloads, logging
serve/ # Backend serve tests (vllm, trtllm, sglang)
conftest.py # Image server, MinIO LoRA fixtures
frontend/ # Frontend HTTP/gRPC tests
conftest.py # HTTP/gRPC service fixtures, mocker workers
grpc/ # gRPC-specific tests
planner/ # Planner component tests
unit/ # Planner unit tests
router/ # Router E2E tests
fault_tolerance/ # Fault tolerance tests
cancellation/
migration/
etcd_ha/
gpu_memory_service/
deploy/
kvbm_integration/ # KV block manager integration tests
deploy/ # Deployment tests
basic/ # Basic smoke tests (wheel contents, CUDA version)
dependencies/ # Import/dependency tests
utils/ # Shared test utilities (NOT test files)
constants.py # Model IDs, default ports
managed_process.py # ManagedProcess for subprocess lifecycle
port_utils.py # Dynamic port allocation
test_output.py # Test output path resolution
```
## Serve Tests Pattern
Backend serve tests (`tests/serve/test_vllm.py`, etc.) follow a config-driven pattern:
```python
vllm_configs = {
"aggregated": VLLMConfig(
name="aggregated",
directory=vllm_dir,
script_name="agg.sh",
marks=[pytest.mark.gpu_1, pytest.mark.pre_merge, pytest.mark.timeout(300)],
model="Qwen/Qwen3-0.6B",
request_payloads=[...],
),
}
```
Configs are parametrized into test functions via `params_with_model_mark()`, which
auto-applies the `model` marker from the config's model field.
# Python Guidelines
Rules and conventions for Python code in this repository.
## Critical Rules
These are the conventions that matter most for code quality and maintainability.
Exceptions exist, but they should be rare and justified in a code comment.
### Keep imports at the top of the file
Avoid importing inside functions because it hides dependencies, makes the module
harder to understand at a glance, and can mask missing packages until a specific
code path is hit at runtime.
Avoid wrapping imports in `try/except` because it creates two execution modes --
one with the library and one without -- which doubles the testing surface and
produces confusing behavior when the dependency is unexpectedly absent.
```python
# BAD -- import inside function; dependency is invisible until this path runs
def process_data():
import json
return json.loads(data)
# BAD -- try/except hides a missing dependency behind a flag
try:
import requests
HAS_REQUESTS = True
except ImportError:
HAS_REQUESTS = False
# GOOD -- direct import at module top; fails immediately if missing
import json
import requests
```
Failing immediately on a missing dependency is better than hiding the problem
until a user hits an obscure code path in production. If a package is required,
add it to `requirements.txt` or `pyproject.toml` so it's installed upfront.
**Exception: optional dependencies and pytest collection.** This repo has multiple
optional backends (vLLM, SGLang, TRT-LLM, cupy, etc.) that are not installed in
every environment. Using `try/except ImportError` is the correct pattern when:
- An optional backend dependency (e.g. `tritonclient.grpc`, `vllm_omni`,
`torch_memory_saver`) may not be installed, and the code provides a
fallback or sets the import to `None`.
- A test file needs to skip collection when optional packages are absent
(e.g. `except ImportError: pytest.skip(..., allow_module_level=True)`).
- A stdlib module has version-dependent availability
(e.g. `tomllib` on Python 3.11+ vs `tomli` fallback).
```python
# OK -- optional backend, graceful fallback to None
try:
from vllm_omni.diffusion.data import DiffusionParallelConfig
except ImportError:
DiffusionParallelConfig = None
# OK -- skip test collection when optional deps are missing
try:
from dynamo.profiler.rapid import WorkloadSpec
except ImportError as e:
pytest.skip(f"Skip (missing dependency): {e}", allow_module_level=True)
# OK -- stdlib version compatibility
try:
import tomllib
except ImportError:
import tomli as tomllib
```
### Prefer failing fast over hiding errors
From PEP 20 (The Zen of Python):
> *"Errors should never pass silently. Unless explicitly silenced."*
> *"Explicit is better than implicit."*
Failing immediately when something goes wrong is better than silently continuing
with bad state, because:
- The person who caused the error sees it right away, while the context is fresh.
- The stack trace points directly at the root cause, not at a downstream symptom
three function calls later.
- Hidden errors compound -- a swallowed exception in one place produces confusing
behavior somewhere else, and the debugging cost grows exponentially with distance
from the original failure.
```python
# BAD -- all of these hide errors
except Exception:
pass
except Exception as e:
logging.error(e) # logs but silently continues!
return [] # returns a fake default
# BAD -- bare except catches KeyboardInterrupt and SystemExit too,
# making the process impossible to kill with Ctrl-C or sys.exit()
try:
do_work()
except:
log_error()
# GOOD -- just let it crash
result = something()
# GOOD -- catch SPECIFIC exceptions you can actually handle
try:
result = json.loads(text)
except json.JSONDecodeError:
result = {}
# GOOD -- if you must catch broad, catch Exception (not bare except)
# and re-raise after logging
try:
result = something()
except Exception as e:
logger.error(f"Failed: {e}")
raise
```
**Three rules:**
1. Remove the try/except if possible -- let it crash.
2. Catch **specific** exceptions only (`FileNotFoundError`, `ValueError`, `json.JSONDecodeError`, etc.).
3. If you must catch broadly, use `except Exception:` (never bare `except:`) and **always** re-raise after logging.
### NO defensive `getattr()` on known types
When an object has a known type with known attributes, access them directly.
Using `getattr()` with a default hides bugs by silently returning a fallback
when the attribute should always exist.
```python
# BAD -- node has a job_name attribute, getattr hides AttributeError
name = getattr(node, "job_name", "")
# GOOD -- direct access, fails loudly if something is wrong
name = node.job_name
```
---
## Common Pitfalls
### Mutable default arguments
Default argument values are evaluated once at function definition time, not on
each call. A mutable default (list, dict, set) is shared across all invocations.
```python
# BAD -- the list is shared across all calls
def add_item(item, items=[]):
items.append(item)
return items
add_item("a") # ["a"]
add_item("b") # ["a", "b"] -- not ["b"]!
# GOOD
def add_item(item, items=None):
if items is None:
items = []
items.append(item)
return items
```
### Always use context managers for resources
Files, network connections, subprocesses, and locks should be opened with `with`
so they are released even if an exception occurs.
```python
# BAD -- file handle leaks if json.load raises
f = open("data.json")
data = json.load(f)
f.close()
# GOOD
with open("data.json") as f:
data = json.load(f)
```
This is especially important in this project where tests manage subprocesses,
etcd/NATS connections, and temp directories. Use `ManagedProcess`,
`tempfile.TemporaryDirectory`, and similar context managers rather than
manual setup/teardown.
### Do not shadow built-in names
Naming a variable `list`, `dict`, `id`, `type`, `input`, `open`, or `format`
overwrites the built-in and causes confusing errors later in the same scope.
```python
# BAD
list = get_items() # shadows built-in list()
filtered = list(some_gen) # TypeError: 'list' object is not callable
# GOOD
items = get_items()
filtered = list(some_gen)
```
### Use `is` for None / True / False comparisons
`==` invokes `__eq__`, which can be overridden. `is` checks identity, which is
what you want for singletons.
```python
# BAD
if result == None:
if flag == True:
# GOOD
if result is None:
if flag is True: # or just: if flag:
```
### Do not modify a collection while iterating
Adding or removing items during iteration causes skipped elements or
`RuntimeError`. Build a new collection or iterate over a copy.
```python
# BAD -- skips elements
for item in items:
if item.is_stale():
items.remove(item)
# GOOD
items = [item for item in items if not item.is_stale()]
```
### Prefer `join()` over string concatenation in loops
Repeated `+=` on strings creates a new string object each time, which is O(n^2)
for large loops.
```python
# BAD
result = ""
for line in lines:
result += line + "\n"
# GOOD
result = "\n".join(lines)
```
### Watch out for late-binding closures in loops
Closures capture the variable reference, not its value at the time of creation.
```python
# BAD -- all lambdas return 4 (the final value of i)
fns = [lambda: i for i in range(5)]
[f() for f in fns] # [4, 4, 4, 4, 4]
# GOOD -- default argument captures current value
fns = [lambda i=i: i for i in range(5)]
[f() for f in fns] # [0, 1, 2, 3, 4]
```
### Do not use `assert` for runtime validation
Assertions are stripped when Python runs with `-O` (optimize). Use explicit
`if/raise` for validation that must always execute.
```python
# BAD -- silently skipped under python -O
assert user_id is not None, "user_id required"
# GOOD
if user_id is None:
raise ValueError("user_id required")
```
---
## Code Style
- Follow PEP 8.
- `snake_case` for variables and functions.
- `PascalCase` for classes.
- Add type hints where they improve readability.
- Use docstrings for public functions and classes.
- Use `dataclass` instead of plain dicts when a structure has >4 fields (better
type inference and IDE support).
## File Organization
- `__init__.py` for package initialization.
- Clear module separation.
- Tests in `tests/` directory.
## Formatting and Linting
### Preferred workflow
Auto-fix formatting, then lint:
```bash
ruff format <touched_paths>
ruff check --fix <touched_paths>
```
Or use pre-commit (runs isort, black, flake8, ruff, and more):
```bash
pre-commit run --files <touched_files>
pre-commit run --all-files # for broad changes
```
### Pre-commit hooks
The repo's `.pre-commit-config.yaml` runs these Python hooks:
- **isort** -- import sorting (`profile = "black"`, configured in `pyproject.toml`)
- **black** -- code formatting
- **flake8** -- style checks (`max-line-length=88`)
- **ruff** -- fast linting with auto-fix
- **codespell** -- spelling checks
- **trailing-whitespace**, **end-of-file-fixer**, **check-yaml**, **check-json**, **check-toml**
### Before committing
Always run:
```bash
pre-commit run --files <changed_files>
```
For broader changes or if in doubt:
```bash
pre-commit run --all-files
```
### Indentation verification
Indentation errors are common and hard to spot visually. After editing Python
files, verify mechanically:
```bash
ruff format <touched_paths> # auto-fix (preferred)
python3 -m compileall -q <touched_paths> # fast parse-only check
```
When fixing indentation errors, always read 20-30 lines of surrounding context
and fix the whole block -- adjacent lines often share the same mistake.
## Error Handling
See the **Critical Rules** section above for the full policy. Summary:
- Let exceptions propagate by default.
- Catch only specific exceptions you can actually handle.
- If you catch `Exception`, you must re-raise after logging.
- No `except Exception: pass`. Ever.
### Regex caution
Be careful with escaping in raw strings (`\s` vs `\\s`). When changing a critical
regex, add a one-line test to prove it matches.
## License Headers
Every `.py` file must have the SPDX header:
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
## Import Order
Imports are sorted by isort with `profile = "black"` (configured in `pyproject.toml`).
The order is:
1. Standard library
2. Third-party (known: `vllm`, `tensorrt_llm`, `sglang`, `aiconfigurator`)
3. First-party (`dynamo`, `deploy`)
Run `isort` or `pre-commit run isort` to auto-fix ordering.
...@@ -27,5 +27,14 @@ reviews: ...@@ -27,5 +27,14 @@ reviews:
enabled: false enabled: false
unit_tests: unit_tests:
enabled: false enabled: false
path_instructions:
- path: "**/*.py"
instructions: "Review against .ai/python-guidelines.md"
- path: "**/tests/**/*.py"
instructions: "Review against .ai/pytest-guidelines.md"
knowledge_base:
code_guidelines:
filePatterns:
- ".ai/**/*.md"
chat: chat:
auto_reply: true auto_reply: true
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment