Unverified Commit 5101f08c authored by Keiven C's avatar Keiven C Committed by GitHub
Browse files

docs: tighten .ai/ review guidelines based on CodeRabbit feedback (#7201)


Signed-off-by: default avatarKeiven Chang <keivenchang@users.noreply.github.com>
parent 727d32d7
......@@ -4,6 +4,10 @@ Rules and conventions for Python tests in this repository.
## Running Tests
**Always run your tests locally before pushing to CI.** Every failed CI run
wastes shared GPU minutes and blocks other PRs. A local run catches most
failures in seconds.
Always use the venv-aware invocation -- never bare `pytest`:
```bash
......@@ -42,22 +46,16 @@ python3 -m pytest tests/serve/ -k "aggregated and not disagg" -v --tb=short
## Critical Rules
These are the most common sources of flaky, non-hermetic tests. Violating any of
these will block your PR.
Hardcoded values in tests are cringy code. They signal that the author didn't
think about parallel execution, reproducibility, or the next person who has to
debug a phantom CI failure at 2 AM. Don't be that person.
These are the most common sources of flaky, non-hermetic tests. Violating any
will block your PR.
### DO NOT hardcode ports
Never use literal port numbers (e.g. `port=8000`, `port=8081`) in test code. Two
tests that share a port will collide when run in parallel, causing mysterious
failures that only reproduce in CI.
**Always flag** literal port numbers (`port=8000`, `port=8081`) in test code.
Parallel tests sharing any resource (port, file, env var, etc.) will collide.
**Instead:** Use the `dynamo_dynamic_ports` fixture (allocates `frontend_port` +
`system_ports` per test) or call `allocate_port()` / `allocate_ports()` directly
from `tests.utils.port_utils`.
Use `dynamo_dynamic_ports` (allocates `frontend_port` + `system_ports` per test)
or `allocate_port()` / `allocate_ports()` from `tests.utils.port_utils`.
```python
# BAD
......@@ -71,44 +69,65 @@ def test_example(dynamo_dynamic_ports):
### DO NOT hardcode temp paths
Never write to fixed paths like `/tmp/my-test.log` or `/tmp/output/`. The next test
(or a parallel worker) will pick up stale files, creating subtle side-effects.
**Always flag** fixed paths like `/tmp/my-test.log` or `/tmp/output/` in test code.
Parallel workers will clobber each other's files.
**Instead:** Use Python's `tempfile` module or pytest's `tmp_path` fixture. Both
provide unique paths and auto-cleanup.
Use pytest's `tmp_path` fixture or Python's `tempfile` module -- both provide
unique paths with auto-cleanup.
```python
# BAD
# BAD -- hardcoded path collides with parallel tests
with open("/tmp/test-output.json", "w") as f:
json.dump(result, f)
# BAD -- "ghost fixture": accepts tmp_path but ignores it and writes to /tmp anyway.
# Flag any test that requests tmp_path but still references /tmp/ or hardcoded paths.
def test_example(tmp_path):
with open("/tmp/test-output.json", "w") as f:
json.dump(result, f)
# GOOD
def test_example(tmp_path):
out = tmp_path / "test-output.json"
out.write_text(json.dumps(result))
```
Also: never dump output files into the repo working tree. This pollutes the repo and
risks clobbering real files, especially in dev (root user) containers.
### DO NOT write output files into the repository tree
**Always flag** any test that writes to paths relative to `__file__` or the repo
root. This pollutes the working tree and creates untracked noise in `git status`.
**Exception:** The autouse `logger` fixture writes to `test_output/<test_name>/`
by design -- this is sanctioned shared infra, not ad-hoc test output. Do not
flag it.
```python
# BAD -- writes into the repo alongside the test file; flag this
output = os.path.join(os.path.dirname(__file__), "scratch_output.txt")
with open(output, "w") as f:
f.write("debug output\n")
# GOOD -- use tmp_path; cleaned up automatically
def test_example(tmp_path):
output = tmp_path / "scratch_output.txt"
output.write_text("debug output\n")
```
### DO NOT write custom engine start/stop logic
Never write your own subprocess management code to launch, health-check, or tear down
engines (vLLM, SGLang, TRT-LLM) or infrastructure (NATS, etcd, frontends). Homegrown
lifecycle code inevitably leaks processes, misses cleanup on failure, or races with
parallel tests.
**Always flag** hand-rolled `subprocess.Popen` / `os.system` / `time.sleep`
patterns for engine or infra lifecycle. Homegrown lifecycle code leaks processes,
misses cleanup on failure, and races with parallel tests.
**Instead:** Use the existing fixtures and context managers:
Use the existing fixtures and context managers:
- **Fixtures:** `runtime_services_dynamic_ports`, `start_services_with_http`,
`start_services_with_grpc`, `start_services_with_mocker`
- **Context managers:** `DynamoFrontendProcess`, `DynamoWorkerProcess`,
`ManagedProcess`, `EtcdServer`, `NatsServer`
These handle health-checking, port allocation, log capture, straggler cleanup,
and graceful teardown automatically. If your test needs something the existing
infrastructure doesn't support, extend the shared fixtures -- don't reinvent them
in your test file.
These handle health-checking, port allocation, log capture, and graceful teardown
automatically. Extend the shared fixtures if needed -- don't reinvent them.
```python
# BAD -- hand-rolled subprocess management
......@@ -127,19 +146,14 @@ def test_example(start_services_with_mocker):
### DO NOT copy-paste test infrastructure -- reuse and refactor
Do not duplicate setup logic, helper functions, or fixture code across test files.
Copy-pasted code means the same bug gets fixed in one place but not the others, and
changing a shared pattern requires hunting down every copy.
**Always flag** duplicated setup logic, helpers, or fixture code across test files.
Copy-pasted infra means bugs get fixed in one copy but not the others.
**Instead:**
- **Reuse existing fixtures and helpers.** Check `tests/conftest.py`,
subdirectory `conftest.py` files, and `tests/utils/` before writing anything new.
- **Extract shared logic into fixtures or utility functions.** If two or more tests
need the same setup, it belongs in a `conftest.py` or `tests/utils/`.
- **Parametrize rather than duplicate.** If tests differ only in config (model,
backend, port count), use `@pytest.mark.parametrize` with indirect fixtures
instead of writing separate test functions.
- Check `tests/conftest.py`, subdirectory `conftest.py` files, and `tests/utils/`
before writing anything new.
- If two or more tests share setup, extract it into a fixture or `tests/utils/`.
- If tests differ only in config, use `@pytest.mark.parametrize` with indirect
fixtures instead of separate functions.
```python
# BAD -- same setup copy-pasted across three test files
......@@ -162,17 +176,15 @@ def test_vllm_requests(start_serve_deployment, payload_fn):
assert resp.status_code == 200
```
When the existing infrastructure doesn't fit your needs, extend the shared code
(add a parameter to a fixture, add a helper to `tests/utils/`) rather than forking
a private copy.
Extend shared code rather than forking a private copy.
---
## Markers
`--strict-markers` and `--strict-config` are enforced in `pyproject.toml`. Using an
undefined marker **fails collection**. All markers must be registered in both
`pyproject.toml [tool.pytest.ini_options].markers` and `tests/conftest.py:pytest_configure`.
`--strict-markers` and `--strict-config` are enforced. Using an undefined marker
**fails collection**. Register all markers in both `pyproject.toml` and
`tests/conftest.py:pytest_configure`.
### Required markers
......@@ -201,6 +213,9 @@ CI compute is finite. Choose placement carefully:
- Only use `pre_merge` for tests that are **absolutely critical** -- every pre-merge
test slows down every PR for every contributor.
- **Tests averaging over 60 seconds should default to `post_merge`** unless they
guard a critical path that justifies blocking every PR. If a test must stay
`pre_merge` despite being slow, add a comment explaining why.
- E2E tests involve more components and tend to be flakier. Prefer `post_merge` for
E2E tests unless they guard a critical path.
- Consider `nightly` or `weekly` for expensive, GPU-heavy, or stress tests.
......@@ -212,15 +227,34 @@ Apply when the test depends on a specific inference backend:
### Timeouts
Tests that run longer than 30 seconds **must** have `@pytest.mark.timeout(<seconds>)`.
Set the timeout to **3x the measured average duration** to absorb variance.
Tests over 30 seconds **must** have `@pytest.mark.timeout(<seconds>)`. Set the
timeout to **3x measured average** to absorb variance.
**Always flag** any test that runs over 30 seconds or contains `time.sleep()`,
polling loops, network calls, or subprocess waits but lacks a
`@pytest.mark.timeout(...)` marker. This is a **required change, not a style
suggestion** -- a missing timeout can hang CI indefinitely.
Measure your test 5-10 times, then add a timing comment:
**Also flag** real `time.sleep()` in `pre_merge` + `unit` tests. Unit tests
should not burn wall-clock time. Mock the sleep, use shared fixtures, or
reclassify as `integration`/`e2e` with `@pytest.mark.slow`.
```python
# BAD -- sleeps and loops with no timeout marker; can hang CI forever
@pytest.mark.pre_merge
@pytest.mark.gpu_0
@pytest.mark.unit
def test_poll_server():
for _ in range(50):
time.sleep(0.1)
assert True
# GOOD -- timeout prevents infinite hangs
@pytest.mark.timeout(300) # ~100s average, 3x buffer
@pytest.mark.pre_merge
@pytest.mark.gpu_0
@pytest.mark.unit
def test_vllm_aggregated(...):
# on average this test takes about 1.5 minutes
...
```
......@@ -251,30 +285,119 @@ def test_vllm_aggregated(start_serve_deployment):
## Async Tests
`asyncio_mode = "auto"` is configured in `pyproject.toml`. Do **not** add
`@pytest.mark.asyncio` manually -- all `async def test_*` functions are collected
automatically.
`asyncio_mode = "auto"` is configured in `pyproject.toml`, so all `async def test_*`
functions are collected automatically.
## Hermetic Testing
**Always flag** `@pytest.mark.asyncio` -- this is a **required change, not a style
suggestion**. The marker is never needed in this repo. Remove it.
Tests must be isolated and must not interfere with each other. Every test should:
```python
# BAD -- redundant marker; asyncio_mode = "auto" handles this
@pytest.mark.asyncio
@pytest.mark.pre_merge
@pytest.mark.gpu_0
@pytest.mark.unit
async def test_async_endpoint():
await asyncio.sleep(0.01)
# GOOD -- no marker needed, pytest collects it automatically
@pytest.mark.pre_merge
@pytest.mark.gpu_0
@pytest.mark.unit
async def test_async_endpoint():
await asyncio.sleep(0.01)
```
- Run reliably in any order, on any machine, at any time.
- Produce deterministic results.
- Not create side-effects for other tests.
- Clean up properly after itself.
- Fail fast when something is wrong.
## Hermetic Testing
Given enough resources, multiple tests must be able to execute in parallel without
conflicts or race conditions. See the **Critical Rules** section at the top for the
three most important requirements (dynamic ports, temp paths, shared fixtures).
Tests must be isolated. Every test must run in any order, on any machine, and
produce deterministic results with no side-effects. Multiple tests must be able
to execute in parallel without conflicts.
### Additional anti-patterns
- **Reusing namespace/component/endpoint names** across tests that share a
registration service. Use unique names per test.
- **Leaking environment variables**. Use `monkeypatch.setenv()` or save/restore
patterns so env changes don't persist across tests.
- **Module-level mutable state.** **Always flag** any mutable object (`{}`, `[]`,
`set()`) at module scope that tests read or write. This makes tests
order-dependent and produces phantom xdist failures.
```python
# BAD -- module-level dict shared across all tests; flag this
_shared_results = {}
def test_a():
_shared_results["worker-1"] = "registered"
def test_b():
# Passes only if test_a ran first!
assert _shared_results["worker-1"] == "registered"
# GOOD -- each test gets its own state
@pytest.fixture
def results():
return {}
def test_a(results):
results["worker-1"] = "registered"
assert results["worker-1"] == "registered"
```
- **Colliding `dyn://` registration paths across tests.** Dynamo workers register
under `dyn://{namespace}.{component}.{endpoint}` in etcd/NATS. Hardcoding
namespace, component, and endpoint strings is fine on its own -- the problem
is when two tests that share an etcd/NATS instance use the **same full path**,
causing flaky collisions under parallel execution.
**Always flag** tests whose full `dyn://` path can collide with another test's.
The simplest fix is to randomize at least one segment (typically namespace).
```python
# BAD -- two tests using this identical path will collide
namespace = "dynamo"
component = "backend"
endpoint = f"dyn://{namespace}.{component}.generate"
# GOOD -- unique namespace prevents collisions; component/endpoint can stay fixed
from tests.router.common import generate_random_suffix
namespace = f"dynamo-{generate_random_suffix()}"
component = "backend"
endpoint = f"dyn://{namespace}.{component}.generate"
```
- **Leaking environment variables.** **Always flag** direct `os.environ[...] = ...`
or `os.environ.update(...)` in tests. These mutations persist into subsequent
tests and cause order-dependent failures.
```python
# BAD -- env var leaks into every test that runs after this one; flag this
def test_service_discovery():
os.environ["NATS_SERVER"] = "nats://rogue-server:4222"
assert connect()
# GOOD -- monkeypatch auto-restores after each test
def test_service_discovery(monkeypatch):
monkeypatch.setenv("NATS_SERVER", "nats://rogue-server:4222")
assert connect()
```
- **Mutable default arguments in test helpers.** **Always flag** any function
with a mutable default (`[]`, `{}`, `set()`). Defaults are evaluated once and
shared across calls, so mutations accumulate silently between invocations.
```python
# BAD -- registry list is shared across all calls; flag this
def register_workers(new_worker, registry=[]):
registry.append(new_worker)
return registry
# GOOD -- None sentinel, fresh list each call
def register_workers(new_worker, registry=None):
if registry is None:
registry = []
registry.append(new_worker)
return registry
```
See also: `python-guidelines.md` > "Mutable default arguments" for the general rule.
### Optimization tips
......@@ -341,6 +464,26 @@ triggers a new warning, either fix the root cause or add a targeted ignore in
- Catch only specific exceptions you can actually handle.
- Prefer fixtures for setup/teardown over try/finally in test bodies.
## Linter Suppression (`# noqa`)
**Always flag** `# noqa` that suppresses warnings for anti-patterns documented in
these guidelines. If the linter caught a real problem (`E711` for `== None`,
`E712` for `== True`, `F841` for unused variable), fix the code.
```python
# BAD -- noqa hides the very bug the linter caught; flag this
assert error == None # noqa: E711
result = compute() # noqa: F841
# GOOD -- fix the code
assert error is None
result = compute()
assert result == expected
```
The only acceptable `# noqa` is for genuine false positives. Always explain:
`# noqa: F401 -- imported for side-effects`.
## Test File Organization
```
......
......@@ -9,16 +9,19 @@ Exceptions exist, but they should be rare and justified in a code comment.
### Keep imports at the top of the file
Avoid importing inside functions because it hides dependencies, makes the module
harder to understand at a glance, and can mask missing packages until a specific
code path is hit at runtime.
**Always flag** any `import` statement that appears inside a function body, method,
or class. Imports inside functions hide dependencies, make the module harder to
understand at a glance, and can mask missing packages until a specific code path
is hit at runtime.
Avoid wrapping imports in `try/except` because it creates two execution modes --
one with the library and one without -- which doubles the testing surface and
produces confusing behavior when the dependency is unexpectedly absent.
**Always flag** `try/except ImportError` around imports (except for the documented
exceptions below). This pattern creates two execution modes -- one with the library
and one without -- which doubles the testing surface and produces confusing
behavior when the dependency is unexpectedly absent.
```python
# BAD -- import inside function; dependency is invisible until this path runs
# ALWAYS FLAG THIS PATTERN
def process_data():
import json
return json.loads(data)
......@@ -129,29 +132,40 @@ except Exception as e:
### NO defensive `getattr()` on known types
When an object has a known type with known attributes, access them directly.
Using `getattr()` with a default hides bugs by silently returning a fallback
when the attribute should always exist.
**Always flag** `getattr(obj, "attr", default)` when the object's type is known
and the attribute is part of its definition (class attribute, `__init__` parameter,
dataclass field, etc.). Using `getattr()` with a default hides bugs by silently
returning a fallback when the attribute should always exist. Direct attribute
access fails loudly if the type contract changes, which is what you want.
```python
# BAD -- node has a job_name attribute, getattr hides AttributeError
name = getattr(node, "job_name", "")
# BAD -- cfg is a ServiceConfig with host/port; getattr hides AttributeError
# ALWAYS FLAG THIS PATTERN
cfg = ServiceConfig(host="0.0.0.0", port=8080)
host = getattr(cfg, "host", "localhost")
port = getattr(cfg, "port", 9999)
# GOOD -- direct access, fails loudly if something is wrong
name = node.job_name
host = cfg.host
port = cfg.port
```
---
## Common Pitfalls
## Anti-Patterns That Must Be Flagged in Review
Every item below is a **mandatory review check**. If any of these patterns appear
in a pull request, flag it and request changes. These are not style preferences --
they are sources of real bugs, resource leaks, and CI flakiness.
### Mutable default arguments
Default argument values are evaluated once at function definition time, not on
each call. A mutable default (list, dict, set) is shared across all invocations.
**Always flag** any function whose default argument is a mutable object (`[]`, `{}`,
`set()`). Default values are evaluated once at function definition time and shared
across all calls, so mutations accumulate silently between invocations.
```python
# BAD -- the list is shared across all calls
# BAD -- the list is shared across all calls; flag this
def add_item(item, items=[]):
items.append(item)
return items
......@@ -159,7 +173,7 @@ def add_item(item, items=[]):
add_item("a") # ["a"]
add_item("b") # ["a", "b"] -- not ["b"]!
# GOOD
# GOOD -- use None sentinel, create a new list each call
def add_item(item, items=None):
if items is None:
items = []
......@@ -167,13 +181,15 @@ def add_item(item, items=None):
return items
```
### Always use context managers for resources
### Leaked file handles -- always use context managers
Files, network connections, subprocesses, and locks should be opened with `with`
so they are released even if an exception occurs.
**Always flag** any `open()` call that is not wrapped in a `with` statement.
Files, network connections, subprocesses, and locks must be opened with `with`
so they are released even if an exception occurs. Bare `open()` followed by
manual `.close()` leaks the handle when an exception fires between the two calls.
```python
# BAD -- file handle leaks if json.load raises
# BAD -- file handle leaks if json.load raises; flag this
f = open("data.json")
data = json.load(f)
f.close()
......@@ -188,14 +204,17 @@ etcd/NATS connections, and temp directories. Use `ManagedProcess`,
`tempfile.TemporaryDirectory`, and similar context managers rather than
manual setup/teardown.
### Do not shadow built-in names
### Shadowing built-in names
Naming a variable `list`, `dict`, `id`, `type`, `input`, `open`, or `format`
overwrites the built-in and causes confusing errors later in the same scope.
**Always flag** any variable named `list`, `dict`, `id`, `type`, `input`, `open`,
`format`, `set`, `map`, `filter`, `range`, `str`, `int`, `float`, `bool`, `bytes`,
`tuple`, `hash`, `len`, `min`, `max`, `sum`, `any`, `all`, `zip`, `enumerate`,
`sorted`, `reversed`, or `next`. Assigning to these names overwrites the built-in
and causes confusing `TypeError`s later in the same scope.
```python
# BAD
list = get_items() # shadows built-in list()
# BAD -- shadows built-in list(); flag this
list = get_items()
filtered = list(some_gen) # TypeError: 'list' object is not callable
# GOOD
......@@ -205,26 +224,30 @@ filtered = list(some_gen)
### Use `is` for None / True / False comparisons
`==` invokes `__eq__`, which can be overridden. `is` checks identity, which is
what you want for singletons.
**Always flag** `== None`, `== True`, `== False`, `!= None`, `!= True`, and
`!= False`. These invoke `__eq__`, which can be overridden and produce
surprising results. Use `is` / `is not` for singleton comparisons.
```python
# BAD
# BAD -- flag these
if result == None:
if flag == True:
if done == False:
# GOOD
if result is None:
if flag is True: # or just: if flag:
if not done:
```
### Do not modify a collection while iterating
Adding or removing items during iteration causes skipped elements or
`RuntimeError`. Build a new collection or iterate over a copy.
**Always flag** any loop that adds, removes, or deletes from the collection it is
iterating over. This causes skipped elements, `RuntimeError` (for dicts), or
infinite loops. Build a new collection or iterate over a copy.
```python
# BAD -- skips elements
# BAD -- RuntimeError on dict, skips elements on list; flag this
for item in items:
if item.is_stale():
items.remove(item)
......@@ -235,25 +258,28 @@ items = [item for item in items if not item.is_stale()]
### Prefer `join()` over string concatenation in loops
Repeated `+=` on strings creates a new string object each time, which is O(n^2)
for large loops.
**Always flag** `+=` on a string variable inside a loop. Repeated `+=` on strings
creates a new string object each time, which is O(n^2) for large loops.
```python
# BAD
# BAD -- O(n^2) string building; flag this
result = ""
for line in lines:
result += line + "\n"
# GOOD
# GOOD -- O(n) with join
result = "\n".join(lines)
```
### Watch out for late-binding closures in loops
### Late-binding closures in loops
Closures capture the variable reference, not its value at the time of creation.
**Always flag** lambdas or inner functions created inside a loop that reference the
loop variable without binding it as a default argument. Closures capture the
variable reference, not its value at the time of creation, so all closures end up
with the final loop value.
```python
# BAD -- all lambdas return 4 (the final value of i)
# BAD -- all lambdas return 4 (the final value of i); flag this
fns = [lambda: i for i in range(5)]
[f() for f in fns] # [4, 4, 4, 4, 4]
......@@ -264,11 +290,13 @@ fns = [lambda i=i: i for i in range(5)]
### Do not use `assert` for runtime validation
Assertions are stripped when Python runs with `-O` (optimize). Use explicit
`if/raise` for validation that must always execute.
**Always flag** `assert` statements used to validate function arguments, request
payloads, configuration, or any data that comes from outside the current function.
Assertions are stripped when Python runs with `-O` (optimize), silently removing
the validation. Use explicit `if/raise` for checks that must always execute.
```python
# BAD -- silently skipped under python -O
# BAD -- silently skipped under python -O; flag this
assert user_id is not None, "user_id required"
# GOOD
......@@ -364,15 +392,6 @@ See the **Critical Rules** section above for the full policy. Summary:
Be careful with escaping in raw strings (`\s` vs `\\s`). When changing a critical
regex, add a one-line test to prove it matches.
## License Headers
Every `.py` file must have the SPDX header:
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```
## Import Order
Imports are sorted by isort with `profile = "black"` (configured in `pyproject.toml`).
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment