docs: testing and container docs update (#6806)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

Unverified commit c55f38dc, authored Mar 04, 2026 by Keiven C, committed by GitHub Mar 04, 2026

Showing with 52 additions and 28 deletions

container/README.md +44 -21

tests/README.md +8 -7

--- a/container/README.md
+++ b/container/README.md
@@ -79,13 +79,13 @@ The scripts in this directory abstract away the complexity of Docker commands wh
 ### Convenience Scripts vs Direct Docker Commands
-The `run.sh` script and rendering scripts are convenience that simplify common Docker operations. They automatically handle:
+The `run.sh` script and rendering scripts are conveniences that simplify common Docker operations. They automatically handle:
 - GPU access configuration and runtime selection
 - Volume mount setup for development workflows
 - Environment variable management
 - Build argument construction for multi-stage builds
-**You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. The `run.sh` uses a `--dry-run` flag to show you the exact commands they would execute, making it easy to understand and modify the underlying operations.
+**You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. `run.sh` supports a `--dry-run` flag that shows the exact commands it would execute, making it easy to understand and modify the underlying operations.
 ## Development Targets Feature Matrix
@@ -115,7 +115,7 @@ The `run.sh` script and rendering scripts are convenience that simplify common D
 ### 1. runtime target (runs as non-root dynamo user):
 ```bash
 # Build runtime image
-python container/render.py --framework vllm --target runtime --output-short-filename
+container/render.py --framework vllm --target runtime --output-short-filename
 docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
 # Run runtime container
@@ -225,20 +225,22 @@ Note: `uv` commands set `UV_CACHE_DIR` per `RUN` so `uv` always uses the same pa
 **Common Usage Examples:**
 ```bash
-# Build vLLM dev image called dynamo:latest-vllm (default). This runs as root and is for development.
+# Build a vLLM local-dev image called dynamo:latest-vllm-local-dev. The local-dev image will run as `dynamo` with UID/GID matched to your host user,
-python container/render.py --framework=vllm --target=dev --output-short-filename
-docker build -t dynamo:latest-vllm-dev -f container/rendered.Dockerfile .
-# Build a local-dev image. The local-dev image will run as `dynamo` with UID/GID matched to your host user,
 # which is useful when mounting partitions for development.
-python container/render.py --framework=vllm --target=local-dev --output-short-filename
+container/render.py --framework=vllm --target=local-dev --output-short-filename
 docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev .
-# Build TensorRT-LLM development image called dynamo:latest-trtllm
+# Build TensorRT-LLM runtime image called dynamo:latest-trtllm-runtime
-python container/render.py --framework=trtllm --target=runtime --output-short-filename --cuda-version=13.1
+container/render.py --framework=trtllm --target=runtime --output-short-filename --cuda-version=13.1
 docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile .
 ```
+After building, use `run.sh` to launch the container (see [run.sh - Container Runtime Manager](#runsh---container-runtime-manager) below for full options):
+```bash
+# Launch local-dev container with workspace mounted for live editing
+container/run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it
+```
 ### Building the Frontend Image
 The frontend image is a specialized container that includes the Dynamo components (Dynamo, NIXL, etc) along with the Endpoint Picker (EPP) for Kubernetes Gateway API Inference Extension integration. This image is primarily used for inference gateway deployments.
@@ -261,7 +263,7 @@ EPP_IMAGE="dynamo/dynamo-epp:${EPP_GIT_TAG}"
 **Build Frontend Image**
 ```bash
 # Build the frontend image (automatically builds EPP image as a dependency)
-python container/render.py --framework=dynamo --target=frontend --output-short-filename
+container/render.py --framework=dynamo --target=frontend --output-short-filename
 docker build -t dynamo:frontend --build-arg EPP_IMAGE=${EPP_IMAGE} -f container/rendered.Dockerfile .
 ```
@@ -421,14 +423,19 @@ See Docker documentation for custom network creation and management.
 ### Development Workflow
 ```bash
 # 1. Build local-dev image (builds runtime, then dev as intermediate, then local-dev as final image)
-python container/render.py --framework=vllm --target=local-dev --output-short-filename
+container/render.py --framework=vllm --target=local-dev --output-short-filename
 docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev .
 # 2. Run development container using the local-dev image
 # RECOMMENDED: --mount-workspace for live editing in dev and local-dev images
 container/run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it
-# 3. Inside container, run inference (requires both frontend and backend)
+# From this point forward, commands run inside the container started in step 2.
+# 3. Sanity check (optional but recommended)
+deploy/sanity_check.py
+# 4. Run inference (requires both frontend and backend)
 # Start frontend
 python -m dynamo.frontend &
@@ -439,7 +446,7 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
 ### Production Workflow
 ```bash
 # 1. Build production runtime image (runs as non-root dynamo user)
-python container/render.py --framework=vllm --target=runtime --output-short-filename
+container/render.py --framework=vllm --target=runtime --output-short-filename
 docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
 # 2. Run production container as non-root dynamo user
@@ -449,19 +456,35 @@ container/run.sh --image dynamo:latest-vllm-runtime --gpus all -v $HOME/.cache:/
 ### Testing Workflow
 ```bash
 # 1. Build dev image
-python container/render.py --framework=vllm --target=dev --output-short-filename
+container/render.py --framework=vllm --target=dev --output-short-filename
 docker build -t dynamo:latest-vllm-dev -f container/rendered.Dockerfile .
-# 2. Run tests with network isolation for reproducible results (no -it needed for CI)
+# 2. Launch the container
-container/run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache -- python -m pytest tests/
+# Without --network (default: host networking, ports shared with host -- simplest for development)
+container/run.sh --image dynamo:latest-vllm-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it
+# Or with --network bridge (isolated networking, no port conflicts with host)
+container/run.sh --image dynamo:latest-vllm-dev --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache -it
+# From this point forward, commands run inside the container started in step 2.
-# 3. Inside the container with bridge networking, start services
+# 3. Start infrastructure services
-# Note: Services are only accessible from the same container - no port conflicts with host
 nats-server -js &
 etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
+# 4. Compile code
+cargo build --locked --features dynamo-llm/block-manager --workspace
+cd lib/bindings/python && maturin develop --uv && cd -
+# 5. Sanity check (optional but recommended)
+deploy/sanity_check.py --runtime-check-only
+# 6. Run tests
+python -m pytest tests/
+# 7. (Optional) Start frontend and backend for interactive testing
 python -m dynamo.frontend &
-# 4. Start worker backend (choose one framework):
+# Start worker backend (choose one framework):
 # vLLM
 DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &

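The `container/README.md` changes state that the local-dev image runs as `dynamo` with a UID/GID matched to the host user (set through the `USER_UID`/`USER_GID` build args). A quick way to confirm that after `docker build` is sketched below. This is a minimal sketch, not part of the PR: the function name `check_local_dev_ids` is hypothetical, and it assumes `docker` is on `PATH` and that the image's entrypoint forwards an arbitrary command (`sh -c ...`) unchanged.

```bash
# Hypothetical helper (not part of the Dynamo scripts): compare the UID:GID a
# container runs as with the host user's UID:GID.
check_local_dev_ids() {
  local image="${1:-dynamo:latest-vllm-local-dev}"
  local host_ids container_ids

  host_ids="$(id -u):$(id -g)"

  # Single quotes defer $(id ...) so it expands inside the container, not on the host.
  # --rm deletes the throwaway container once the command exits.
  # Assumes the image entrypoint forwards the command unchanged.
  container_ids="$(docker run --rm "$image" sh -c 'echo "$(id -u):$(id -g)"')" || return 1

  if [ "$container_ids" = "$host_ids" ]; then
    echo "OK: $image runs as $container_ids (matches host)"
  else
    echo "MISMATCH: host=$host_ids container=$container_ids" >&2
    return 1
  fi
}
```

Usage, on the host after building: `check_local_dev_ids dynamo:latest-vllm-local-dev`. A mismatch most likely means the image was built without `--build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g)`.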
--- a/tests/README.md
+++ b/tests/README.md
@@ -13,7 +13,7 @@ All tests run inside containers. See the [Container Development Guide](../contai
 Each area can have one or more of the following types of tests:
 1. **Unit** -- Exercises a single function, class, or module in isolation. No external services, no GPU. Each test typically runs in milliseconds; all unit tests combined may take <5 minutes.
-2. **Integration** -- Wires multiple components together using **mock engines** (`dynamo.mocker`) and **real infrastructure** (ETCD for service discovery, NATS for messaging if enabled). Validates that the router, planner, frontend gRPC, and similar subsystems work together without launching a real inference engine. No GPU required. Each test typically runs in seconds; all integration tests combined may take <30 minutes.
+2. **Integration** -- Wires multiple components together using **mock engines** (`dynamo.mocker`) and **real infrastructure** (ETCD for service discovery, NATS for messaging, if enabled). Validates that the router, planner, frontend gRPC, and similar subsystems work together without launching a real inference engine. No GPU required. Each test typically runs in seconds; all integration tests combined may take <30 minutes.
 3. **End-to-End (E2E)** -- Starts a **real inference engine** (vLLM, SGLang, or TRT-LLM), sends requests through the frontend, and validates responses. Requires GPU. Each test typically runs in minutes; the full E2E suite may take several hours.
 It is absolutely important to be mindful of how long a test you write takes. Slow tests have a compounding cost: they burn GPU-hours in CI (GPUs are expensive and shared), they discourage engineers from running suites locally (so bugs slip through to CI), and they slow down the entire team's development velocity. A test suite that takes too long becomes a test suite that nobody runs. When adding or modifying tests, include a per-test time estimate in your PR description -- CI GPU resources are limited and these estimates help the team schedule tests across pre-merge, nightly, and weekly pipelines.
@@ -107,8 +107,8 @@ dynamo/
 Markers are required for all tests. They are used for test selection in CI and local runs.
 ### Marker Requirements
-- Every test must have at least one **Lifecycle** marker, and **test type** and **Hardware** markers.
+- Every test must have at least one **Lifecycle** marker, and **Test Type** and **Hardware** markers.
-- **component** markers are required as applicable.
+- **Component/Framework** markers are required as applicable.
 ### Marker Table
 | Category                | Marker(s)                                                        | Description                        |
@@ -181,7 +181,6 @@ cargo test --features integration
 ```
 ### Additional Options
 - **Feature gates:** Use Cargo features to run specific test subsets, e.g. `cargo test --features planner`. Integration tests must be behind the `integration` feature gate.
 - **Ignored tests:** Use `#[ignore]` to mark slow or special-case tests. Run them explicitly with `cargo test -- --ignored`.
@@ -216,7 +215,9 @@ This section assumes you are already inside a running **runtime**, **local-dev**
 1. Build a development container (`render.py ...` + `docker build ...`)
 2. Launch it (`run.sh ...`)
-3. Inside the container, compile code and run tests (see below)
+3. Inside the container, compile code and run tests
+All commands below are meant to be run **inside the container**.
 **Local-dev / dev containers** -- you must compile the Rust bindings before running pytest. Without this step, tests that import `dynamo._internal` will fail with `ImportError`:
 ```bash
@@ -302,7 +303,7 @@ pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short
 ### Running tests locally outside of a container
-To run tests outside of the development container, ensure that you have properly setup your environment and have installed the following dependencies in your `venv`:
+To run tests outside of the development container, ensure that you have properly set up your environment and have installed the following dependencies in your `venv`:
 ```bash
 uv pip install pytest-mypy
@@ -341,7 +342,7 @@ Runs per framework (vllm, sglang, trtllm). Each framework goes through: **Build*
 | Stage | What it does | Local equivalent |
 |-------|-------------|-----------------|
-| Build image | Render Dockerfile, build runtime container | `python container/render.py --framework=vllm --target=runtime && docker build ...` |
+| Build image | Render Dockerfile, build runtime container | `container/render.py --framework=vllm --target=runtime && docker build ...` |
 | Sanity check | Verify packages are installed in the image | `docker run --rm <image> /workspace/deploy/sanity_check.py --runtime-check --no-gpu-check` |
 | CPU-only tests (parallel) | `(pre_merge or post_merge) and <framework> and gpu_0` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_0" -n auto --dist=loadscope -v --tb=short` |
 | Single GPU tests (sequential) | `(pre_merge or post_merge) and <framework> and gpu_1` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short` |
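The CI stage table maps each test stage to a marker expression. The sketch below generates the matching local `pytest` commands per framework, so the expressions stay identical to the table. It is an illustration, not a script shipped in the repository: `ci_pytest_commands` is a hypothetical name, and the `-n auto --dist=loadscope` flags assume `pytest-xdist` is installed in the container.

```bash
# Hypothetical helper: print the pytest commands that mirror the CPU-only and
# single-GPU CI stages for one framework. Marker expressions come from the table above.
ci_pytest_commands() {
  local framework="$1"
  local selector="(pre_merge or post_merge) and ${framework}"

  # CPU-only tests (gpu_0): run in parallel with pytest-xdist
  printf 'pytest -m "%s and gpu_0" -n auto --dist=loadscope -v --tb=short\n' "$selector"
  # Single-GPU tests (gpu_1): run sequentially, as in CI
  printf 'pytest -m "%s and gpu_1" -v --tb=short\n' "$selector"
}

# Print the commands for every framework that CI covers.
for fw in vllm sglang trtllm; do
  ci_pytest_commands "$fw"
done
```

Printing rather than executing keeps the selection easy to review before spending GPU time; copy the lines for the framework you are working on and run them inside the container.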