Commit c55f38dc, authored by Keiven C, committed by GitHub

docs: testing and container docs update (#6806)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent adc95380
...@@ -79,13 +79,13 @@ The scripts in this directory abstract away the complexity of Docker commands wh ...@@ -79,13 +79,13 @@ The scripts in this directory abstract away the complexity of Docker commands wh
### Convenience Scripts vs Direct Docker Commands ### Convenience Scripts vs Direct Docker Commands
The `run.sh` script and rendering scripts are convenience that simplify common Docker operations. They automatically handle: The `run.sh` script and rendering scripts are conveniences that simplify common Docker operations. They automatically handle:
- GPU access configuration and runtime selection - GPU access configuration and runtime selection
- Volume mount setup for development workflows - Volume mount setup for development workflows
- Environment variable management - Environment variable management
- Build argument construction for multi-stage builds - Build argument construction for multi-stage builds
**You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. The `run.sh` uses a `--dry-run` flag to show you the exact commands they would execute, making it easy to understand and modify the underlying operations. **You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. `run.sh` supports a `--dry-run` flag to show you the exact commands they would execute, making it easy to understand and modify the underlying operations.
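As a minimal sketch of the `--dry-run` workflow described above, the wrapper below previews the Docker command without starting a container. `preview_run` and the `RUN_SH` variable are hypothetical names introduced for illustration, and placing `--dry-run` before the other arguments is an assumption about how `run.sh` parses its flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the docker command
# that run.sh would execute, without launching anything.
RUN_SH="${RUN_SH:-container/run.sh}"

preview_run() {
    if [ -x "$RUN_SH" ]; then
        # --dry-run goes first so it is not swallowed by a trailing "-- <command>".
        "$RUN_SH" --dry-run "$@"
    else
        echo "run.sh not found at $RUN_SH (run this from the repository root)" >&2
        return 1
    fi
}
```

For example, `preview_run --image dynamo:latest-vllm-local-dev --mount-workspace -it` should print the underlying command, which you can then copy and adjust by hand.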
## Development Targets Feature Matrix ## Development Targets Feature Matrix
...@@ -115,7 +115,7 @@ The `run.sh` script and rendering scripts are convenience that simplify common D ...@@ -115,7 +115,7 @@ The `run.sh` script and rendering scripts are convenience that simplify common D
### 1. runtime target (runs as non-root dynamo user): ### 1. runtime target (runs as non-root dynamo user):
```bash ```bash
# Build runtime image # Build runtime image
python container/render.py --framework vllm --target runtime --output-short-filename container/render.py --framework vllm --target runtime --output-short-filename
docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile . docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
# Run runtime container # Run runtime container
...@@ -225,20 +225,22 @@ Note: `uv` commands set `UV_CACHE_DIR` per `RUN` so `uv` always uses the same pa ...@@ -225,20 +225,22 @@ Note: `uv` commands set `UV_CACHE_DIR` per `RUN` so `uv` always uses the same pa
**Common Usage Examples:** **Common Usage Examples:**
```bash ```bash
# Build vLLM dev image called dynamo:latest-vllm (default). This runs as root and is for development. # Build a vLLM local-dev image called dynamo:latest-vllm-local-dev. The local-dev image will run as `dynamo` with UID/GID matched to your host user,
python container/render.py --framework=vllm --target=dev --output-short-filename
docker build -t dynamo:latest-vllm-dev -f container/rendered.Dockerfile .
# Build a local-dev image. The local-dev image will run as `dynamo` with UID/GID matched to your host user,
# which is useful when mounting partitions for development. # which is useful when mounting partitions for development.
python container/render.py --framework=vllm --target=local-dev --output-short-filename container/render.py --framework=vllm --target=local-dev --output-short-filename
docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev . docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev .
# Build TensorRT-LLM development image called dynamo:latest-trtllm # Build TensorRT-LLM runtime image called dynamo:latest-trtllm-runtime
python container/render.py --framework=trtllm --target=runtime --output-short-filename --cuda-version=13.1 container/render.py --framework=trtllm --target=runtime --output-short-filename --cuda-version=13.1
docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile . docker build -t dynamo:latest-trtllm-runtime -f container/rendered.Dockerfile .
``` ```
After building, use `run.sh` to launch the container (see [run.sh - Container Runtime Manager](#runsh---container-runtime-manager) below for full options):
```bash
# Launch local-dev container with workspace mounted for live editing
container/run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it
```
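The render-then-build pairs above follow one pattern, which can be sketched as a pair of shell functions. `image_tag` and `build_image` are hypothetical helpers, not scripts shipped with the repository; the tag scheme `dynamo:latest-<framework>-<target>` is inferred from the examples above and may not hold for every target.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the tag scheme mirrors the examples in this guide.
image_tag() {
    # $1 = framework (e.g. vllm, trtllm), $2 = target (e.g. runtime, dev, local-dev)
    echo "dynamo:latest-$1-$2"
}

build_image() {
    local framework="$1" target="$2"
    local -a extra_args=()
    # local-dev takes the host UID/GID so files on mounted volumes keep the right owner.
    if [ "$target" = "local-dev" ]; then
        extra_args=(--build-arg "USER_UID=$(id -u)" --build-arg "USER_GID=$(id -g)")
    fi
    # Render the Dockerfile first; build only if rendering succeeded.
    container/render.py --framework="$framework" --target="$target" --output-short-filename &&
        docker build ${extra_args[@]+"${extra_args[@]}"} \
            -t "$(image_tag "$framework" "$target")" \
            -f container/rendered.Dockerfile .
}
```

Running `build_image vllm local-dev` from the repository root would then be equivalent to the first example, under the assumptions stated above.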
### Building the Frontend Image ### Building the Frontend Image
The frontend image is a specialized container that includes the Dynamo components (Dynamo, NIXL, etc) along with the Endpoint Picker (EPP) for Kubernetes Gateway API Inference Extension integration. This image is primarily used for inference gateway deployments. The frontend image is a specialized container that includes the Dynamo components (Dynamo, NIXL, etc) along with the Endpoint Picker (EPP) for Kubernetes Gateway API Inference Extension integration. This image is primarily used for inference gateway deployments.
...@@ -261,7 +263,7 @@ EPP_IMAGE="dynamo/dynamo-epp:${EPP_GIT_TAG}" ...@@ -261,7 +263,7 @@ EPP_IMAGE="dynamo/dynamo-epp:${EPP_GIT_TAG}"
**Build Frontend Image** **Build Frontend Image**
```bash ```bash
# Build the frontend image (automatically builds EPP image as a dependency) # Build the frontend image (automatically builds EPP image as a dependency)
python container/render.py --framework=dynamo --target=frontend --output-short-filename container/render.py --framework=dynamo --target=frontend --output-short-filename
docker build -t dynamo:frontend --build-arg EPP_IMAGE=${EPP_IMAGE} -f container/rendered.Dockerfile . docker build -t dynamo:frontend --build-arg EPP_IMAGE=${EPP_IMAGE} -f container/rendered.Dockerfile .
``` ```
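If `EPP_IMAGE` is unset, the shell expands `--build-arg EPP_IMAGE=${EPP_IMAGE}` to an empty value, and the failure may only surface later in the build. A small guard, sketched here with a hypothetical `require_epp_image` function, catches that case before `docker build` starts.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail early when EPP_IMAGE has not been set.
require_epp_image() {
    if [ -z "${EPP_IMAGE:-}" ]; then
        echo "EPP_IMAGE is not set; define it before building the frontend image" >&2
        return 1
    fi
    echo "$EPP_IMAGE"
}

# Usage (from the repository root, after rendering the Dockerfile):
#   epp_image="$(require_epp_image)" &&
#       docker build -t dynamo:frontend --build-arg EPP_IMAGE="$epp_image" -f container/rendered.Dockerfile .
```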
...@@ -421,14 +423,19 @@ See Docker documentation for custom network creation and management. ...@@ -421,14 +423,19 @@ See Docker documentation for custom network creation and management.
### Development Workflow ### Development Workflow
```bash ```bash
# 1. Build local-dev image (builds runtime, then dev as intermediate, then local-dev as final image) # 1. Build local-dev image (builds runtime, then dev as intermediate, then local-dev as final image)
python container/render.py --framework=vllm --target=local-dev --output-short-filename container/render.py --framework=vllm --target=local-dev --output-short-filename
docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev . docker build --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) -f container/rendered.Dockerfile -t dynamo:latest-vllm-local-dev .
# 2. Run development container using the local-dev image # 2. Run development container using the local-dev image
# RECOMMENDED: --mount-workspace for live editing in dev and local-dev images # RECOMMENDED: --mount-workspace for live editing in dev and local-dev images
container/run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it container/run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it
# 3. Inside container, run inference (requires both frontend and backend) # From this point forward, commands run inside the container started in step 2.
# 3. Sanity check (optional but recommended)
deploy/sanity_check.py
# 4. Run inference (requires both frontend and backend)
# Start frontend # Start frontend
python -m dynamo.frontend & python -m dynamo.frontend &
...@@ -439,7 +446,7 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 & ...@@ -439,7 +446,7 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
### Production Workflow ### Production Workflow
```bash ```bash
# 1. Build production runtime image (runs as non-root dynamo user) # 1. Build production runtime image (runs as non-root dynamo user)
python container/render.py --framework=vllm --target=runtime --output-short-filename container/render.py --framework=vllm --target=runtime --output-short-filename
docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile . docker build -t dynamo:latest-vllm-runtime -f container/rendered.Dockerfile .
# 2. Run production container as non-root dynamo user # 2. Run production container as non-root dynamo user
...@@ -449,19 +456,35 @@ container/run.sh --image dynamo:latest-vllm-runtime --gpus all -v $HOME/.cache:/ ...@@ -449,19 +456,35 @@ container/run.sh --image dynamo:latest-vllm-runtime --gpus all -v $HOME/.cache:/
### Testing Workflow ### Testing Workflow
```bash ```bash
# 1. Build dev image # 1. Build dev image
python container/render.py --framework=vllm --target=dev --output-short-filename container/render.py --framework=vllm --target=dev --output-short-filename
docker build -t dynamo:latest-vllm-dev -f container/rendered.Dockerfile . docker build -t dynamo:latest-vllm-dev -f container/rendered.Dockerfile .
# 2. Run tests with network isolation for reproducible results (no -it needed for CI) # 2. Launch the container
container/run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache -- python -m pytest tests/ # Without --network (default: host networking, ports shared with host -- simplest for development)
container/run.sh --image dynamo:latest-vllm-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it
# Or with --network bridge (isolated networking, no port conflicts with host)
container/run.sh --image dynamo:latest-vllm-dev --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache -it
# From this point forward, commands run inside the container started in step 2.
# 3. Inside the container with bridge networking, start services # 3. Start infrastructure services
# Note: Services are only accessible from the same container - no port conflicts with host
nats-server -js & nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd & etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
# 4. Compile code
cargo build --locked --features dynamo-llm/block-manager --workspace
cd lib/bindings/python && maturin develop --uv && cd -
# 5. Sanity check (optional but recommended)
deploy/sanity_check.py --runtime-check-only
# 6. Run tests
python -m pytest tests/
# 7. (Optional) Start frontend and backend for interactive testing
python -m dynamo.frontend & python -m dynamo.frontend &
# 4. Start worker backend (choose one framework): # Start worker backend (choose one framework):
# vLLM # vLLM
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 & DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &
......
...@@ -13,7 +13,7 @@ All tests run inside containers. See the [Container Development Guide](../contai ...@@ -13,7 +13,7 @@ All tests run inside containers. See the [Container Development Guide](../contai
Each area can have one or more of the following types of tests: Each area can have one or more of the following types of tests:
1. **Unit** -- Exercises a single function, class, or module in isolation. No external services, no GPU. Each test typically runs in milliseconds; all unit tests combined may take <5 minutes. 1. **Unit** -- Exercises a single function, class, or module in isolation. No external services, no GPU. Each test typically runs in milliseconds; all unit tests combined may take <5 minutes.
2. **Integration** -- Wires multiple components together using **mock engines** (`dynamo.mocker`) and **real infrastructure** (ETCD for service discovery, NATS for messaging if enabled). Validates that the router, planner, frontend gRPC, and similar subsystems work together without launching a real inference engine. No GPU required. Each test typically runs in seconds; all integration tests combined may take <30 minutes. 2. **Integration** -- Wires multiple components together using **mock engines** (`dynamo.mocker`) and **real infrastructure** (ETCD for service discovery, NATS for messaging, if enabled). Validates that the router, planner, frontend gRPC, and similar subsystems work together without launching a real inference engine. No GPU required. Each test typically runs in seconds; all integration tests combined may take <30 minutes.
3. **End-to-End (E2E)** -- Starts a **real inference engine** (vLLM, SGLang, or TRT-LLM), sends requests through the frontend, and validates responses. Requires GPU. Each test typically runs in minutes; the full E2E suite may take several hours. 3. **End-to-End (E2E)** -- Starts a **real inference engine** (vLLM, SGLang, or TRT-LLM), sends requests through the frontend, and validates responses. Requires GPU. Each test typically runs in minutes; the full E2E suite may take several hours.
It is absolutely important to be mindful of how long a test you write takes. Slow tests have a compounding cost: they burn GPU-hours in CI (GPUs are expensive and shared), they discourage engineers from running suites locally (so bugs slip through to CI), and they slow down the entire team's development velocity. A test suite that takes too long becomes a test suite that nobody runs. When adding or modifying tests, include a per-test time estimate in your PR description -- CI GPU resources are limited and these estimates help the team schedule tests across pre-merge, nightly, and weekly pipelines. It is absolutely important to be mindful of how long a test you write takes. Slow tests have a compounding cost: they burn GPU-hours in CI (GPUs are expensive and shared), they discourage engineers from running suites locally (so bugs slip through to CI), and they slow down the entire team's development velocity. A test suite that takes too long becomes a test suite that nobody runs. When adding or modifying tests, include a per-test time estimate in your PR description -- CI GPU resources are limited and these estimates help the team schedule tests across pre-merge, nightly, and weekly pipelines.
...@@ -107,8 +107,8 @@ dynamo/ ...@@ -107,8 +107,8 @@ dynamo/
Markers are required for all tests. They are used for test selection in CI and local runs. Markers are required for all tests. They are used for test selection in CI and local runs.
### Marker Requirements ### Marker Requirements
- Every test must have at least one **Lifecycle** marker, and **test type** and **Hardware** markers. - Every test must have at least one **Lifecycle** marker, and **Test Type** and **Hardware** markers.
- **component** markers are required as applicable. - **Component/Framework** markers are required as applicable.
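Because every test carries these markers, a test subset can be selected by combining them in a `pytest -m` expression. The sketch below builds such an expression; `marker_expr` is a hypothetical helper, and the marker names (`pre_merge`, `post_merge`, `vllm`, `gpu_1`) are taken from the pytest commands elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose a pytest marker expression that combines
# lifecycle markers with a framework marker and a hardware marker.
marker_expr() {
    # $1 = framework marker (e.g. vllm), $2 = hardware marker (e.g. gpu_0, gpu_1)
    echo "(pre_merge or post_merge) and $1 and $2"
}

# Usage (inside the container):
#   pytest -m "$(marker_expr vllm gpu_1)" -v --tb=short
```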
### Marker Table ### Marker Table
| Category | Marker(s) | Description | | Category | Marker(s) | Description |
...@@ -181,7 +181,6 @@ cargo test --features integration ...@@ -181,7 +181,6 @@ cargo test --features integration
``` ```
### Additional Options ### Additional Options
- **Feature gates:** Use Cargo features to run specific test subsets, e.g. `cargo test --features planner`. Integration tests must be behind the `integration` feature gate. - **Feature gates:** Use Cargo features to run specific test subsets, e.g. `cargo test --features planner`. Integration tests must be behind the `integration` feature gate.
- **Ignored tests:** Use `#[ignore]` to mark slow or special-case tests. Run them explicitly with `cargo test -- --ignored`. - **Ignored tests:** Use `#[ignore]` to mark slow or special-case tests. Run them explicitly with `cargo test -- --ignored`.
...@@ -216,7 +215,9 @@ This section assumes you are already inside a running **runtime**, **local-dev** ...@@ -216,7 +215,9 @@ This section assumes you are already inside a running **runtime**, **local-dev**
1. Build a development container (`render.py ...` + `docker build ...`) 1. Build a development container (`render.py ...` + `docker build ...`)
2. Launch it (`run.sh ...`) 2. Launch it (`run.sh ...`)
3. Inside the container, compile code and run tests (see below) 3. Inside the container, compile code and run tests
All commands below are meant to be run **inside the container**.
**Local-dev / dev containers** -- you must compile the Rust bindings before running pytest. Without this step, tests that import `dynamo._internal` will fail with `ImportError`: **Local-dev / dev containers** -- you must compile the Rust bindings before running pytest. Without this step, tests that import `dynamo._internal` will fail with `ImportError`:
```bash ```bash
...@@ -302,7 +303,7 @@ pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short ...@@ -302,7 +303,7 @@ pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short
### Running tests locally outside of a container ### Running tests locally outside of a container
To run tests outside of the development container, ensure that you have properly setup your environment and have installed the following dependencies in your `venv`: To run tests outside of the development container, ensure that you have properly set up your environment and have installed the following dependencies in your `venv`:
```bash ```bash
uv pip install pytest-mypy uv pip install pytest-mypy
...@@ -341,7 +342,7 @@ Runs per framework (vllm, sglang, trtllm). Each framework goes through: **Build* ...@@ -341,7 +342,7 @@ Runs per framework (vllm, sglang, trtllm). Each framework goes through: **Build*
| Stage | What it does | Local equivalent | | Stage | What it does | Local equivalent |
|-------|-------------|-----------------| |-------|-------------|-----------------|
| Build image | Render Dockerfile, build runtime container | `python container/render.py --framework=vllm --target=runtime && docker build ...` | | Build image | Render Dockerfile, build runtime container | `container/render.py --framework=vllm --target=runtime && docker build ...` |
| Sanity check | Verify packages are installed in the image | `docker run --rm <image> /workspace/deploy/sanity_check.py --runtime-check --no-gpu-check` | | Sanity check | Verify packages are installed in the image | `docker run --rm <image> /workspace/deploy/sanity_check.py --runtime-check --no-gpu-check` |
| CPU-only tests (parallel) | `(pre_merge or post_merge) and <framework> and gpu_0` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_0" -n auto --dist=loadscope -v --tb=short` | | CPU-only tests (parallel) | `(pre_merge or post_merge) and <framework> and gpu_0` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_0" -n auto --dist=loadscope -v --tb=short` |
| Single GPU tests (sequential) | `(pre_merge or post_merge) and <framework> and gpu_1` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short` | | Single GPU tests (sequential) | `(pre_merge or post_merge) and <framework> and gpu_1` | `pytest -m "(pre_merge or post_merge) and vllm and gpu_1" -v --tb=short` |
......