Commit 31642b96 authored by zhongdaor-nv, committed by GitHub

chore: add docs for vllm mm router (#6568)


Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
parent 36cc2ead
<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# MM Router Worker (vLLM)
Multimodal-aware KV cache routing worker for vLLM backends.
## Overview
This worker sits between the Dynamo frontend and vLLM workers, providing MM-aware KV cache routing:
1. Receives OpenAI-format requests from the frontend
2. Downloads images and computes `mm_hash` (routing only)
3. Builds multimodal routing metadata (`mm_routing_info`)
4. Uses `KvRouter` to pick the best vLLM worker based on KV overlap
5. Forwards the request to the selected vLLM worker and streams responses back
## Architecture
```text
Frontend (standard)     MM Router Worker (this)         vLLM Worker (standard)
┌──────────────┐        ┌──────────────────────┐        ┌───────────────────────┐
│              │───────>│ 1. Download images   │───────>│ python -m dynamo.vllm │
│ round-robin  │        │ 2. Compute mm_hash   │        │ --enable-multimodal   │
│ to mm_router │<───────│ 3. Build routing     │<───────│ (publishes KV events) │
└──────────────┘        │ 4. KvRouter route    │        └───────────────────────┘
                        └──────────────────────┘
                                   v
                              ┌──────────┐
                              │   NATS   │
                              └──────────┘
```
## Prerequisites
- At least one GPU with enough memory for your chosen multimodal model
- Docker (for local `etcd` + `nats`)
- Python environment with Dynamo installed (including vLLM backend support and Python bindings)

Throughout this README, assume:
```bash
export DYNAMO_ROOT=/path/to/dynamo
export MODEL_NAME=Qwen/Qwen3-VL-2B-Instruct
```
This guide assumes Dynamo is already installed in your current Python environment.
If the model is gated/private, also set `HF_TOKEN`.
### vLLM Patch Requirement (as of 2026-02-25)
This MM router depends on vLLM PR [`#33304`](https://github.com/vllm-project/vllm/pull/33304) (`[feat] Add per-block extra_keys to KV events`) so Dynamo can reconstruct MM-aware block hashes from KV events.

As of **2026-02-25**:

- PR [`#33304`](https://github.com/vllm-project/vllm/pull/33304) is merged into `vllm-project/vllm:main` (merged on **2026-02-21**).
- The latest vLLM release shown on GitHub releases is **v0.16.0 (2026-02-13)**, which predates this PR.

If you are using a released vLLM package, you likely need to patch your installed `site-packages` manually.
Example (patch installed vLLM in-place, `site-packages` layout):
```bash
SITE_PACKAGES_ROOT="$(python - <<'PY'
import pathlib
import vllm
print(pathlib.Path(vllm.__file__).resolve().parent.parent)
PY
)"
cd "$SITE_PACKAGES_ROOT"
# Filter the PR diff to only files under vllm/, since site-packages does not
# contain the full vLLM repo layout (tests/, docs/, etc.).
curl -sL https://github.com/vllm-project/vllm/pull/33304.diff | python3 -c '
import sys
chunks = sys.stdin.read().split("diff --git ")
filtered = [c for c in chunks if c.startswith("a/vllm/")]
print("".join("diff --git " + c for c in filtered), end="")
' > /tmp/vllm_pr33304_vllm_only.diff
# Check that the patch applies cleanly first; only modify files if the dry run succeeds.
patch --dry-run -p1 < /tmp/vllm_pr33304_vllm_only.diff \
  && patch -p1 < /tmp/vllm_pr33304_vllm_only.diff
```
After patching, restart the vLLM backend and MM router processes.
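
The inline filter in the snippet above is compact, so here is the same idea as a small standalone function with a synthetic input. This is only an illustrative sketch: the function name `filter_diff_by_prefix` and the sample diff text are made up for the example and are not part of Dynamo or vLLM.

```python
def filter_diff_by_prefix(diff_text: str, prefix: str = "a/vllm/") -> str:
    """Keep only per-file sections of a git diff whose old path starts with `prefix`."""
    marker = "diff --git "
    chunks = diff_text.split(marker)
    # The text before the first marker (usually empty) never starts with the
    # prefix, so it is dropped together with the non-matching file sections.
    kept = [chunk for chunk in chunks if chunk.startswith(prefix)]
    return "".join(marker + chunk for chunk in kept)


# Synthetic two-file diff: one file under tests/, one under vllm/.
sample = (
    "diff --git a/tests/test_x.py b/tests/test_x.py\n"
    "--- a/tests/test_x.py\n+++ b/tests/test_x.py\n@@ -1 +1 @@\n-old\n+new\n"
    "diff --git a/vllm/core.py b/vllm/core.py\n"
    "--- a/vllm/core.py\n+++ b/vllm/core.py\n@@ -1 +1 @@\n-a\n+b\n"
)
filtered = filter_diff_by_prefix(sample)
print(filtered)
```

Only the `vllm/core.py` section survives, which is what lets `patch -p1` run from the `site-packages` root without complaining about missing `tests/` or `docs/` files.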
## Usage
### Quick Start
Start `etcd` + `NATS` first (separately), then run the launcher:
```bash
cd "$DYNAMO_ROOT"
docker compose -f deploy/docker-compose.yml up -d
```
```bash
cd "$DYNAMO_ROOT/examples/backends/vllm/mm_router_worker"
./launch.sh
```
Override defaults with environment variables, for example:
```bash
MODEL="$MODEL_NAME" HTTP_PORT=8001 ./launch.sh
```
### Quick Try (Manual, Step-by-Step)
Open 5 terminals.
### Terminal 1: Start `etcd` + `NATS`
```bash
cd "$DYNAMO_ROOT"
docker compose -f deploy/docker-compose.yml up -d
```
### Common Environment (all runtime terminals)
Use the same environment in terminals 2/3/4/5:
```bash
cd "$DYNAMO_ROOT"
export DYN_NAMESPACE=dynamo
export DYN_REQUEST_PLANE=nats
export NATS_SERVER=nats://127.0.0.1:4222
export ETCD_ENDPOINTS=http://127.0.0.1:2379
```
### Terminal 2: Start vLLM Worker #1 (backend)
Use the same model string here and in the MM router.
```bash
cd "$DYNAMO_ROOT"
export DYN_NAMESPACE=dynamo
export DYN_REQUEST_PLANE=nats
export NATS_SERVER=nats://127.0.0.1:4222
export ETCD_ENDPOINTS=http://127.0.0.1:2379
export DYN_SYSTEM_PORT=18081
export DYN_VLLM_KV_EVENT_PORT=20080
python -m dynamo.vllm \
--model "$MODEL_NAME" \
--served-model-name "${MODEL_NAME}__internal_1" \
--enable-multimodal \
--enforce-eager \
--gpu-memory-utilization 0.85 \
--max-model-len 8192
```
Notes:
- The current default component name for `dynamo.vllm` is `backend` (the MM router below relies on this).
- MM-aware routing depends on KV events from the vLLM worker. In current Dynamo builds, KV events are auto-configured when prefix caching is enabled.
- When running multiple vLLM workers on the same host, each worker must use a unique KV events port (for example `20080`, `20081`) via `DYN_VLLM_KV_EVENT_PORT`; otherwise the second worker can fail with `Address already in use (addr='tcp://*:20080')`.
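
To make the port requirement concrete, the following sketch derives per-worker environment variables for several workers on one host. The helper name `worker_envs` and the stepping scheme are assumptions chosen to reproduce the values used in this guide (`18081`/`20080` for worker 1, `18083`/`20081` for worker 2); Dynamo does not ship this helper.

```python
def worker_envs(num_workers: int,
                system_port_base: int = 18081,
                kv_event_port_base: int = 20080) -> list[dict[str, str]]:
    """Return one env-var dict per worker with non-colliding ports."""
    envs = []
    for index in range(num_workers):
        envs.append({
            # Step by 2 so 18082 stays free (launch.sh uses it for the MM router).
            "DYN_SYSTEM_PORT": str(system_port_base + 2 * index),
            # Each worker needs its own KV events port on a shared host.
            "DYN_VLLM_KV_EVENT_PORT": str(kv_event_port_base + index),
        })
    kv_ports = [env["DYN_VLLM_KV_EVENT_PORT"] for env in envs]
    if len(set(kv_ports)) != len(kv_ports):
        raise ValueError("KV event ports must be unique per worker")
    return envs


envs = worker_envs(2)
for number, env in enumerate(envs, start=1):
    print(f"worker {number}: {env}")
```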
### Terminal 3: Start vLLM Worker #2 (backend)
Start a second backend worker so we can verify the MM router picks the same
worker again for a repeated multimodal request.
```bash
cd "$DYNAMO_ROOT"
export DYN_NAMESPACE=dynamo
export DYN_REQUEST_PLANE=nats
export NATS_SERVER=nats://127.0.0.1:4222
export ETCD_ENDPOINTS=http://127.0.0.1:2379
export DYN_SYSTEM_PORT=18083
export DYN_VLLM_KV_EVENT_PORT=20081
python -m dynamo.vllm \
--model "$MODEL_NAME" \
--served-model-name "${MODEL_NAME}__internal_2" \
--enable-multimodal \
--enforce-eager \
--gpu-memory-utilization 0.85 \
--max-model-len 8192
```
If you are running both workers on a single ~48 GB GPU with `Qwen/Qwen3-VL-2B-Instruct`, replace the resource-related flags in both worker commands with smaller limits, for example:
```bash
--gpu-memory-utilization 0.45 \
--max-model-len 1024 \
--max-num-seqs 1 \
--max-num-batched-tokens 512
```
### Terminal 4: Start MM Router Worker (vLLM)
Important:
- The quickstart command below uses defaults for namespace/component/endpoint wiring
(`dynamo`, `mm_router`, `generate`, `backend`, `generate`) to keep the first run simple.
- If you customize backend/MM router component names, update the MM router CLI args to match.
- `--block-size` defaults to `16`; if your vLLM backend uses a different KV cache block size,
pass the same value to the MM router.
```bash
cd "$DYNAMO_ROOT"
export DYN_NAMESPACE=dynamo
export DYN_REQUEST_PLANE=nats
export NATS_SERVER=nats://127.0.0.1:4222
export ETCD_ENDPOINTS=http://127.0.0.1:2379
export DYN_LOG=debug
python -m examples.backends.vllm.mm_router_worker \
--model "$MODEL_NAME"
```
### Terminal 5: Start Frontend
The frontend uses `--router-mode round-robin` rather than `--router-mode kv` because the MM router worker handles the KV routing logic itself. If there are multiple replicas of the MM router worker, the frontend distributes requests between them in round-robin order, and each MM router worker then performs KV-aware routing to the vLLM backend workers.
```bash
cd "$DYNAMO_ROOT"
export DYN_NAMESPACE=dynamo
export DYN_REQUEST_PLANE=nats
export NATS_SERVER=nats://127.0.0.1:4222
export ETCD_ENDPOINTS=http://127.0.0.1:2379
python -m dynamo.frontend \
--http-port 8000 \
--router-mode round-robin
```
## Test Request
Send the same multimodal request twice. With two backend workers running, the
second request should typically be routed to the same backend and show higher
cache reuse in scheduler logs (and possibly higher overlap in debug routing
logs, if enabled).
```bash
MODEL="$MODEL_NAME"
IMAGE_URL="http://images.cocodataset.org/test2017/000000000001.jpg"
curl http://127.0.0.1:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
--data @- <<EOF
{
"model": "${MODEL_NAME}",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image briefly."},
{"type": "image_url", "image_url": {"url": "${IMAGE_URL}"}}
]
}],
"max_tokens": 100
}
EOF
```
Run the same `curl` command again. In the MM router worker logs (terminal 4),
look for scheduler logs that show cached-block reuse.
Expected behavior:
- First request: the selected worker typically has few or zero `cached blocks`.
- Repeated identical request: the scheduler selects a worker with more `cached blocks`.

Example (second identical request; values will vary by run):
```text
INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=... with 0 cached blocks: 34.375 = 1.0 * prefill_blocks + decode_blocks = 1.0 * 17.375 + 17.000
INFO dynamo_llm::kv_router::scheduler: Formula for worker_id=... with 17 cached blocks: 17.375 = 1.0 * prefill_blocks + decode_blocks = 1.0 * 0.375 + 17.000
INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=... dp_rank=0, logit: 17.375, cached blocks: 17, tree size: ..., total blocks: ...
DEBUG kv_router.select_worker: dynamo_llm::kv_router::push_router: [ROUTING] Best: worker_... dp_rank=0 with 17/18 blocks overlap request_id=... worker_id=... dp_rank=0 overlap_blocks=17 total_blocks=18
```
The key signal is `cached blocks: 17` on the selected worker.
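
The arithmetic in those log lines can be reproduced with a few lines of Python. This is a sketch inferred from the numbers printed above, not the scheduler's actual code: it assumes `prefill_blocks` is the request length in blocks (17.375 here) minus the worker's cached blocks, and that the worker with the lowest value is selected. The function name `worker_cost` and the worker labels are hypothetical.

```python
def worker_cost(request_blocks: float, cached_blocks: int,
                decode_blocks: float, prefill_weight: float = 1.0) -> float:
    """Cost as printed in the log: weight * prefill_blocks + decode_blocks."""
    # Assumption: blocks already cached on the worker need no prefill.
    prefill_blocks = request_blocks - cached_blocks
    return prefill_weight * prefill_blocks + decode_blocks


costs = {
    "worker_cold": worker_cost(17.375, cached_blocks=0, decode_blocks=17.0),
    "worker_warm": worker_cost(17.375, cached_blocks=17, decode_blocks=17.0),
}
best = min(costs, key=costs.get)  # the lowest cost wins
print(costs, "->", best)
```

With these inputs the cold worker scores 34.375 and the warm worker 17.375, matching the two `Formula` lines and the `Selected worker` line.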
If MM-aware routing and prefix reuse are working, after sending the same request twice you should typically observe:
- Scheduler logs show the selected backend has higher `cached blocks` on the second request
- If debug routing logs are enabled, they may also show a large overlap jump on the second request
- Response metadata may show prompt cache reuse on the second request (for example `usage.prompt_tokens_details.cached_tokens`)
- End-to-end latency may drop on the second request (for example lower `nvext.timing.total_time_ms`)
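
The client-side checks above can be scripted. The sketch below uses only the Python standard library and the request body from the `curl` example; the helper names are illustrative, and whether `usage.prompt_tokens_details.cached_tokens` is populated depends on your deployment, so missing fields are treated as zero.

```python
import json
import time
import urllib.request


def build_payload(model: str, image_url: str, max_tokens: int = 100) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image briefly."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": max_tokens,
    }


def cached_tokens(response: dict) -> int:
    """Read usage.prompt_tokens_details.cached_tokens, treating missing fields as 0."""
    details = (response.get("usage") or {}).get("prompt_tokens_details") or {}
    return details.get("cached_tokens") or 0


def send_once(url: str, payload: dict, timeout_s: float = 300.0) -> tuple[dict, float]:
    """POST the payload and return (parsed JSON body, wall-clock seconds)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        body = json.load(resp)
    return body, time.perf_counter() - start


def compare_repeated_request(url: str, model: str, image_url: str) -> None:
    """Send the identical request twice and print cache reuse and latency."""
    payload = build_payload(model, image_url)
    for attempt in (1, 2):
        body, seconds = send_once(url, payload)
        print(f"request {attempt}: cached_tokens={cached_tokens(body)} latency={seconds:.2f}s")
```

With the services from this guide running, calling `compare_repeated_request("http://127.0.0.1:8000/v1/chat/completions", "Qwen/Qwen3-VL-2B-Instruct", "http://images.cocodataset.org/test2017/000000000001.jpg")` sends both requests; nothing runs at import time.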
## Configuration
| Argument | Default | Description |
|----------|---------|-------------|
| `--model` | `Qwen/Qwen3-VL-8B-Instruct` | Model path or HuggingFace ID |
| `--block-size` | `16` | KV cache block size used for routing (must match backend) |
| `--namespace` | `default` | Dynamo namespace |
| `--component` | `mm_router` | This worker's component name |
| `--endpoint` | `generate` | This worker's endpoint name |
| `--downstream-component` | `backend` | Downstream component name (use `backend` for current `dynamo.vllm` defaults) |
| `--downstream-endpoint` | `generate` | Downstream vLLM endpoint name |
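
For quick reference, the table can be mirrored as an `argparse` parser. This is a stand-in built from the table only; the real parser lives in `mm_router_worker.py` and may define additional options or different help text.

```python
import argparse


def parse_router_args(argv=None) -> argparse.Namespace:
    """Parse the options listed in the Configuration table (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="MM router worker options (sketch)")
    parser.add_argument("--model", default="Qwen/Qwen3-VL-8B-Instruct")
    parser.add_argument("--block-size", type=int, default=16)
    parser.add_argument("--namespace", default="default")
    parser.add_argument("--component", default="mm_router")
    parser.add_argument("--endpoint", default="generate")
    parser.add_argument("--downstream-component", default="backend")
    parser.add_argument("--downstream-endpoint", default="generate")
    return parser.parse_args(argv)


# Example: a backend that uses 32-token KV blocks and a custom component name.
args = parse_router_args(["--block-size", "32", "--downstream-component", "my_backend"])
print(args.block_size, args.downstream_component, args.namespace)
```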
## How It Works
### MM Hash Computation
The worker computes image hashes using the same image-UUID path Dynamo uses for vLLM multimodal inputs (`compute_mm_uuids_from_images`), then converts those UUIDs into integer `mm_hash` values for routing-only block hash computation.
This ensures:
- Same image -> same `mm_hash` -> same MM-aware block hashes -> cache reuse
- Different image -> different `mm_hash` -> different block hashes -> avoid false hits
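
The property being relied on is that the hash is a deterministic function of the image content. The sketch below illustrates that property with `hashlib`; it is not the `compute_mm_uuids_from_images` implementation, and both helper names and the 64-bit truncation are assumptions made for the example.

```python
import hashlib


def image_uuid(image_bytes: bytes) -> str:
    """Content-derived identifier: identical bytes always give the same hex string."""
    return hashlib.sha256(image_bytes).hexdigest()


def uuid_to_mm_hash(uuid_hex: str) -> int:
    """Reduce a hex identifier to a 64-bit integer usable in block hashing."""
    return int(uuid_hex[:16], 16)  # first 16 hex digits = 64 bits


first = uuid_to_mm_hash(image_uuid(b"fake image bytes A"))
again = uuid_to_mm_hash(image_uuid(b"fake image bytes A"))
other = uuid_to_mm_hash(image_uuid(b"fake image bytes B"))
print(first == again, first == other)
```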
### KV-Aware Routing
The worker calls `KvRouter.generate(...)` with:

- execution payload (`token_ids`, `multi_modal_data`, etc.)
- routing payload (`mm_routing_info`)

`mm_routing_info` contains:

- `routing_token_ids`: processor-expanded routing tokens (not frontend placeholder-only tokens)
- `block_mm_infos`: per-block MM metadata

This lets `KvRouter` compute MM-aware overlap and pick the best backend worker.
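
Why the per-block metadata matters can be shown with a toy overlap computation. The sketch below is a simplified model, not Dynamo's or vLLM's hashing scheme: it chains a SHA-256 digest over each block's tokens plus any `mm_hash` values, then counts how many leading blocks of a request are already cached. All function names are hypothetical.

```python
import hashlib


def block_hashes(token_ids: list[int], block_mm_infos: list, block_size: int) -> list[bytes]:
    """Chained hashes over full blocks, mixing in each block's mm_hash values."""
    hashes, parent = [], b""
    for i in range(len(token_ids) // block_size):
        block = token_ids[i * block_size:(i + 1) * block_size]
        info = block_mm_infos[i] if i < len(block_mm_infos) else None
        mm = tuple(obj["mm_hash"] for obj in info["mm_objects"]) if info else ()
        digest = hashlib.sha256(parent + repr((block, mm)).encode()).digest()
        hashes.append(digest)
        parent = digest  # each block hash depends on everything before it
    return hashes


def overlap_blocks(request_hashes: list[bytes], cached: set[bytes]) -> int:
    """Count leading request blocks that are present in a worker's cache."""
    count = 0
    for digest in request_hashes:
        if digest not in cached:
            break
        count += 1
    return count


# Image placeholder tokens look identical for every image, so token IDs alone
# cannot tell two images apart; the mm_hash in block 1 does.
tokens = [1, 2, 3, 4, 9, 9, 9, 9]
image_a = [None, {"mm_objects": [{"mm_hash": 111, "offsets": []}]}]
image_b = [None, {"mm_objects": [{"mm_hash": 222, "offsets": []}]}]
cached = set(block_hashes(tokens, image_a, block_size=4))
same = overlap_blocks(block_hashes(tokens, image_a, 4), cached)
different = overlap_blocks(block_hashes(tokens, image_b, 4), cached)
print(same, different)
```

The request with the same image overlaps on both blocks, while the request with a different image overlaps only on the text-only first block, which is the false hit that MM-aware hashing avoids.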
### Block MM Info Structure
Each routing block gets either `None` or a multimodal descriptor:
```python
block_mm_infos = [
None,
{"mm_objects": [{"mm_hash": 12345, "offsets": []}]},
{"mm_objects": [{"mm_hash": 12345, "offsets": []}]},
]
```
When the same image is repeated, a block that spans the boundary between two copies can list the same `mm_hash` more than once, because each image whose token range touches the block contributes its own entry. This matches vLLM's KV block hash boundary semantics.
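
A minimal sketch of how such a list could be built is shown below. It reuses the overlap condition from `build_block_mm_infos` in this change (`block_end > img_start and block_start <= img_end`), but the function name, its parameters, and the way block boundaries are derived are assumptions for illustration. The bounds check itself is flagged for review upstream (issue #6588), so treat the exact inclusivity as provisional.

```python
def sketch_block_mm_infos(num_tokens: int, block_size: int,
                          mm_hashes: list[int],
                          image_ranges: list[tuple[int, int]]) -> list:
    """Return one entry per block: None, or the images whose token range touches it."""
    result = []
    num_blocks = (num_tokens + block_size - 1) // block_size  # assumption: round up
    for i in range(num_blocks):
        block_start = i * block_size
        block_end = block_start + block_size
        mm_objects = [
            {"mm_hash": mm_hash, "offsets": []}
            for mm_hash, (img_start, img_end) in zip(mm_hashes, image_ranges)
            if block_end > img_start and block_start <= img_end
        ]
        result.append({"mm_objects": mm_objects} if mm_objects else None)
    return result


# One image covering tokens 5..9 in a 12-token prompt with 4-token blocks.
single = sketch_block_mm_infos(12, 4, [12345], [(5, 9)])
# The same image twice, back to back: the middle block touches both copies.
repeated = sketch_block_mm_infos(12, 4, [7, 7], [(1, 5), (6, 10)])
print(single)
print(repeated)
```

The first call reproduces the shape of the example above (`None` followed by two image blocks); in the second, the middle block carries two entries with the same `mm_hash`.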
## Files
| File | Description |
|------|-------------|
| `mm_router_worker.py` | Main worker (`@dynamo_worker`) and CLI |
| `handler.py` | `MMRouterHandler` routing logic |
| `mm_processor.py` | Image loading, token expansion, MM hash, block MM metadata |
| `__main__.py` | Module entry point |
## Dependencies
- `dynamo` (runtime + `KvRouter`)
- `transformers` (`AutoTokenizer`, `AutoProcessor`)
- `Pillow` (`PIL`) for image loading
- `requests` for `http(s)` image URLs
- vLLM-capable backend worker via `python -m dynamo.vllm`
## Known Limitations
- `mm_processor.py` currently only supports Qwen-style multimodal processors for per-image visual token counting (`Qwen2-VL`, `Qwen2.5-VL`, `Qwen3-VL` style processors).
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# Launch script for vLLM MM Router Worker demo:
# Frontend (round-robin) -> MM Router Worker -> vLLM backend
#
# This script is intended as a step-by-step runnable demo on a single machine.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DYNAMO_ROOT="$(cd "${SCRIPT_DIR}/../../../.." && pwd)"
cd "${DYNAMO_ROOT}"
# ---------------------------------------------------------------------------
# Configuration (override with environment variables)
# ---------------------------------------------------------------------------
MODEL="${MODEL:-Qwen/Qwen3-VL-8B-Instruct}"
NAMESPACE="${NAMESPACE:-dynamo}"
HTTP_PORT="${HTTP_PORT:-8000}"
BLOCK_SIZE="${BLOCK_SIZE:-16}" # Must match vLLM backend KV block size
GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.85}"
MAX_MODEL_LEN="${MAX_MODEL_LEN:-8192}"
NATS_SERVER="${NATS_SERVER:-nats://127.0.0.1:4222}"
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-http://127.0.0.1:2379}"
VLLM_SYSTEM_PORT="${VLLM_SYSTEM_PORT:-18081}"
MM_ROUTER_SYSTEM_PORT="${MM_ROUTER_SYSTEM_PORT:-18082}"
MM_ROUTER_COMPONENT="${MM_ROUTER_COMPONENT:-mm_router}"
BACKEND_COMPONENT="${BACKEND_COMPONENT:-backend}" # dynamo.vllm default
# Extra args (word-splitting is intentional for shell-style overrides)
VLLM_EXTRA_ARGS="${VLLM_EXTRA_ARGS:-}"
FRONTEND_EXTRA_ARGS="${FRONTEND_EXTRA_ARGS:-}"
MM_ROUTER_EXTRA_ARGS="${MM_ROUTER_EXTRA_ARGS:-}"
echo "=== vLLM MM Router Worker Launch Script ==="
echo "Working directory: ${DYNAMO_ROOT}"
echo "MODEL=${MODEL}"
echo "NAMESPACE=${NAMESPACE}"
echo "HTTP_PORT=${HTTP_PORT}"
echo "BLOCK_SIZE=${BLOCK_SIZE}"
echo "NATS_SERVER=${NATS_SERVER}"
echo "ETCD_ENDPOINTS=${ETCD_ENDPOINTS}"
echo "VLLM_SYSTEM_PORT=${VLLM_SYSTEM_PORT}"
echo "MM_ROUTER_SYSTEM_PORT=${MM_ROUTER_SYSTEM_PORT}"
echo
PIDS=()
cleanup() {
  echo
  echo "Cleaning up background processes..."
  for pid in "${PIDS[@]:-}"; do
    kill "${pid}" 2>/dev/null || true
  done
  wait 2>/dev/null || true
}
trap cleanup EXIT INT TERM
wait_ready() {
  local url="$1"
  local name="$2"
  local timeout_s="${3:-240}"
  local deadline=$((SECONDS + timeout_s))
  echo "Waiting for ${name} at ${url} ..."
  while (( SECONDS < deadline )); do
    if curl -fsS "${url}" 2>/dev/null | grep -q '"status"[[:space:]]*:[[:space:]]*"ready"'; then
      echo "${name} is ready"
      return 0
    fi
    sleep 1
  done
  echo "Timed out waiting for ${name} (${url})" >&2
  return 1
}
wait_frontend_models() {
  local url="$1"
  local timeout_s="${2:-240}"
  local deadline=$((SECONDS + timeout_s))
  echo "Waiting for frontend models API at ${url} ..."
  while (( SECONDS < deadline )); do
    if curl -fsS "${url}" >/dev/null 2>&1; then
      echo "Frontend is ready"
      return 0
    fi
    sleep 1
  done
  echo "Timed out waiting for frontend (${url})" >&2
  return 1
}
echo "Prerequisite: start etcd and NATS yourself before running this script."
echo "Example:"
echo " docker compose -f deploy/docker-compose.yml up -d"
echo
COMMON_ENV=(
"DYN_NAMESPACE=${NAMESPACE}"
"DYN_REQUEST_PLANE=nats"
"NATS_SERVER=${NATS_SERVER}"
"ETCD_ENDPOINTS=${ETCD_ENDPOINTS}"
)
echo
echo "=== Starting vLLM backend worker ==="
# Use an internal served-model-name so frontend traffic goes to the MM router
# (which registers the public model name) instead of directly to the backend.
env "${COMMON_ENV[@]}" \
"DYN_SYSTEM_PORT=${VLLM_SYSTEM_PORT}" \
python -m dynamo.vllm \
--model "${MODEL}" \
--enable-multimodal \
--block-size "${BLOCK_SIZE}" \
--enforce-eager \
--gpu-memory-utilization "${GPU_MEMORY_UTILIZATION}" \
--max-model-len "${MAX_MODEL_LEN}" \
--served-model-name "${MODEL}__internal" \
${VLLM_EXTRA_ARGS} &
PIDS+=($!)
wait_ready "http://127.0.0.1:${VLLM_SYSTEM_PORT}/health" "vLLM backend" 900
echo
echo "=== Starting vLLM MM Router Worker ==="
env "${COMMON_ENV[@]}" \
"DYN_LOG=debug" \
"DYN_SYSTEM_PORT=${MM_ROUTER_SYSTEM_PORT}" \
'DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS=["generate"]' \
python -m examples.backends.vllm.mm_router_worker \
--model "${MODEL}" \
--namespace "${NAMESPACE}" \
--component "${MM_ROUTER_COMPONENT}" \
--endpoint generate \
--downstream-component "${BACKEND_COMPONENT}" \
--downstream-endpoint generate \
--block-size "${BLOCK_SIZE}" \
${MM_ROUTER_EXTRA_ARGS} &
PIDS+=($!)
wait_ready "http://127.0.0.1:${MM_ROUTER_SYSTEM_PORT}/health" "MM router" 300
echo
echo "=== Starting frontend ==="
env "${COMMON_ENV[@]}" \
"DYN_LOG=info" \
python -m dynamo.frontend \
--http-port "${HTTP_PORT}" \
--router-mode round-robin \
${FRONTEND_EXTRA_ARGS} &
PIDS+=($!)
wait_frontend_models "http://127.0.0.1:${HTTP_PORT}/v1/models" 300
echo
echo "=== All services are ready ==="
echo "Frontend: http://127.0.0.1:${HTTP_PORT}"
echo "MM Router: http://127.0.0.1:${MM_ROUTER_SYSTEM_PORT}/health"
echo "vLLM backend: http://127.0.0.1:${VLLM_SYSTEM_PORT}/health"
echo
echo "Try the same multimodal request twice and compare MM router logs for:"
echo ' [ROUTING] Best: worker_... with X/Y blocks overlap'
echo
echo "Example:"
echo " curl http://127.0.0.1:${HTTP_PORT}/v1/chat/completions \\"
echo " -H 'Content-Type: application/json' \\"
echo " -d '{\"model\":\"${MODEL}\",\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe this image\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"http://images.cocodataset.org/test2017/000000000001.jpg\"}}]}],\"max_tokens\":32}'"
echo
echo "Press Ctrl+C to stop all services"
wait
......@@ -125,7 +125,9 @@ def build_block_mm_infos(
         mm_objects = [
             {"mm_hash": mm_hash, "offsets": []}
             for mm_hash, (img_start, img_end) in zip(mm_hashes, image_ranges)
-            if block_end > img_start and block_start < img_end
+            # FIXME: Revisit the bounds checks here
+            # https://github.com/ai-dynamo/dynamo/issues/6588
+            if block_end > img_start and block_start <= img_end
         ]
         result.append({"mm_objects": mm_objects} if mm_objects else None)
......
......@@ -10,11 +10,11 @@ overlap, and forwards the request to that worker.
 Usage:
     python -m examples.backends.vllm.mm_router_worker \
-        --model Qwen/Qwen2.5-VL-7B-Instruct \
+        --model Qwen/Qwen3-VL-8B-Instruct \
         --namespace default \
         --component mm_router \
         --endpoint generate \
-        --downstream-component VllmWorker \
+        --downstream-component backend \
         --downstream-endpoint generate
 """
......@@ -44,13 +44,13 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument(
         "--model",
         type=str,
-        default="Qwen/Qwen2.5-VL-7B-Instruct",
+        default="Qwen/Qwen3-VL-8B-Instruct",
         help="Model path or HuggingFace model ID",
     )
     parser.add_argument(
         "--block-size",
         type=int,
-        default=32,
+        default=16,
         help="KV cache block size",
     )
......@@ -78,7 +78,7 @@ def parse_args() -> argparse.Namespace:
     parser.add_argument(
         "--downstream-component",
         type=str,
-        default="VllmWorker",
+        default="backend",
         help="Downstream vLLM workers' component name",
     )
     parser.add_argument(
......
......@@ -30,10 +30,10 @@ from tests.utils.managed_process import ManagedProcess
 from tests.utils.payloads import check_models_api
 from tests.utils.port_utils import allocate_ports
-VLLM_MM_MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"
+VLLM_MM_MODEL = os.getenv("DYN_TEST_VLLM_MM_MODEL", "Qwen/Qwen3-VL-2B-Instruct")
 BLOCK_SIZE = 16
 NAMESPACE = "dynamo"
-THREE_IMAGE_TOTAL_BLOCKS_RANGE = (200, 340)
+THREE_IMAGE_TOTAL_BLOCKS_RANGE = (180, 340)
 SINGLE_IMAGE_TOTAL_BLOCKS_RANGE = (60, 160)
 pytestmark = [
......@@ -480,6 +480,7 @@ def test_vllm_mm_overlap_staircase_single_to_double_to_triple_identical_image(
     overlap_1, total_1, _ = _send_request_get_overlap(
         frontend_port, router_proc, payload_single, "staircase_1x_image"
     )
+    time.sleep(1)
     overlap_2, total_2, segment_2 = _send_request_get_overlap(
         frontend_port, router_proc, payload_double, "staircase_2x_image"
     )
......@@ -493,13 +494,20 @@ def test_vllm_mm_overlap_staircase_single_to_double_to_triple_identical_image(
         f"1x={overlap_1}/{total_1}, 2x={overlap_2}/{total_2}.\n"
         f"Recent router logs:\n{segment_2[-4000:]}"
     )
-    assert abs(overlap_3 - overlap_2) <= 1, (
-        "Expected first 3-image request overlap to stay near 2-image overlap "
-        "(third-image suffix is cold on first 3-image request), got "
+    assert overlap_3 > overlap_2, (
+        "Expected overlap to increase from 2 images to 3 images, got "
         f"2x={overlap_2}/{total_2}, 3x={overlap_3}/{total_3}.\n"
         f"Recent router logs:\n{segment_3[-4000:]}"
     )
+    delta21 = overlap_2 - overlap_1
+    delta32 = overlap_3 - overlap_2
+    assert abs(delta32 - delta21) <= 4, (
+        "Expected similar overlap increment per additional identical image, got "
+        f"step(1->2)={delta21}, step(2->3)={delta32}.\n"
+        f"Recent router logs:\n{segment_3[-4000:]}"
+    )
     total_step_12 = total_2 - total_1
     total_step_23 = total_3 - total_2
     assert abs(total_step_12 - total_step_23) <= 4, (
......