Commit 33604afa authored by Keiven C, committed by GitHub

test: generalize router test infrastructure and expand documentations (#7327)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent cb9b1cd5
...@@ -10,6 +10,57 @@ subtitle: Enable KV-aware routing using Router for Dynamo deployments
The Dynamo KV Router intelligently routes requests by evaluating their computational costs across different workers. It considers both decoding costs (from active blocks) and prefill costs (from newly computed blocks), using KV cache overlap to minimize redundant computation. Optimizing the KV Router is critical for achieving maximum throughput and minimum latency in distributed inference setups.
This guide helps you get started with using the Dynamo router, with further details on configuration, disaggregated serving setup, and parameter tuning.
## Deployment Modes
The Dynamo router can be deployed in several configurations. The table below shows every combination and when to use it:
| Mode | Command | Routing Logic | KV Events | Topology | Use Case |
|------|---------|---------------|-----------|----------|----------|
| **Frontend + Round-Robin** | `python -m dynamo.frontend --router-mode round-robin` | Cycles through workers | None | Aggregated | Simplest baseline; no KV awareness |
| **Frontend + Random** | `python -m dynamo.frontend --router-mode random` | Random worker selection | None | Aggregated | Stateless load balancing |
| **Frontend + KV (Aggregated)** | `python -m dynamo.frontend --router-mode kv` | KV cache overlap + load | NATS Core / JetStream / ZMQ / Approx | Aggregated | Production single-pool serving with cache reuse |
| **Frontend + KV (Disaggregated)** | `python -m dynamo.frontend --router-mode kv` with prefill + decode workers | KV cache overlap + load | NATS Core / JetStream / ZMQ / Approx | Disaggregated (prefill + decode pools) | Separate prefill/decode for large-scale serving |
| **Frontend + Direct** | `python -m dynamo.frontend --router-mode direct` | Worker ID from request hints | None | Aggregated | External orchestrator (e.g., EPP/GAIE) selects workers |
| **Standalone Router** | `python -m dynamo.router` | KV cache overlap + load | NATS Core / JetStream / ZMQ | Any | Routing without the HTTP frontend (multi-tier, custom pipelines) |
### Routing Modes (`--router-mode`)
| Mode | Value | How Workers Are Selected |
|------|-------|-------------------------|
| **Round-Robin** | `round-robin` (default) | Cycles through available workers in order |
| **Random** | `random` | Selects a random worker for each request |
| **KV** | `kv` | Evaluates KV cache overlap and decode load per worker; picks lowest cost |
| **Direct** | `direct` | Reads the target `worker_id` from the request's routing hints; no selection logic |
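The four `--router-mode` values in the table above can be illustrated with a toy selector. This is a minimal sketch, not the Dynamo implementation: the `Worker` fields, the `make_selector` helper, and the KV cost formula (blocks left to prefill plus active decode blocks) are assumptions made purely for illustration.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: int
    overlap_blocks: int  # assumed: request blocks already cached on this worker
    active_blocks: int   # assumed: blocks this worker is currently decoding


def make_selector(mode, workers, request_blocks=0):
    """Return a callable(hints=None) -> Worker for a --router-mode value."""
    if mode == "round-robin":
        cycle = itertools.cycle(workers)
        return lambda hints=None: next(cycle)
    if mode == "random":
        return lambda hints=None: random.choice(workers)
    if mode == "kv":
        def cost(worker):
            # Illustrative cost only: blocks still to prefill plus decode load.
            return (request_blocks - worker.overlap_blocks) + worker.active_blocks
        return lambda hints=None: min(workers, key=cost)
    if mode == "direct":
        by_id = {w.worker_id: w for w in workers}
        # No selection logic: the caller names the worker in its hints.
        return lambda hints: by_id[hints["worker_id"]]
    raise ValueError(f"unknown router mode: {mode!r}")


workers = [Worker(0, 0, 5), Worker(1, 8, 2), Worker(2, 4, 0)]
kv_select = make_selector("kv", workers, request_blocks=10)
print(kv_select().worker_id)  # costs are 15, 4 and 6, so this prints 1
```

In this toy model the KV selector prefers worker 1 because its cached overlap outweighs its decode load, whereas round-robin and random ignore both quantities.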
### KV Event Transport Modes (within `--router-mode kv`)
When using KV routing, the router needs to know what each worker has cached. There are four ways to get this information:
| Event Mode | How to Enable | Description |
|------------|---------------|-------------|
| **NATS Core (local indexer)** | Default (no extra flags) | Workers maintain a local indexer; router queries workers on startup and receives events via NATS Core |
| **JetStream (durable)** | `--router-durable-kv-events` | Events persisted in NATS JetStream; supports snapshots and durable consumers. *Deprecated.* |
| **ZMQ** | `--event-plane zmq` | Workers publish via ZMQ PUB sockets; standalone indexer aggregates events |
| **Approximate (no events)** | `--no-router-kv-events` | No events consumed; router predicts cache state from its own routing decisions with TTL-based expiration |
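The flag column of the table above can be captured as a small lookup, which may be handy when scripting deployments or tests. This is only a sketch: the flags are copied from the table, but the dictionary keys and the `kv_event_flags` helper are hypothetical names, and the table does not say which process each flag should be passed to.

```python
# Flags copied from the table above. The dictionary keys are hypothetical
# labels chosen for this sketch, not values accepted by any Dynamo CLI.
KV_EVENT_FLAGS = {
    "nats-core": [],  # default transport, no extra flags
    "jetstream": ["--router-durable-kv-events"],  # marked deprecated above
    "zmq": ["--event-plane", "zmq"],
    "approximate": ["--no-router-kv-events"],
}


def kv_event_flags(event_mode):
    """Return a copy of the extra CLI flags for the given event mode label."""
    try:
        return list(KV_EVENT_FLAGS[event_mode])
    except KeyError:
        raise ValueError(f"unknown KV event mode: {event_mode!r}") from None


print(kv_event_flags("zmq"))
```

Returning a copy keeps callers from mutating the shared table when they extend the list with other arguments.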
### Aggregated vs. Disaggregated Topology
| Topology | Workers | How It Works |
|----------|---------|--------------|
| **Aggregated** | Single pool (prefill + decode in one process) | All workers handle the full request lifecycle |
| **Disaggregated** | Separate prefill and decode pools | Frontend routes to a prefill worker first, then to a decode worker; requires workers registered with `ModelType.Prefill` |
Disaggregated mode is activated automatically when prefill workers register alongside decode workers. See [Disaggregated Serving](#disaggregated-serving) for details.
### Frontend-Embedded vs. Standalone Router
| Deployment | Process | Metrics Port | Use Case |
|------------|---------|--------------|----------|
| **Frontend-embedded** | `python -m dynamo.frontend --router-mode kv` | Frontend HTTP port (default 8000) | Standard deployment; router runs inside the frontend process |
| **Standalone** | `python -m dynamo.router` | `DYN_SYSTEM_PORT` (if set) | Multi-tier architectures, SGLang disagg prefill routing, custom pipelines |
The standalone router does not include the HTTP frontend (no `/v1/chat/completions` endpoint). It exposes only the `RouterRequestMetrics` via the system status server. See the [Standalone Router README](../../../components/src/dynamo/router/README.md).
## Quick Start
### Python / CLI Deployment
......
...@@ -223,7 +223,29 @@ Suppose the backend allows 3 concurrent requests and there are 10 clients contin
The router exposes metrics for monitoring routing decisions and overhead. Defined in `lib/llm/src/kv_router/metrics.rs`.
-For router configuration and tuning, see the [Router Guide](../components/router/router-guide.md).
+For router configuration, deployment modes, and tuning, see the [Router Guide](../components/router/router-guide.md).
#### Metrics Availability by Configuration
Not all metrics appear in every deployment. The table below shows which metric groups are **registered** and **populated** in each configuration:
| Metric Group | Frontend + KV (agg) | Frontend + KV (disagg) | Frontend + non-KV (round-robin/random/direct) | Standalone Router |
|---|---|---|---|---|
| `dynamo_component_router_*` (request metrics) | Registered and populated | Registered and populated | Registered, **always zero** | Populated (on `DYN_SYSTEM_PORT`) |
| `dynamo_router_overhead_*` (routing overhead) | Registered and populated | Registered and populated | **Not registered** | **Not created** |
| `dynamo_frontend_router_queue_*` (queue depth) | Registered; populated when `--router-queue-threshold` set | Registered; populated when `--router-queue-threshold` set | **Not registered** | **Not created** |
| `dynamo_component_kv_cache_events_applied` (indexer) | Populated when KV events are received | Populated when KV events are received | **Not registered** | Populated when KV events are received |
| `dynamo_frontend_worker_*` (per-worker load/timing) | Registered and populated | Registered and populated (`worker_type`=`prefill`/`decode`) | Registered and populated (`worker_type`=`decode`) | **Not created** |
**Key:**
- **Registered and populated**: Metric appears at `/metrics` with real values
- **Registered, always zero**: Metric appears at `/metrics` but the counter/histogram is never incremented (useful for dashboards that expect the metric to exist)
- **Not registered / Not created**: Metric does not appear at `/metrics` at all
**Scrape endpoints:**
- Frontend: `/metrics` on HTTP port (default 8000, configurable via `--http-port` or `DYN_HTTP_PORT`)
- Standalone router: `/metrics` on `DYN_SYSTEM_PORT` (must be set explicitly; default is `-1` / disabled)
- Backend workers: `/metrics` on `DYN_SYSTEM_PORT` (separate from frontend metrics)
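One way to check a deployment against the availability table is to scan the scraped `/metrics` text for each metric group prefix. The sketch below assumes the endpoint returns the usual Prometheus text exposition format; the `metric_groups_present` helper and the sample values are hypothetical, while the prefixes are taken from the table above.

```python
# Metric group prefixes taken from the availability table above.
METRIC_GROUP_PREFIXES = [
    "dynamo_component_router_",
    "dynamo_router_overhead_",
    "dynamo_frontend_router_queue_",
    "dynamo_component_kv_cache_events_applied",
    "dynamo_frontend_worker_",
]


def metric_groups_present(metrics_text):
    """Map each group prefix to True if any sample line starts with it."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        names.add(line.split("{", 1)[0].split(None, 1)[0])
    return {
        prefix: any(name.startswith(prefix) for name in names)
        for prefix in METRIC_GROUP_PREFIXES
    }


# Hypothetical scrape output, for illustration only.
sample = """\
# TYPE dynamo_router_overhead_total_ms histogram
dynamo_router_overhead_total_ms_count{router_id="1"} 3
dynamo_frontend_router_queue_pending_requests{worker_type="decode"} 0
"""
for prefix, present in metric_groups_present(sample).items():
    print(f"{prefix}: {'present' if present else 'absent'}")
```

Against a running frontend, the text could be fetched from `/metrics` on the HTTP port (default 8000 per the list above) with any HTTP client. Note that this check only distinguishes "appears" from "does not appear"; telling "registered, always zero" apart from "populated" would require inspecting the sample values as well.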
#### Router Request Metrics (`dynamo_component_router_*`)
...@@ -242,7 +264,7 @@ All metrics carry the standard hierarchy labels (`dynamo_namespace`, `dynamo_com
#### Per-Request Routing Overhead (`dynamo_router_overhead_*`)
-Histograms (in milliseconds) tracking the time spent in each phase of the routing decision for every request. Registered on the frontend port (default 8000) at `/metrics` with a `router_id` label (the frontend's discovery instance ID).
+Histograms (in milliseconds) tracking the time spent in each phase of the routing decision for every request. Registered on the frontend port (default 8000) at `/metrics` with a `router_id` label (the frontend's discovery instance ID). These metrics are only created when the frontend has DRT discovery enabled (i.e., `--router-mode kv`); they do not appear in non-KV modes or on the standalone router.
| Metric | Type | Description |
|--------|------|-------------|
...@@ -252,6 +274,16 @@ Histograms (in milliseconds) tracking the time spent in each phase of the routin
| `dynamo_router_overhead_scheduling_ms` | Histogram | Time in scheduler worker selection |
| `dynamo_router_overhead_total_ms` | Histogram | Total routing overhead per request |
#### Router Queue Metrics (`dynamo_frontend_router_queue_*`)
Gauge tracking the number of requests pending in the router's scheduler queue. Only registered when `--router-queue-threshold` is set. Labeled by `worker_type` to distinguish prefill vs. decode queues in disaggregated mode.
| Metric | Type | Description |
|--------|------|-------------|
| `dynamo_frontend_router_queue_pending_requests` | Gauge | Requests pending in the router scheduler queue |
**Labels:** `worker_type` (`prefill` or `decode`)
#### KV Indexer Metrics
Tracks KV cache events applied to the router's radix tree index. Only appears when `--router-kv-overlap-score-weight` is greater than 0 (default) and workers are publishing KV events. Will not appear if `--router-kv-overlap-score-weight 0` is set or no KV events have been received.
...@@ -260,11 +292,11 @@ Tracks KV cache events applied to the router's radix tree index. Only appears wh
|--------|------|-------------|
| `dynamo_component_kv_cache_events_applied` | Counter | KV cache events applied to the index |
-**Additional labels:** `status` (`ok` / `error`), `event_type` (`stored` / `removed` / `cleared`)
+**Additional labels:** `status` (`ok` / `parent_block_not_found` / `block_not_found` / `invalid_block`), `event_type` (`stored` / `removed` / `cleared`)
#### Per-Worker Load and Timing Gauges (`dynamo_frontend_worker_*`)
-These appear once workers register and begin serving requests. They are registered on the frontend's local Prometheus registry (not component-scoped) and do not carry `dynamo_namespace` or `dynamo_component` labels.
+These appear once workers register and begin serving requests. They are registered on the frontend's local Prometheus registry (not component-scoped) and do not carry `dynamo_namespace` or `dynamo_component` labels. These metrics are frontend-only and are not available on the standalone router.
| Metric | Type | Description |
|--------|------|-------------|
......
...@@ -23,7 +23,7 @@ from tests.router.helper import ( ...@@ -23,7 +23,7 @@ from tests.router.helper import (
wait_for_frontend_ready, wait_for_frontend_ready,
wait_for_workers_ready, wait_for_workers_ready,
) )
from tests.router.router_process import KVRouterProcess from tests.router.router_process import FrontendRouterProcess, KVRouterProcess
if TYPE_CHECKING: if TYPE_CHECKING:
from tests.conftest import NatsServer from tests.conftest import NatsServer
...@@ -46,6 +46,8 @@ def _test_router_basic( ...@@ -46,6 +46,8 @@ def _test_router_basic(
frontend_timeout: int = 120, frontend_timeout: int = 120,
store_backend: str = "etcd", store_backend: str = "etcd",
request_plane: str = "nats", request_plane: str = "nats",
router_mode: str = "kv",
enforce_disagg: bool = False,
): ):
"""Basic router test: start router, wait for workers and send concurrent requests via HTTP frontend. """Basic router test: start router, wait for workers and send concurrent requests via HTTP frontend.
...@@ -54,6 +56,9 @@ def _test_router_basic( ...@@ -54,6 +56,9 @@ def _test_router_basic(
This is a shared test implementation for both mocker and vLLM workers. This is a shared test implementation for both mocker and vLLM workers.
Always waits for workers to be properly registered before sending requests to avoid flakiness. Always waits for workers to be properly registered before sending requests to avoid flakiness.
Supports any router_mode (defaults to "kv" for existing callers).
block_size is only sent to the frontend CLI when router_mode is "kv".
Args: Args:
engine_workers: Backend worker instance ({MockerProcess, VLLMProcess, TRTLLMProcess}) (already initialized with __enter__()) engine_workers: Backend worker instance ({MockerProcess, VLLMProcess, TRTLLMProcess}) (already initialized with __enter__())
block_size: Block size for KV cache block_size: Block size for KV cache
...@@ -64,21 +69,27 @@ def _test_router_basic( ...@@ -64,21 +69,27 @@ def _test_router_basic(
frontend_timeout: Timeout for frontend readiness check (default: 120s) frontend_timeout: Timeout for frontend readiness check (default: 120s)
store_backend: Storage backend to use ("etcd" or "file"). Defaults to "etcd". store_backend: Storage backend to use ("etcd" or "file"). Defaults to "etcd".
request_plane: Request plane to use ("nats", "tcp", or "http"). Defaults to "nats". request_plane: Request plane to use ("nats", "tcp", or "http"). Defaults to "nats".
router_mode: Router mode ("kv", "round-robin", "random", "direct"). Defaults to "kv".
enforce_disagg: Whether to pass --enforce-disagg to the frontend. Defaults to False.
Raises: Raises:
AssertionError: If requests fail or frontend doesn't become ready AssertionError: If requests fail or frontend doesn't become ready
TimeoutError: If frontend doesn't become ready within timeout TimeoutError: If frontend doesn't become ready within timeout
""" """
with KVRouterProcess( with FrontendRouterProcess(
request, request,
block_size, block_size,
frontend_port, frontend_port,
engine_workers.namespace, engine_workers.namespace,
store_backend, store_backend,
enforce_disagg=enforce_disagg,
request_plane=request_plane, request_plane=request_plane,
router_mode=router_mode,
): ):
# Start KV router frontend # Start router frontend
logger.info(f"Starting KV router frontend on port {frontend_port}") logger.info(
f"Starting frontend --router-mode {router_mode} on port {frontend_port}"
)
frontend_url = f"http://localhost:{frontend_port}" frontend_url = f"http://localhost:{frontend_port}"
......
...@@ -6,8 +6,13 @@ import os ...@@ -6,8 +6,13 @@ import os
from tests.utils.managed_process import ManagedProcess from tests.utils.managed_process import ManagedProcess
class KVRouterProcess(ManagedProcess): class FrontendRouterProcess(ManagedProcess):
"""Manages the KV router process using dynamo.frontend""" """Manages a dynamo.frontend process with configurable --router-mode.
Supports all router modes (round-robin, random, kv, direct) and all
KV-specific options (block size, thresholds, durable events, disagg).
block_size is only sent to the CLI when router_mode is "kv".
"""
def __init__( def __init__(
self, self,
...@@ -22,15 +27,14 @@ class KVRouterProcess(ManagedProcess): ...@@ -22,15 +27,14 @@ class KVRouterProcess(ManagedProcess):
tokens_threshold_frac: float | None = None, tokens_threshold_frac: float | None = None,
request_plane: str = "nats", request_plane: str = "nats",
durable_kv_events: bool = False, durable_kv_events: bool = False,
router_mode: str = "kv",
): ):
command = [ command = [
"python3", "python3",
"-m", "-m",
"dynamo.frontend", "dynamo.frontend",
"--kv-cache-block-size",
str(block_size),
"--router-mode", "--router-mode",
"kv", router_mode,
"--http-port", "--http-port",
str(frontend_port), str(frontend_port),
"--discovery-backend", "--discovery-backend",
...@@ -39,6 +43,9 @@ class KVRouterProcess(ManagedProcess): ...@@ -39,6 +43,9 @@ class KVRouterProcess(ManagedProcess):
namespace, namespace,
] ]
if router_mode == "kv":
command.extend(["--kv-cache-block-size", str(block_size)])
if enforce_disagg: if enforce_disagg:
command.append("--enforce-disagg") command.append("--enforce-disagg")
...@@ -72,10 +79,16 @@ class KVRouterProcess(ManagedProcess): ...@@ -72,10 +79,16 @@ class KVRouterProcess(ManagedProcess):
terminate_all_matching_process_names=False, terminate_all_matching_process_names=False,
) )
self.port = frontend_port self.port = frontend_port
self.router_mode = router_mode
def _check_ready(self, response): def _check_ready(self, response):
"""Check if KV router is ready""" """Check if KV, random, round-robin, or direct router is ready"""
return response.status_code == 200 return response.status_code == 200
def __exit__(self, exc_type, exc_val, exc_tb): def __exit__(self, exc_type, exc_val, exc_tb):
super().__exit__(exc_type, exc_val, exc_tb) super().__exit__(exc_type, exc_val, exc_tb)
# Backward-compatible alias so existing callers that import KVRouterProcess
# continue to work without changes.
KVRouterProcess = FrontendRouterProcess
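The conditional flag handling in `FrontendRouterProcess` can be expressed as a pure function, which makes the "block size only for `kv`" rule easy to check without spawning a process. This is a sketch under stated assumptions: `build_frontend_command` is a hypothetical helper that does not exist in the repository, and it omits the discovery and namespace arguments because their values are elided in the hunk above.

```python
def build_frontend_command(router_mode, block_size, frontend_port,
                           enforce_disagg=False):
    """Hypothetical helper mirroring the command construction in the diff."""
    command = [
        "python3",
        "-m",
        "dynamo.frontend",
        "--router-mode",
        router_mode,
        "--http-port",
        str(frontend_port),
    ]
    # block_size is only sent to the CLI when router_mode is "kv".
    if router_mode == "kv":
        command.extend(["--kv-cache-block-size", str(block_size)])
    if enforce_disagg:
        command.append("--enforce-disagg")
    return command


for mode in ("kv", "round-robin", "random", "direct"):
    print(mode, "--kv-cache-block-size" in build_frontend_command(mode, 16, 8000))
```

Keeping the command construction free of side effects would let a unit test assert on the argument list for every router mode, independently of the slower end-to-end tests.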
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
-# Parallelization: Hermetic tests (xdist-safe via dynamic ports + per-test namespaces).
-# Tested on: Linux container.
-# Combined pre_merge wall time (this file):
-# - Serialized: 304.01s.
-# - Parallel (-n auto): 34.55s (269.46s saved, 8.80x).
+# NOTE: These tests run reliably in serial but have encountered intermittent failures
+# under pytest-xdist parallel execution (-n auto). Each test spawns its own
+# DistributedRuntime with isolated etcd/NATS and unique namespaces, but the Rust
+# runtime may use process-global state (e.g. lazy_static / OnceLock singletons for
+# endpoint tables) that races under concurrent xdist workers. Do not add
+# @pytest.mark.parallel until DRT endpoint registration is confirmed thread-safe.
#
# NOTE: TCP request plane is NOT tested here. These tests use --num-workers > 1 which spawns
# multiple workers in a single process sharing one TCP server. The shared TCP server uses
...@@ -637,25 +638,33 @@ class DisaggMockerProcess: ...@@ -637,25 +638,33 @@ class DisaggMockerProcess:
@pytest.mark.timeout(120) # bumped for xdist contention (was 42s; ~13.80s serial avg) @pytest.mark.timeout(120) # bumped for xdist contention (was 42s; ~13.80s serial avg)
@pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True)
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "router_mode,durable_kv_events",
) # Use NATS Core (local indexer) [
def test_mocker_kv_router( pytest.param("kv", False, id="kv-nondurable"),
pytest.param("kv", True, id="kv-durable"),
pytest.param("round-robin", False, id="roundrobin"),
pytest.param("random", False, id="random"),
],
indirect=["durable_kv_events"],
)
@pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True)
def test_mocker_router(
request, request,
runtime_services_dynamic_ports, runtime_services_dynamic_ports,
predownload_tokenizers, predownload_tokenizers,
router_mode,
request_plane, request_plane,
durable_kv_events, durable_kv_events,
): ):
""" """Test router with multiple mocker engine instances across all router modes.
Test KV router with multiple mocker engine instances.
This test doesn't require GPUs and runs quickly for pre-merge validation.
Tests both NATS and TCP request planes.
"""
Covers kv, round-robin, and random routing. Tests both NATS and TCP request planes.
"""
# runtime_services starts etcd and optionally nats based on request_plane # runtime_services starts etcd and optionally nats based on request_plane
logger.info(f"Starting mocker KV router test with request_plane={request_plane}") logger.info(
f"Starting mocker router test: router_mode={router_mode}, request_plane={request_plane}"
)
# Create mocker args dictionary - use local indexer (NATS Core mode) # Create mocker args dictionary - use local indexer (NATS Core mode)
mocker_args = { mocker_args = {
...@@ -688,12 +697,13 @@ def test_mocker_kv_router( ...@@ -688,12 +697,13 @@ def test_mocker_kv_router(
test_payload=TEST_PAYLOAD, test_payload=TEST_PAYLOAD,
num_requests=NUM_REQUESTS, num_requests=NUM_REQUESTS,
request_plane=request_plane, request_plane=request_plane,
router_mode=router_mode,
) )
@pytest.mark.parametrize("store_backend", ["etcd", "file"]) @pytest.mark.parametrize("store_backend", ["etcd", "file"])
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "durable_kv_events", [False], ids=["nondurable"], indirect=True
) # Use NATS Core (local indexer) ) # Use NATS Core (local indexer)
@pytest.mark.timeout(180) # bumped for xdist contention (was 60s; ~19.86s serial avg) @pytest.mark.timeout(180) # bumped for xdist contention (was 60s; ~19.86s serial avg)
def test_mocker_two_kv_router( def test_mocker_two_kv_router(
...@@ -752,7 +762,7 @@ def test_mocker_two_kv_router( ...@@ -752,7 +762,7 @@ def test_mocker_two_kv_router(
@pytest.mark.skip(reason="Flaky, temporarily disabled") @pytest.mark.skip(reason="Flaky, temporarily disabled")
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "durable_kv_events", [False], ids=["nondurable"], indirect=True
) # Use NATS Core (local indexer) ) # Use NATS Core (local indexer)
@pytest.mark.timeout(60) # ~3x average (~19.86s), rounded up (when enabled) @pytest.mark.timeout(60) # ~3x average (~19.86s), rounded up (when enabled)
def test_mocker_kv_router_overload_503( def test_mocker_kv_router_overload_503(
...@@ -790,7 +800,7 @@ def test_mocker_kv_router_overload_503( ...@@ -790,7 +800,7 @@ def test_mocker_kv_router_overload_503(
@pytest.mark.timeout(90) # bumped for xdist contention (was 22s; ~7.10s serial avg) @pytest.mark.timeout(90) # bumped for xdist contention (was 22s; ~7.10s serial avg)
@pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True) @pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True)
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "durable_kv_events", [False], ids=["nondurable"], indirect=True
) # Use NATS Core (local indexer) ) # Use NATS Core (local indexer)
def test_kv_router_bindings( def test_kv_router_bindings(
request, request,
...@@ -922,7 +932,7 @@ def test_indexers_sync( ...@@ -922,7 +932,7 @@ def test_indexers_sync(
@pytest.mark.timeout(120) # bumped for xdist contention (was 42s; ~13.80s serial avg) @pytest.mark.timeout(120) # bumped for xdist contention (was 42s; ~13.80s serial avg)
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "durable_kv_events", [False], ids=["nondurable"], indirect=True
) # Use NATS Core (local indexer) ) # Use NATS Core (local indexer)
def test_query_instance_id_returns_worker_and_tokens( def test_query_instance_id_returns_worker_and_tokens(
request, runtime_services_dynamic_ports, predownload_tokenizers, durable_kv_events request, runtime_services_dynamic_ports, predownload_tokenizers, durable_kv_events
...@@ -1155,7 +1165,7 @@ def test_router_decisions_disagg( ...@@ -1155,7 +1165,7 @@ def test_router_decisions_disagg(
@pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True) @pytest.mark.parametrize("request_plane", ["nats", "tcp"], indirect=True)
@pytest.mark.parametrize( @pytest.mark.parametrize(
"durable_kv_events", [False], indirect=True "durable_kv_events", [False], ids=["nondurable"], indirect=True
) # Use NATS Core (local indexer) ) # Use NATS Core (local indexer)
@pytest.mark.timeout(120) # bumped for xdist contention (was 39s; ~12.84s serial avg) @pytest.mark.timeout(120) # bumped for xdist contention (was 39s; ~12.84s serial avg)
def test_busy_threshold_endpoint( def test_busy_threshold_endpoint(
......