@@ -81,7 +81,7 @@ When `--router-queue-threshold` is set and the queue is active, this value shift
Expected output sequence length — the estimated number of output tokens the request will generate. The router uses this hint in two ways:
Expected output sequence length — the estimated number of output tokens the request will generate. The router uses this hint in two ways:
1. **Output block tracking**: When `--track-output-blocks` is enabled, the router adds placeholder blocks during generation and applies fractional decay based on progress toward `osl`.
1. **Output block tracking**: When `--router-track-output-blocks` is enabled, the router adds placeholder blocks during generation and applies fractional decay based on progress toward `osl`.
2. **Resource estimation**: Helps the router estimate total resource requirements when making routing decisions.
2. **Resource estimation**: Helps the router estimate total resource requirements when making routing decisions.
@@ -49,11 +49,11 @@ A request with `latency_sensitivity: 5.0` arriving at time `T` is treated as if
Expected output sequence length — the estimated number of output tokens the request will generate. The router uses this hint in two ways:
Expected output sequence length — the estimated number of output tokens the request will generate. The router uses this hint in two ways:
1. **Output block tracking**: When output block tracking is enabled (frontend: `--track-output-blocks`; standalone router: `--router-track-output-blocks`), the router adds placeholder blocks during generation and applies fractional decay based on progress toward `osl`. This gives the router a more accurate picture of each worker's KV cache utilization for long-running requests.
1. **Output block tracking**: When output block tracking is enabled (`--router-track-output-blocks`), the router adds placeholder blocks during generation and applies fractional decay based on progress toward `osl`. This gives the router a more accurate picture of each worker's KV cache utilization for long-running requests.
2. **Resource estimation**: Helps the router estimate total resource requirements when making routing decisions.
2. **Resource estimation**: Helps the router estimate total resource requirements when making routing decisions.
- **Type**: `u32` (optional)
- **Type**: `u32` (optional)
- **Requires**: `--track-output-blocks` (frontend) or `--router-track-output-blocks` (standalone router) for output block tracking behavior
- **Requires**: `--router-track-output-blocks` (frontend or standalone router) for output block tracking behavior
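
To make the "fractional decay based on progress toward `osl`" wording concrete, here is a minimal Python sketch of one way such a weighting could work. It is illustrative only: the function name `effective_output_blocks`, the block size of 16 tokens, and the linear decay curve are all assumptions, not the router's actual implementation.

```python
import math


def effective_output_blocks(generated_tokens: int, osl: int, block_size: int = 16) -> float:
    """Hypothetical weight of one request's placeholder output blocks.

    Assumptions (not taken from the router source): blocks hold
    `block_size` tokens, and the decay is linear in progress toward `osl`.
    """
    if osl <= 0:
        raise ValueError("osl must be positive")
    if generated_tokens < 0:
        raise ValueError("generated_tokens must be non-negative")

    # Placeholder blocks needed to hold the tokens generated so far.
    placeholder_blocks = math.ceil(generated_tokens / block_size)

    # Fraction of the expected output already produced, capped at 1.0
    # so requests that overrun their hint do not go negative.
    progress = min(generated_tokens / osl, 1.0)

    # Assumed linear decay: the closer the request is to its expected
    # length, the less its output blocks count toward the worker's load.
    return placeholder_blocks * (1.0 - progress)
```

Under these assumptions, a request halfway to an `osl` of 128 holds 4 placeholder blocks but contributes only 2.0 to the worker's estimated load, reflecting that those blocks are expected to be freed soon.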