The `KvPushRouter` provides additional methods for fine-grained control:

- **`best_worker_id()`**: Query which worker would be selected for the given tokens without actually routing the request. Returns `(worker_id, overlap_blocks)`.
- **`get_potential_loads()`**: Get detailed load information for all workers, including potential prefill tokens and active decode blocks.
- **`worker_id` parameter in `generate()`**: Force routing to a specific worker by passing `worker_id=<id>`, bypassing the automatic KV-aware selection.

The `router_config_override` parameter lets you adjust routing behavior per request without recreating the router. This is useful for applying different routing strategies depending on request characteristics.
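As a sketch of a request-dependent strategy, the hypothetical helper below picks an override from the prompt length: long prompts lean harder on KV-cache overlap, short prompts keep the router defaults. It assumes the override can be expressed as a dict with `overlap_score_weight` and `router_temperature` keys; those key names, the threshold, and the values are assumptions for illustration, so check them against the router's configuration reference before use.

```python
from typing import Dict, Optional, Sequence


def choose_router_override(
    prompt_tokens: Sequence[int],
    long_prompt_threshold: int = 4096,
) -> Optional[Dict[str, float]]:
    """Return a per-request override, or None to keep the router defaults.

    Hypothetical policy: long prompts have the most to gain from cache hits,
    so weight KV overlap more heavily and route deterministically.
    """
    if len(prompt_tokens) >= long_prompt_threshold:
        # ASSUMED key names; verify against the router's override type.
        return {"overlap_score_weight": 2.0, "router_temperature": 0.0}
    return None


print(choose_router_override([1] * 100))   # None
print(choose_router_override([1] * 5000))  # {'overlap_score_weight': 2.0, 'router_temperature': 0.0}
```

The returned value would then be supplied as `router_config_override` for that request; returning `None` leaves the router's existing configuration in effect.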
### Custom Routing Example: Minimizing TTFT
Here's an example of using `get_potential_loads()` to implement custom routing that minimizes Time To First Token (TTFT) by selecting the worker with the least prefill work:
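A minimal sketch of the selection step, assuming `get_potential_loads()` yields one dict per worker with `worker_id`, `potential_prefill_tokens`, and `potential_decode_blocks` entries. That dict shape is an assumption, and the load numbers below are made up; only the field names come from the description above.

```python
from typing import Dict, List


def pick_min_ttft_worker(loads: List[Dict[str, int]]) -> int:
    """Return the ID of the worker with the least pending prefill work.

    `min` keeps the first worker on ties, so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no workers available")
    best = min(loads, key=lambda w: w["potential_prefill_tokens"])
    return best["worker_id"]


# Made-up load snapshot in the ASSUMED shape (one dict per worker).
loads = [
    {"worker_id": 1, "potential_prefill_tokens": 1200, "potential_decode_blocks": 40},
    {"worker_id": 2, "potential_prefill_tokens": 300, "potential_decode_blocks": 90},
    {"worker_id": 3, "potential_prefill_tokens": 800, "potential_decode_blocks": 10},
]
print(pick_min_ttft_worker(loads))  # 2
```

The returned ID would then be passed as `worker_id=<id>` to `generate()` to pin the request to that worker.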
This approach gives you complete control over routing decisions, so you can optimize for whichever metric matters most for your workload. For example:

- **Minimize TTFT**: Select the worker with the lowest `potential_prefill_tokens`.
- **Maximize cache reuse**: Use `best_worker_id()`, which considers both prefill and decode loads.
- **Balance load**: Consider `potential_prefill_tokens` and `potential_decode_blocks` together.

See [KV Router Architecture](../components/router/README.md) for performance tuning details.
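The **Balance load** strategy above could be sketched as a weighted score over both metrics. The dict shape (one dict per worker from `get_potential_loads()`), the max-normalization, and the default 50/50 weights are illustrative assumptions, not the router's own cost function.

```python
from typing import Dict, Sequence


def pick_balanced_worker(
    loads: Sequence[Dict[str, int]],
    prefill_weight: float = 0.5,
    decode_weight: float = 0.5,
) -> int:
    """Pick the worker with the lowest weighted, max-normalized load.

    Each metric is divided by its maximum across workers so that token counts
    and block counts land on a comparable 0..1 scale before weighting.
    """
    if not loads:
        raise ValueError("no workers available")
    # `or 1` avoids dividing by zero when every worker reports 0 for a metric.
    max_prefill = max(w["potential_prefill_tokens"] for w in loads) or 1
    max_decode = max(w["potential_decode_blocks"] for w in loads) or 1

    def score(w: Dict[str, int]) -> float:
        return (prefill_weight * w["potential_prefill_tokens"] / max_prefill
                + decode_weight * w["potential_decode_blocks"] / max_decode)

    return min(loads, key=score)["worker_id"]


# Made-up load snapshot in the ASSUMED shape.
loads = [
    {"worker_id": 1, "potential_prefill_tokens": 1200, "potential_decode_blocks": 40},
    {"worker_id": 2, "potential_prefill_tokens": 300, "potential_decode_blocks": 90},
    {"worker_id": 3, "potential_prefill_tokens": 800, "potential_decode_blocks": 10},
]
print(pick_balanced_worker(loads))            # 3 (moderate prefill, light decode)
print(pick_balanced_worker(loads, 1.0, 0.0))  # 2 (prefill only, same as the TTFT rule)
```

Shifting the weights moves the policy between the TTFT-oriented choice and a decode-aware one; with these numbers, worker 2 has the least prefill work but the heaviest decode load, so the balanced score prefers worker 3.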