@@ -143,4 +143,71 @@ The `router_temperature` parameter controls routing randomness:
...
3. Adjust `kv-overlap-score-weight` to meet your performance goals:
- To reduce TTFT: Increase the weight
- To reduce ITL: Decrease the weight
4. If you observe severe load imbalance, increase the temperature setting
\ No newline at end of file
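The tuning steps above can be illustrated with a toy model. This is a minimal sketch, not the router's actual implementation: it assumes each worker's cost is the overlap weight times the blocks it would still need to prefill, plus its current decode load, and that the temperature is applied as a softmax over negated costs. The worker names, block counts, and the helpers `worker_costs` and `routing_probabilities` are hypothetical.

```python
import math


def worker_costs(workers, overlap_weight):
    """Toy cost per worker: weighted prefill work plus current decode load."""
    return {
        name: overlap_weight * w["prefill_blocks"] + w["decode_blocks"]
        for name, w in workers.items()
    }


def routing_probabilities(costs, temperature):
    """Softmax over negated costs; temperature 0 always picks the cheapest worker."""
    if temperature <= 0:
        best = min(costs, key=costs.get)
        return {name: 1.0 if name == best else 0.0 for name in costs}
    lowest = min(costs.values())  # shift costs for numerical stability
    weights = {n: math.exp(-(c - lowest) / temperature) for n, c in costs.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical cluster state: one worker holds most of the prefix in cache
# but is busy decoding; the other has little cached but is nearly idle.
workers = {
    "cached_but_busy": {"prefill_blocks": 2, "decode_blocks": 30},
    "cold_but_idle": {"prefill_blocks": 20, "decode_blocks": 5},
}

for weight in (0.5, 2.0):
    costs = worker_costs(workers, weight)
    for temperature in (0.0, 10.0):
        probs = routing_probabilities(costs, temperature)
        print(f"weight={weight} temperature={temperature} -> {probs}")
```

In this model, a higher weight favors the worker with the cached prefix (less prefill, so lower TTFT), a lower weight favors the idle worker (less decode contention, so lower ITL), and a higher temperature spreads requests across both workers.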
## Using KvPushRouter Python API
Instead of launching the KV Router via command line, you can create a `KvPushRouter` object directly in Python. This allows per-request routing configuration overrides.
The `router_config_override` parameter allows you to adjust routing behavior per request without recreating the router. This is useful for implementing different routing strategies based on request characteristics.
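One way to choose overrides based on request characteristics is sketched below. The helper `choose_router_override` is hypothetical, the thresholds are illustrative, and the key names `overlap_score_weight` and `router_temperature` are assumptions about the fields the override accepts; check them against the router documentation before use.

```python
def choose_router_override(prompt_tokens, is_multi_turn, long_prompt_threshold=2048):
    """Pick per-request routing overrides (hypothetical helper).

    Key names in the returned dicts are assumed, not confirmed by this document.
    """
    if is_multi_turn or prompt_tokens >= long_prompt_threshold:
        # Long or multi-turn requests likely share a prefix with earlier work:
        # weight cache overlap heavily and route deterministically.
        return {"overlap_score_weight": 2.0, "router_temperature": 0.0}
    # Short one-off requests gain little from cache reuse:
    # favor load balance and allow some routing randomness.
    return {"overlap_score_weight": 0.5, "router_temperature": 0.5}


# Example: a short single-turn prompt and a long multi-turn prompt.
short_override = choose_router_override(prompt_tokens=128, is_multi_turn=False)
long_override = choose_router_override(prompt_tokens=4096, is_multi_turn=True)
print(short_override)
print(long_override)
```

The resulting dictionary would then be passed as the `router_config_override` argument on the individual request, leaving the router's default configuration untouched for all other requests.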
However, for maximum performance with shared prefixes and multi-turn conversations, KV routing provides significant advantages by minimizing redundant computation.
For detailed router configuration and tuning options, see the [KV Router Documentation](../../../docs/components/router/README.md).