Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three different request plane modes based on your deployment requirements:
- **TCP**: Direct TCP connection for optimal performance
- **TCP** (default): Direct TCP connection for optimal performance
- **NATS**: Message broker-based request plane
- **HTTP**: HTTP/2-based request plane
This guide explains how to configure and use the request plane in your Dynamo deployment.
...
...
@@ -37,16 +37,21 @@ The request plane is the transport layer that handles communication between Dyna
| **TCP** | Low-latency direct communication | Direct connections, minimal overhead |
| **HTTP** | Standard deployments, debugging | HTTP/2 protocol, easier observability with standard tools, widely compatible |
## KV Routing and NATS
## Request Plane vs KV Event Plane
Dynamo's Key-Value (KV) cache based routing optimizes large language model inference by intelligently directing requests to workers with the most relevant KV cache data. KV-aware routing improves both Time To First Token (TTFT) through better cache locality and Inter-Token Latency (ITL) through intelligent load balancing.
Dynamo has **two independent communication planes**:
Please refer to the [KV Cache Routing documentation](../router/kv_cache_routing.md) for more details.
- **Request plane** (**`DYN_REQUEST_PLANE`**): how **RPC requests** flow between components (frontend → router → worker), via `tcp`, `http`, or `nats`.
- **KV event plane** (currently only **NATS** is supported): how **KV cache events** (and optional router replica sync) are distributed/persisted for KV-aware routing.
There are two modes of KV based routing:
- Exact KV routing (needs NATS): routing is based on indexing KV events in a radix tree and scoring the best match for each request. *This requires NATS* to persist and distribute KV events across routers.
**Note:** If you use the `tcp` or `http` request plane and choose NATS for KV events, you must still configure the NATS server through the `NATS_SERVER` environment variable, e.g. `NATS_SERVER=nats://nats-hostname:port`.
- Approximate KV routing (does not need NATS): KV routing is based on approximate load heuristics. *This does not require NATS*.
Because they are independent, you can mix them.
For example, a deployment with TCP request plane can use different KV event planes:
- **JetStream KV events**: requests use TCP, KV routing still uses NATS JetStream + object store for persistence.
- **NATS Core KV events (local indexer)**: requests use TCP, KV events use NATS Core pub/sub and persistence lives on workers.
- **No KV events**: requests use TCP and KV routing runs without events (no NATS required, but no event-backed persistence).
- Highly available (HA) routers currently require durable messages persisted in the NATS message broker. If you disable NATS completely, KV-based routing won't be available.
- Multiple frontends and backends
- Need for message replay and persistence features
Limitations:
- NATS does not support payloads beyond 16MB (use TCP for larger payloads)
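
The rule for when a NATS server is needed (a `nats` request plane, or any event-backed KV mode) can be sketched as a small shell check. This is only an illustration: `requires_nats` is a hypothetical helper that is not part of Dynamo, and the mode labels `jetstream`, `nats-core`, and `none` are names assumed here for the three KV event options, not real configuration values.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Dynamo): decide whether NATS_SERVER is needed.
#   $1: request plane, as set in DYN_REQUEST_PLANE (tcp | http | nats)
#   $2: KV event mode; labels are illustrative (jetstream | nats-core | none)
# Returns 0 if NATS is required, 1 if not, 2 for an unknown KV event mode.
requires_nats() {
  local request_plane="${1:-tcp}"   # TCP is the default request plane
  local kv_events="$2"

  # A NATS request plane always needs a NATS server.
  if [ "$request_plane" = "nats" ]; then
    return 0
  fi

  # With tcp/http, only event-backed KV routing keeps the NATS requirement.
  case "$kv_events" in
    jetstream|nats-core) return 0 ;;
    none)                return 1 ;;
    *) echo "unknown KV event mode: $kv_events" >&2; return 2 ;;
  esac
}

requires_nats tcp jetstream && echo "tcp + jetstream: set NATS_SERVER"
requires_nats tcp none || echo "tcp + none: NATS not needed"
```

A check like this could run in a launch script before starting the frontend, so that a missing `NATS_SERVER` is reported early rather than surfacing as a connection error.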
### Using TCP
### Using TCP (Default)
TCP provides direct, low-latency communication between services.
TCP is the default request plane and provides direct, low-latency communication between services.
**Configuration:**
```bash
# Set request plane to TCP
# TCP is the default, so no need to set DYN_REQUEST_PLANE explicitly
@@ -47,6 +47,11 @@ The main KV-aware routing arguments:
> - **NATS Core with Local Indexer mode** (`--enable-local-indexer` on workers): State persists on workers—router rebuilds state by querying workers on startup.
> - **No KV events** (`--no-kv-events`): State persistence is not supported.
>
> **Request plane is independent of KV event transport.**
> `DYN_REQUEST_PLANE` controls how **requests** are sent (TCP/HTTP/NATS), but KV-aware routing still uses **NATS** for KV events in both JetStream and NATS Core + Local Indexer modes.
> If you run with `DYN_REQUEST_PLANE=tcp` (or `http`) and KV events enabled (default), you must also configure NATS, e.g. `NATS_SERVER=nats://...`.
> Only `--no-kv-events` removes the NATS requirement.
>
> When `--kv-overlap-score-weight` is set to 0 or `--no-kv-events` is set, no KvIndexer is launched to drain and process KV events. In these cases, we recommend that you stop your backend workers from relaying events through `KvEventPublisher` to avoid event accumulation in JetStream. Support for disabling KV event publishing completely in these cases is a work in progress.
>
> The CLI arguments `--router-ttl`, `--router-max-tree-size`, and `--router-prune-target-ratio` control local cache management when the router operates without receiving events from workers. When KV events are enabled (default), the router relies on worker-side eviction events and these parameters are ignored.