Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three different request plane modes based on your deployment requirements:
- **TCP** (default): Direct TCP connections with minimal overhead and low latency
- **NATS**: Message broker-based request plane
- **HTTP**: HTTP/2-based request plane
This guide explains how to configure and use the request plane in your Dynamo deployment.
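
To pick a mode, set `DYN_REQUEST_PLANE` to `tcp`, `http`, or `nats`, the values used throughout this guide. The sketch below shows how a launch script might resolve the effective value before starting components. `resolve_request_plane` is a hypothetical helper, not part of Dynamo, and it assumes an unset or empty variable means the TCP default.

```bash
# Hypothetical launch-script helper (not part of Dynamo): print the request
# plane selected by DYN_REQUEST_PLANE, treating unset/empty as the TCP default.
resolve_request_plane() {
  local plane="${DYN_REQUEST_PLANE:-tcp}"
  case "$plane" in
    tcp|http|nats)
      echo "$plane"
      ;;
    *)
      # Reject anything outside the three documented modes.
      echo "unsupported DYN_REQUEST_PLANE value: $plane" >&2
      return 1
      ;;
  esac
}

# Example: select the HTTP/2 request plane so that components launched from
# this shell inherit it, then confirm the effective mode.
export DYN_REQUEST_PLANE=http
resolve_request_plane   # prints: http
```

Validating the value up front makes a typo fail loudly in the launch script instead of surfacing later as a component that cannot reach its peers.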
...
| **TCP** | Low-latency direct communication | Direct connections, minimal overhead |
| **HTTP** | Standard deployments, debugging | HTTP/2 protocol, easier observability with standard tools, widely compatible |
## Request Plane vs KV Event Plane

Dynamo has **two independent communication planes**:

- **Request plane** (**`DYN_REQUEST_PLANE`**): how **RPC requests** flow between components (frontend → router → worker), via `tcp`, `http`, or `nats`.
- **KV event plane** (currently only **NATS** is supported): how **KV cache events** (and optional router replica sync) are distributed and persisted for KV-aware routing.

Dynamo's Key-Value (KV) cache-based routing optimizes large language model inference by directing requests to the workers holding the most relevant KV cache data. KV-aware routing improves both Time To First Token (TTFT), through better cache locality, and Inter-Token Latency (ITL), through load balancing. See the [KV Cache Routing documentation](../router/kv_cache_routing.md) for more details.

There are two modes of KV-based routing:

- **Exact KV routing** (requires NATS): the router indexes KV events in a radix tree and scores the best match for each request. NATS distributes these events across routers (and, with JetStream, persists them).
- **Approximate KV routing** (does not require NATS): the router relies on approximate load heuristics instead of KV events.

**Note:** If you use the `tcp` or `http` request plane and choose NATS for KV events, you must still configure the NATS server with the `NATS_SERVER` environment variable, e.g. `NATS_SERVER=nats://nats-hostname:port`.

Because the two planes are independent, you can mix them. For example, a deployment with a TCP request plane can use any of these KV event planes:

- **JetStream KV events**: requests use TCP; KV routing still uses NATS JetStream plus the object store for persistence.
- **NATS Core KV events (local indexer)**: requests use TCP; KV events use NATS Core pub/sub, and persistence lives on the workers.
- **No KV events**: requests use TCP and KV routing runs without events (no NATS required, but no event-backed persistence).
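
The independence rule reduces to a small decision: NATS is needed if it carries requests or KV events. The sketch below encodes only the rules stated above. `nats_required` is a hypothetical shell function, not a Dynamo command, and its mode labels (`jetstream`, `nats-core`, `none`) are names invented for this illustration.

```bash
# Hypothetical helper (not part of Dynamo): report whether a deployment needs
# a NATS server.
#   $1: request plane  (tcp | http | nats)
#   $2: KV event mode  (jetstream | nats-core | none), labels invented here
nats_required() {
  local request_plane="$1" kv_events="$2"

  # Using NATS as the request plane always requires a NATS server.
  if [ "$request_plane" = "nats" ]; then
    echo "yes"
    return 0
  fi

  case "$kv_events" in
    jetstream|nats-core)
      # The KV event plane currently supports only NATS, so event-backed
      # KV routing needs NATS even with a TCP or HTTP request plane.
      echo "yes"
      ;;
    none)
      # KV routing without events: no NATS needed.
      echo "no"
      ;;
    *)
      echo "unknown KV event mode: $kv_events" >&2
      return 1
      ;;
  esac
}

# Print the combinations listed above for a TCP request plane.
for mode in jetstream nats-core none; do
  echo "tcp + $mode -> NATS required: $(nats_required tcp "$mode")"
done
```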
### Using NATS

NATS is suited to deployments with:

- Highly available (HA) routers, which currently require durable messages persisted in the NATS message broker. If you disable NATS completely, KV-aware routing is only available without KV events (`--no-kv-events`).
- Multiple frontends and backends
- A need for message replay and persistence features

**Limitations:**

- NATS does not support payloads larger than 16MB; use TCP for larger payloads.
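
If request payloads can approach this limit, a launch script can check sizes before selecting `DYN_REQUEST_PLANE=nats`. This is a hypothetical sketch: `nats_payload_ok` and `NATS_MAX_PAYLOAD_BYTES` are illustrative names, not part of Dynamo, and the code assumes "16MB" means 16 MiB (16 × 1024 × 1024 bytes), so adjust the constant if your limit differs.

```bash
# Hypothetical pre-flight check (not part of Dynamo).
# Assumption: the 16MB NATS limit means 16 MiB.
NATS_MAX_PAYLOAD_BYTES=$((16 * 1024 * 1024))

# Succeed if a payload of $1 bytes fits within the NATS limit; otherwise
# print a hint to stderr and fail.
nats_payload_ok() {
  local payload_bytes="$1"
  if [ "$payload_bytes" -le "$NATS_MAX_PAYLOAD_BYTES" ]; then
    return 0
  fi
  echo "payload of ${payload_bytes} bytes exceeds the NATS limit; use DYN_REQUEST_PLANE=tcp" >&2
  return 1
}

# Example: a 32 MiB payload is too large for NATS, so fall back to TCP.
if ! nats_payload_ok $((32 * 1024 * 1024)); then
  export DYN_REQUEST_PLANE=tcp
fi
```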
### Using TCP
TCP is the default request plane and provides direct, low-latency communication between services.
**Configuration:**
```bash
# TCP is the default, so no need to set DYN_REQUEST_PLANE explicitly
...
> - **NATS Core with Local Indexer mode** (`--enable-local-indexer` on workers): State persists on workers; the router rebuilds state by querying workers on startup.
> - **No KV events** (`--no-kv-events`): State persistence is not supported.
>
> **Request plane is independent of KV event transport.**
> `DYN_REQUEST_PLANE` controls how **requests** are sent (TCP/HTTP/NATS), but KV-aware routing still uses **NATS** for KV events in both JetStream and NATS Core + Local Indexer modes.
> If you run with `DYN_REQUEST_PLANE=tcp` (or `http`) and KV events enabled (default), you must also configure NATS, e.g. `NATS_SERVER=nats://...`.
> Only `--no-kv-events` removes the NATS requirement.
>
> When `--kv-overlap-score-weight` is set to 0 or `--no-kv-events` is set, no KvIndexer is launched to drain and process KV events. In these cases, configure your backend workers not to relay events through `KvEventPublisher`, to avoid event accumulation in JetStream. Support for disabling KV event publishing entirely in these cases is a work in progress.
>
> The CLI arguments `--router-ttl`, `--router-max-tree-size`, and `--router-prune-target-ratio` control local cache management when the router operates without receiving events from workers. When KV events are enabled (default), the router relies on worker-side eviction events and ignores these parameters.
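
A launch script can enforce the NATS requirement before starting KV-aware routing. In this hypothetical sketch, `check_kv_event_nats` (not part of Dynamo) treats KV events as enabled unless `--no-kv-events` appears among the arguments you plan to pass, and in that case requires `NATS_SERVER` to be set. It checks only the KV event plane; a `nats` request plane needs NATS regardless.

```bash
# Hypothetical pre-flight check (not part of Dynamo). Pass the same arguments
# you intend to use when launching KV-aware routing.
check_kv_event_nats() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "--no-kv-events" ]; then
      # Only --no-kv-events removes the NATS requirement.
      return 0
    fi
  done
  # KV events are enabled (default), so NATS must be configured.
  if [ -z "${NATS_SERVER:-}" ]; then
    echo "KV events are enabled but NATS_SERVER is not set," >&2
    echo "e.g. export NATS_SERVER=nats://nats-hostname:4222" >&2
    return 1
  fi
  return 0
}

# With KV events disabled, NATS_SERVER is not required.
check_kv_event_nats --no-kv-events && echo "ok: KV events disabled, NATS not required"

# With KV events enabled, NATS_SERVER must point at your NATS server
# (4222 is the NATS default client port).
NATS_SERVER=nats://nats-hostname:4222 check_kv_event_nats && echo "ok: NATS_SERVER is set"
```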