<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo Request Planes User Guide
## Overview
Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three request plane modes based on your deployment requirements:

- **NATS** (default): NATS-based request plane with pub/sub patterns
- **TCP**: Direct TCP connection for optimal performance
- **HTTP**: HTTP/2-based request plane

This guide explains how to configure and use the request plane in your Dynamo deployment.
## What is a Request Plane?
The request plane is the transport layer that handles communication between Dynamo services (e.g., frontend to backend, worker to worker). Different request planes offer different trade-offs:
| Request Plane | Suitable For | Characteristics |
|--------------|----------|-----------------|
| **NATS** | Production deployments with KV routing | Requires NATS infrastructure, provides pub/sub patterns, highest flexibility |
| **TCP** | Low-latency direct communication | Direct connections, minimal overhead |
| **HTTP** | Standard deployments, debugging | HTTP/2 protocol, easier observability with standard tools, widely compatible |
## KV Routing and NATS
Dynamo's Key-Value (KV) cache based routing optimizes large language model inference by intelligently directing requests to workers with the most relevant KV cache data. KV-aware routing improves both Time To First Token (TTFT) through better cache locality and Inter-Token Latency (ITL) through intelligent load balancing.
Please refer to the [KV Cache Routing documentation](../router/kv_cache_routing.md) for more details.
There are two modes of KV based routing:
- Exact KV routing (needs NATS): Routing is based on KV events indexed in a radix tree, which is used to score the best match for the request. *This requires NATS* to persist and distribute KV events across routers.
- Approximate KV routing (does not need NATS): KV routing is based on approximate load heuristics. *This does not require NATS*.
## Configuration
### Environment Variable
Set the request plane mode using the `DYN_REQUEST_PLANE` environment variable:
```bash
export DYN_REQUEST_PLANE=<mode>
```
Where `<mode>` is one of:
- `nats` (default)
- `tcp`
- `http`

The value is case-insensitive.
### Default Behavior
If `DYN_REQUEST_PLANE` is not set or contains an invalid value, Dynamo defaults to `nats`.
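
The fallback rules above can be sketched as a small shell helper. This is only an illustration of the documented behavior (case-insensitive values, `nats` when the variable is unset or invalid), not Dynamo's own resolution logic; `resolve_request_plane` is a hypothetical function name introduced for this example.

```bash
# Hypothetical helper: mirrors the documented rules, not Dynamo's implementation.
resolve_request_plane() {
  # Lowercase the value because the setting is case-insensitive.
  mode=$(printf '%s' "${DYN_REQUEST_PLANE:-}" | tr '[:upper:]' '[:lower:]')
  case "$mode" in
    nats|tcp|http) printf '%s\n' "$mode" ;;
    *) printf 'nats\n' ;;  # unset or invalid values fall back to nats
  esac
}

echo "Effective request plane: $(resolve_request_plane)"
```

Running a check like this in a launch script can help confirm which mode a deployment will actually use before any services start.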
## Usage Examples
### Using NATS (Default)
NATS is the default request plane and provides the most flexibility for complex deployments.
**Prerequisites:**
- NATS server must be running and accessible
- Configure NATS connection via standard Dynamo NATS environment variables
```bash
# Explicitly set to NATS (optional, as it's the default)
export DYN_REQUEST_PLANE=nats
```

**When to use NATS:**
- Highly available (HA) routers currently require durable messages persisted in the NATS message broker. If you disable NATS completely, KV-based routing won't be available.
- Multiple frontends and backends
- Need for message replay and persistence features

**Limitations:**
- NATS does not support payloads beyond 16MB (use TCP for larger payloads)
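
One way to act on the payload limit is to pick the request plane from the largest payload you expect to send. The sketch below is an illustration only: `choose_request_plane` and `NATS_MAX_PAYLOAD_BYTES` are hypothetical names, and it assumes that "16MB" means 16 MiB (16 × 1024 × 1024 bytes).

```bash
# Hypothetical helper, not part of Dynamo.
# Assumption: the documented 16MB limit is interpreted as 16 MiB.
NATS_MAX_PAYLOAD_BYTES=$((16 * 1024 * 1024))

choose_request_plane() {
  # $1: largest expected payload size in bytes
  if [ "$1" -gt "$NATS_MAX_PAYLOAD_BYTES" ]; then
    echo "tcp"   # payload exceeds the NATS limit, so use TCP
  else
    echo "nats"  # payload fits, so keep the default
  fi
}

# Example: a 32 MiB payload exceeds the limit.
export DYN_REQUEST_PLANE="$(choose_request_plane $((32 * 1024 * 1024)))"
echo "$DYN_REQUEST_PLANE"  # prints "tcp"
```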
### Using TCP
TCP provides direct, low-latency communication between services.
### Using HTTP
HTTP provides an HTTP/2-based request plane that is easier to observe with standard tools.

**HTTP-specific environment variables:**
- `DYN_HTTP_RPC_HOST`: Server host address (default: auto-detected)
- `DYN_HTTP_RPC_PORT`: Server port (default: 8888)
- `DYN_HTTP_RPC_ROOT_PATH`: Root path for RPC endpoints (default: /v1/rpc)
- `DYN_HTTP2_*`: Various HTTP/2 client configuration options
  - `DYN_HTTP2_MAX_FRAME_SIZE`: Maximum frame size for HTTP client (default: 1MB)
  - `DYN_HTTP2_MAX_CONCURRENT_STREAMS`: Maximum concurrent streams for HTTP client (default: 1000)
  - `DYN_HTTP2_POOL_MAX_IDLE_PER_HOST`: Maximum idle connections per host for HTTP client (default: 100)
  - `DYN_HTTP2_POOL_IDLE_TIMEOUT_SECS`: Idle timeout for HTTP client (default: 90 seconds)
  - `DYN_HTTP2_KEEP_ALIVE_INTERVAL_SECS`: Keep-alive interval for HTTP client (default: 30 seconds)
  - `DYN_HTTP2_KEEP_ALIVE_TIMEOUT_SECS`: Keep-alive timeout for HTTP client (default: 10 seconds)
  - `DYN_HTTP2_ADAPTIVE_WINDOW`: Enable adaptive flow control (default: true)

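
As a minimal sketch, the variables listed above could be set together as follows. The values simply restate the documented defaults, so setting them is optional; writing them as plain integers is an assumption about the accepted format. Variables whose value format is not stated in this guide (frame size, adaptive window, host) are left out.

```bash
# Select the HTTP request plane.
export DYN_REQUEST_PLANE=http

# Server settings (values shown are the documented defaults).
export DYN_HTTP_RPC_PORT=8888
export DYN_HTTP_RPC_ROOT_PATH=/v1/rpc

# HTTP/2 client tuning (documented defaults; plain-integer format is an assumption).
export DYN_HTTP2_MAX_CONCURRENT_STREAMS=1000
export DYN_HTTP2_POOL_MAX_IDLE_PER_HOST=100
export DYN_HTTP2_POOL_IDLE_TIMEOUT_SECS=90
export DYN_HTTP2_KEEP_ALIVE_INTERVAL_SECS=30
export DYN_HTTP2_KEEP_ALIVE_TIMEOUT_SECS=10
```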
## Complete Example
See [`examples/backends/vllm/launch/agg_request_planes.sh`](../../examples/backends/vllm/launch/agg_request_planes.sh) for a complete working example that launches a Dynamo deployment with the TCP, HTTP, or NATS request plane.
## Real-World Example
The Dynamo repository includes a complete example demonstrating all three request planes: