---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Event Plane
---

The event plane provides Dynamo with a pub/sub layer for near real-time event exchange between components. It delivers KV cache updates, worker load metrics, and sequence tracking events, enabling features like KV-aware routing and disaggregated serving.

## When Is the Event Plane Used?

Key use cases:

- **KV cache events** -- Workers publish cache state so the router can make cache-aware scheduling decisions.
- **Worker load metrics** -- Workers report utilization so the router can balance load.
- **Sequence tracking** -- Coordinates active sequences across router replicas for fault-tolerant routing.

![Event plane architecture showing NATS and ZMQ transport options connecting Frontend, Planner, and Worker](../assets/img/event-plane-transport.svg)

## Choosing a Transport

The event plane supports two transports:

| | NATS (default with distributed discovery) | ZMQ (default with local discovery) |
|---|---|---|
| **External infrastructure** | Requires a NATS server | None (peer-to-peer) |
| **Setup complexity** | Simple -- point at a NATS server | Automatic -- workers bind sockets and register via discovery |
| **Best for** | Large-scale deployments | Deployments that need low operational overhead |

## Configuration

### Transport Selection

Set the `DYN_EVENT_PLANE` environment variable to choose a transport:

```bash
# Use NATS (the default with etcd or Kubernetes discovery)
export DYN_EVENT_PLANE=nats

# Use ZMQ (the default with file or mem discovery)
export DYN_EVENT_PLANE=zmq
```

Python components also accept this as a CLI flag:

```bash
# SGLang backend
python3 -m dynamo.sglang --event-plane zmq --model Qwen/Qwen3-0.6B

# vLLM backend
python3 -m dynamo.vllm --event-plane zmq --model Qwen/Qwen3-0.6B
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DYN_EVENT_PLANE` | Transport: `nats` or `zmq` | Context-dependent (see below) |
| `NATS_SERVER` | NATS server URL (NATS transport only) | `nats://localhost:4222` |

When `DYN_EVENT_PLANE` is not set, the default is chosen based on the discovery backend:

- `--discovery-backend file` or `mem` (local backends): defaults to **zmq**; no external services are required.
- `--discovery-backend etcd` or `kubernetes` (distributed backends): defaults to **nats**.

Set `DYN_EVENT_PLANE` explicitly to override this automatic selection.
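
The selection rule above can be written as a small shell function. This is a minimal sketch for illustration only: `resolve_event_plane` is a hypothetical helper, not part of Dynamo, and it assumes the discovery backend name is passed in as its argument.

```bash
# Hypothetical helper (not part of Dynamo): print the event-plane transport
# that the rules above would select for a given discovery backend.
resolve_event_plane() {
  backend="$1"
  # An explicit DYN_EVENT_PLANE always overrides the automatic choice.
  if [ -n "${DYN_EVENT_PLANE:-}" ]; then
    printf '%s\n' "$DYN_EVENT_PLANE"
    return 0
  fi
  case "$backend" in
    file|mem) printf '%s\n' zmq ;;          # local backends
    etcd|kubernetes) printf '%s\n' nats ;;  # distributed backends
    *)
      printf 'unknown discovery backend: %s\n' "$backend" >&2
      return 1
      ;;
  esac
}
```

For example, with `DYN_EVENT_PLANE` unset, `resolve_event_plane etcd` prints `nats`, while `DYN_EVENT_PLANE=zmq resolve_event_plane etcd` prints `zmq`.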

## NATS Transport

When using NATS (`DYN_EVENT_PLANE=nats`, or unset with a distributed backend):

- Requires a running NATS server. Set `NATS_SERVER` if it is not on `localhost:4222`.
- Events are published to NATS subjects scoped by namespace and component.
- Built-in reconnection and message buffering during brief disconnections.
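
Before starting workers, it can help to confirm that the NATS server is reachable. The sketch below is illustrative: `nats_host_port` and `check_nats` are hypothetical helpers, they assume `NATS_SERVER` holds a single URL (no IPv6 literal), and the reachability probe relies on an `nc` (netcat) that supports `-z` and `-w`, which not every system provides.

```bash
# Hypothetical helper: print "<host> <port>" for NATS_SERVER, falling back to
# the documented default nats://localhost:4222. Assumes a single URL.
nats_host_port() {
  url="${NATS_SERVER:-nats://localhost:4222}"
  hostport="${url#*://}"       # strip the scheme, e.g. nats://
  hostport="${hostport%%/*}"   # drop any trailing path
  hostport="${hostport##*@}"   # drop user:password@ credentials, if present
  case "$hostport" in
    *:*) printf '%s %s\n' "${hostport%:*}" "${hostport##*:}" ;;
    *)   printf '%s %s\n' "$hostport" 4222 ;;  # no port: NATS client default
  esac
}

# Hypothetical pre-flight check: probe the TCP port with netcat, if installed.
check_nats() {
  set -- $(nats_host_port)
  if ! command -v nc >/dev/null 2>&1; then
    echo "nc not found; skipping NATS reachability check" >&2
    return 0
  fi
  if nc -z -w 2 "$1" "$2"; then
    echo "NATS reachable at $1:$2"
  else
    echo "cannot reach NATS at $1:$2; check NATS_SERVER" >&2
    return 1
  fi
}
```

Run `check_nats` after exporting `NATS_SERVER` and before launching workers.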

Example setup:

```bash
export NATS_SERVER=nats://nats-server:4222
export DYN_EVENT_PLANE=nats

# Start workers -- explicitly enable KV event publishing
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B \
    --kv-events-config '{"publisher":"nats","topic":"kv-events","enable_kv_cache_events":true}'

# Start frontend -- it subscribes to events from NATS automatically
python3 -m dynamo.frontend --router-mode kv
```

## ZMQ Transport

When using ZMQ (`DYN_EVENT_PLANE=zmq`):

- No external server required. Each worker binds a ZMQ PUB socket and advertises its address through the discovery system.
- Subscribers automatically discover and connect to all active publishers.
- When publishers come and go (e.g., workers scaling up/down), subscribers dynamically adjust their connections.
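
Conceptually, keeping connections in sync with discovery is a set difference between the endpoints discovery currently lists and the endpoints a subscriber is already connected to. The function below is only a model of that idea, not Dynamo's implementation; `reconcile_publishers` is hypothetical and simply prints the connect and disconnect actions it would take.

```bash
# Hypothetical model of subscriber reconciliation (not Dynamo's code).
# $1: newline-separated endpoints currently listed in discovery
# $2: newline-separated endpoints the subscriber is connected to
# Prints one "connect <endpoint>" or "disconnect <endpoint>" line per change.
reconcile_publishers() {
  discovered="$1"
  connected="$2"
  # New publishers (e.g. a worker scaled up): connect to them.
  printf '%s\n' "$discovered" | while IFS= read -r ep; do
    [ -n "$ep" ] || continue
    printf '%s\n' "$connected" | grep -Fqx -- "$ep" || echo "connect $ep"
  done
  # Publishers that disappeared (e.g. a worker scaled down): disconnect.
  printf '%s\n' "$connected" | while IFS= read -r ep; do
    [ -n "$ep" ] || continue
    printf '%s\n' "$discovered" | grep -Fqx -- "$ep" || echo "disconnect $ep"
  done
}
```

Running it again after applying the printed actions yields no output, which is the steady state a subscriber settles into while the worker set is stable.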

Example setup:

```bash
export DYN_EVENT_PLANE=zmq

# Start workers -- each binds a ZMQ socket and registers it with discovery
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B \
    --kv-events-config '{"publisher":"zmq","endpoint":"tcp://*:20080","enable_kv_cache_events":true}'

# Start frontend -- discovers workers and connects directly
python3 -m dynamo.frontend --router-mode kv
```
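
When several workers run on the same host, each one binds its own socket, so each needs a distinct port in its `endpoint`. A minimal sketch, assuming one GPU per worker: `kv_events_config_zmq` and `launch_workers` are hypothetical helpers, and the port numbers are illustrative.

```bash
# Hypothetical helper: build the --kv-events-config JSON for one worker,
# binding the ZMQ publisher on the given TCP port.
kv_events_config_zmq() {
  printf '{"publisher":"zmq","endpoint":"tcp://*:%s","enable_kv_cache_events":true}\n' "$1"
}

# Hypothetical launcher: one vLLM worker per port, each pinned to its own GPU
# via CUDA_VISIBLE_DEVICES, then wait for all of them. Defined, not called.
launch_workers() {
  gpu=0
  for port in "$@"; do
    CUDA_VISIBLE_DEVICES="$gpu" python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B \
      --kv-events-config "$(kv_events_config_zmq "$port")" &
    gpu=$((gpu + 1))
  done
  wait
}
```

After `export DYN_EVENT_PLANE=zmq`, calling `launch_workers 20080 20081` would start two workers that publish on ports 20080 and 20081.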

## Disabling the Event Plane

If you do not need live KV cache events from workers, you can disable the event plane entirely and keep KV-aware routing in a prediction-based mode:

```bash
python3 -m dynamo.frontend --router-mode kv --no-router-kv-events
```

With `--no-router-kv-events`:

- The router falls back to prediction-based cache-aware routing (estimates cache state from routing decisions).
- No NATS server or ZMQ sockets are needed.
- TTL-based expiration discards stale predictions, and LRU pruning bounds how large the predicted state can grow.
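
To make the last point concrete, here is a toy model of pruning predicted state with a TTL followed by an LRU size bound. It is a conceptual sketch only: the router keeps this state in memory, and the `prune_predictions` helper, the line format `<worker> <block_hash> <last_used_epoch_seconds>`, and the parameters are assumptions made for illustration.

```bash
# Toy model (hypothetical, not the router's code) of pruning predicted state.
# Reads lines "<worker> <block_hash> <last_used_epoch_seconds>" on stdin.
# $1: TTL in seconds   $2: max entries to keep   $3: current epoch seconds
# Step 1 (TTL): drop entries not used within the last $1 seconds.
# Step 2 (LRU): keep only the $2 most recently used survivors.
prune_predictions() {
  ttl="$1"; max="$2"; now="$3"
  awk -v now="$now" -v ttl="$ttl" '$3 >= now - ttl' |
    sort -k3,3nr |
    head -n "$max"
}
```

For example, `printf 'w1 a 100\nw1 b 150\nw2 c 190\nw2 d 200\n' | prune_predictions 60 2 200` drops `a` (older than the TTL), then drops `b` (least recently used), printing `w2 d 200` and `w2 c 190`.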

## Deployment Modes

### Bare Metal / Local

Either transport can be selected with environment variables:

```bash
# NATS (requires nats-server running)
export DYN_EVENT_PLANE=nats
export NATS_SERVER=nats://localhost:4222

# OR ZMQ (no extra infrastructure)
export DYN_EVENT_PLANE=zmq
```

### Kubernetes (with Dynamo Operator)

The operator can inject `DYN_EVENT_PLANE` into pods. The same transport options apply. If using NATS, deploy a NATS server in the cluster and set `NATS_SERVER` accordingly.
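
Inside the cluster, `NATS_SERVER` usually points at the NATS Service's DNS name. A minimal sketch, assuming the default cluster domain `cluster.local` and the standard NATS client port 4222; `nats_service_url` is a hypothetical helper, and the Service name and namespace below are placeholders for whatever your NATS deployment uses.

```bash
# Hypothetical helper: build an in-cluster NATS URL from a Service name and
# namespace, using Kubernetes Service DNS (<service>.<namespace>.svc.cluster.local).
# Assumes the default cluster domain; adjust if your cluster uses another one.
nats_service_url() {
  service="$1"
  namespace="$2"
  port="${3:-4222}"   # NATS client port
  printf 'nats://%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}
```

For example, `nats_service_url nats my-namespace` prints `nats://nats.my-namespace.svc.cluster.local:4222`, which is the value to place in the pods' `NATS_SERVER` environment variable.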

## Related Documentation

- [Discovery Plane](discovery-plane.md) -- Service discovery and coordination (etcd, Kubernetes)
- [Distributed Runtime](distributed-runtime.md) -- Runtime architecture
- [Request Plane](request-plane.md) -- Request transport configuration
- [Fault Tolerance](../fault-tolerance/README.md) -- Failure handling