README.md 4.41 KB
Newer Older
Ryan Olson's avatar
Ryan Olson committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
# velo-transports

Multi-transport active message routing for distributed systems.

## Overview

`velo-transports` abstracts TCP, HTTP, NATS, gRPC, and UCX behind a unified `Transport` trait. It provides:

- **Zero-copy `Bytes`** — inbound frames share the receive buffer via `bytes::Bytes` slicing
- **Fire-and-forget with error callbacks**`send_message` is non-blocking; failures are reported through `TransportErrorHandler`
- **Priority-based peer routing** — each peer is registered with all compatible transports; the highest-priority one becomes the primary
- **3-phase graceful shutdown** — Gate → Drain → Teardown with RAII in-flight tracking
- **Peer discovery via `WorkerAddress`** — MessagePack-encoded map of transport key → endpoint bytes

## Architecture

```mermaid
graph TD
    VB[VeloBackend] -->|priority routing| TCP[TcpTransport]
    VB -->|priority routing| HTTP[HttpTransport]
    VB -->|priority routing| NATS[NatsTransport]
    VB -->|priority routing| GRPC[GrpcTransport]
    VB -->|priority routing| UCX[UcxTransport]

    TCP -->|inbound| DS[DataStreams]
    HTTP -->|inbound| DS
    NATS -->|inbound| DS
    GRPC -->|inbound| DS
    UCX -->|inbound| DS

    DS -->|message_stream| APP[Application]
    DS -->|response_stream| APP
    DS -->|event_stream| APP
```

`VeloBackend` is the central orchestrator. It starts each transport, builds a composite `WorkerAddress` advertising all endpoints, and manages the peer registry. Inbound frames arrive via three independent `flume` channels in `DataStreams`.

## Feature Flags

| Feature | Default | Dependencies | Description |
|---------|---------|-------------|-------------|
| `http`  | ✓ | axum, reqwest | HTTP transport with Axum server |
| `nats`  | ✓ | async-nats, bs58 | NATS pub-sub transport |
| `grpc`  | ✓ | tonic, prost, tower | gRPC bidirectional streaming |
| `ucx`   | ✗ | async-ucx | UCX/RDMA transport (requires UCX libraries) |

TCP transport is always available (no feature gate).

## Transport Summary

| Transport | Protocol | Framing | Key Properties |
|-----------|----------|---------|----------------|
| **TCP** | Raw TCP | 11-byte preamble + header + payload | Zero-copy codec, DashMap connection pool, CPU pinning, keepalive |
| **HTTP** | HTTP/1.1 POST | Base64 header in `X-Transport-Header`, raw body | Fire-and-forget (202 Accepted), Axum server, drain → 503 |
| **NATS** | NATS pub-sub | Base64 header in NATS HeaderMap, raw payload | Subject scheme `velo.{b58}.{type}`, request/reply health, drain via unsub |
| **gRPC** | HTTP/2 streaming | Protobuf `FramedData` wrapper (preamble + header + payload) | Bidirectional streaming, tonic channels, exponential backoff reconnect |
| **UCX** | RDMA Active Messages | 3 fixed lanes (msg/resp/event) | `LocalSet` thread for `Rc<Worker>`, lazy endpoints, zero-copy RDMA |

## Shutdown Model

Graceful shutdown follows three phases:

1. **Gate**`begin_drain()` flips an atomic flag. Transports reject new inbound `Message` frames (TCP sends `ShuttingDown` response, HTTP returns 503, NATS unsubscribes from message subject).
2. **Drain**`wait_for_drain()` blocks until all `InFlightGuard`s are dropped. Policy is either `WaitForever` or `Timeout(Duration)`.
3. **Teardown** — Cancel the teardown token, stopping all listener loops and writer tasks. Call `shutdown()` on each transport to clean up connections.

Response, Ack, and Event frames continue flowing during drain so in-flight work can complete.

## Wire Format (TCP)

```text
┌──────────────┬───────────┬──────────────┬───────────────┬────────┬─────────┐
│ version (2B) │ type (1B) │ hdr_len (4B) │ pay_len (4B)  │ header │ payload │
│   u16 BE     │   u8      │   u32 BE     │   u32 BE      │ bytes  │ bytes   │
└──────────────┴───────────┴──────────────┴───────────────┴────────┴─────────┘
```

- **version**: Schema version (currently 1)
- **type**: `Message(0)`, `Response(1)`, `Ack(2)`, `Event(3)`, `ShuttingDown(4)`
- **hdr_len / pay_len**: Lengths of the following header and payload sections
- **Max frame size**: 16 MB

The gRPC transport wraps the same preamble + header + payload in a Protobuf `FramedData` message.