standalone-indexer.md 22.9 KB
Newer Older
1
2
3
4
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Standalone KV Indexer
5
subtitle: Run the KV cache indexer as an independent HTTP service for querying block state
6
7
8
9
---

## Overview

10
The standalone KV indexer (`python -m dynamo.indexer`) is a lightweight service that maintains a radix tree of cached blocks and exposes HTTP endpoints for querying and managing workers. It supports two operational modes:
11

12
- **Standalone mode** (default): subscribes to ZMQ KV event streams directly from workers. No Dynamo runtime discovery, registration, or event-plane integration required.
13
- **Dynamo runtime mode** (`--dynamo-runtime`): integrates with the Dynamo runtime for automatic worker discovery via MDC, KV event ingestion via the event plane (NATS or ZMQ), and overlap queries over the request plane for remote frontends.
14

15
This is distinct from the [Standalone Router](../../../components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.
16

17
18
The HTTP API follows the [Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403) conventions.

19
20
21
22
`DYN_ROUTER_MIN_INITIAL_WORKERS` is also honored here. When set to a positive integer, the
standalone indexer waits for that many workers to register before opening its startup-ready
gate, matching the frontend/router startup behavior.

23
24
25
26
27
28
29
30
## Multi-Model and Multi-Tenant Support

The indexer maintains one radix tree per `(model_name, tenant_id)` pair. Workers registered with different model names or tenant IDs are isolated into separate indexers — queries against one model/tenant never return scores from another.

- **`model_name`** (required on `/register` and `/query`): Identifies the model. Workers serving different models get separate radix trees.
- **`tenant_id`** (optional, defaults to `"default"`): Enables multi-tenant isolation within the same model. Omit for single-tenant deployments.
- **`block_size`** is per-indexer: the first `/register` call for a given `(model_name, tenant_id)` sets the block size. Subsequent registrations for the same pair must use the same block size or the request will fail.

31
32
## Compatibility

33
34
35
In standalone mode, the indexer works with any engine that publishes KV cache events over ZMQ in the expected msgpack format. This includes bare vLLM and SGLang engines, which emit ZMQ KV events natively — no Dynamo-specific wrapper is required.

In Dynamo runtime mode, the indexer discovers workers automatically via MDC and receives KV events through the event plane. It also registers a query endpoint on the request plane, allowing frontends to query overlap scores remotely without needing direct HTTP access.
36

37
38
39
40
41
42
## Use Cases

- **Debugging**: Inspect the radix tree state to verify which blocks are cached on which workers.
- **State verification**: Confirm that the indexer's view of KV cache state matches the router's internal state (used in integration tests).
- **Custom routing**: Build external routing logic that queries the indexer for overlap scores and makes its own worker selection decisions.
- **Monitoring**: Observe KV cache distribution across workers without running a full router.
43
- **Remote indexing**: In Dynamo runtime mode, frontends can offload KV cache indexing to a dedicated service and query it over the request plane.
44

45
46
47
48
49
50
## P2P Recovery

Multiple indexer replicas can subscribe to the same ZMQ worker endpoints for fault tolerance. When a replica starts (or restarts after a crash), it bootstraps its radix tree state from a healthy peer before processing live events.

### How It Works

51
52
1. Workers are registered via `--workers` or `/register`. Each ZMQ listener enters `pending` state and begins its initial subscribe/connect attempt in the background.
2. A 1-second delay biases peer recovery past the slow-joiner window, so the dump covers events that may have occurred before a fresh listener can safely start draining.
53
54
3. The indexer fetches a `/dump` from the first reachable peer in `--peers`.
4. Dump events are applied to populate the radix tree.
55
5. After recovery completes, the ready gate opens. Any listener whose initial ZMQ connect has already succeeded transitions to `active` and begins draining buffered events; listeners for workers that are still down remain `pending` until they connect.
56
57
58
59
60
61
62

If no peers are reachable, the indexer starts with an empty state.

### Example: Two-Replica Setup

```bash
# Replica A (first instance, no peers)
63
python -m dynamo.indexer --port 8090 --block-size 16 \
64
65
66
  --workers "1=tcp://worker1:5557,2=tcp://worker2:5558"

# Replica B (recovers from A on startup)
67
python -m dynamo.indexer --port 8091 --block-size 16 \
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
  --workers "1=tcp://worker1:5557,2=tcp://worker2:5558" \
  --peers "http://localhost:8090"
```

Both replicas subscribe to the same workers. Replica B recovers A's tree state on startup, then both independently process live ZMQ events going forward.

### Consistency

The dump is a weakly consistent BFS snapshot of the radix tree — concurrent writes may race with the traversal. This is acceptable because:

- **Stale blocks** (partially removed branches): live `Remove` events will clean them up.
- **Missing blocks** (partially added branches): live `Stored` events will add them.
- The tree converges to the correct state after live events catch up.

### Peer Management

Peers can be registered at startup via `--peers` or dynamically via the HTTP API. The peer list is used for recovery only — peers do not synchronize state in real time.

86
87
## Building

88
The service is exposed through the Python bindings package and launched with `python -m dynamo.indexer` after building the bindings with maturin. Feature flags control which capabilities are compiled in:
89
90
91

| Feature | Description |
|---------|-------------|
92
| `kv-indexer` | Core standalone indexer service path (`python -m dynamo.indexer`: HTTP API, ZMQ listeners, P2P recovery) |
93
94
| `kv-indexer-metrics` | Optional `/metrics` endpoint |
| `kv-indexer-runtime` | Dynamo runtime integration (`--dynamo-runtime`, discovery, event plane, request plane) |
95

96
### Standalone build
97
98

```bash
99
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer
100
101
```

102
After installation, launch the service with `python -m dynamo.indexer`.
103
104
105
106

### Standalone build with metrics

```bash
107
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer,kv-indexer-metrics
108
109
```

110
This keeps the default `kv-indexer` build lean while still allowing Prometheus metrics when needed.
111
112
113
114

### Runtime-enabled build

```bash
115
cd lib/bindings/python && VIRTUAL_ENV=../../.venv ../../.venv/bin/maturin develop --uv --features kv-indexer,kv-indexer-runtime
116
117
```

118
This enables the `--dynamo-runtime` CLI flag for MDC discovery, event-plane subscription, and request-plane queries. It also includes the metrics endpoint.
119

120
121
## CLI

122
123
### Standalone mode (default)

124
```bash
125
python -m dynamo.indexer --port 8090 [--threads 4] [--block-size 16 --model-name my-model --tenant-id default --workers "1=tcp://host:5557,2:1=tcp://host:5558"] [--peers "http://peer1:8090,http://peer2:8091"]
126
127
```

128
### Dynamo runtime mode
129
130

```bash
131
python -m dynamo.indexer --dynamo-runtime --namespace default --component-name kv-indexer --worker-component backend --port 8090 [--threads 4]
132
133
134
135
```

In runtime mode, workers are discovered automatically via MDC. The `--workers` flag can still be used to register additional static workers alongside discovered ones.

136
137
| Flag | Default | Description |
|------|---------|-------------|
138
| `--block-size` | (none) | KV cache block size for initial `--workers` (required when `--workers` is set) |
139
| `--port` | `8090` | HTTP server listen port |
140
141
| `--threads` | `4` | Number of indexer threads (1 = single-threaded, >1 = thread pool) |
| `--workers` | (none) | Initial workers as `instance_id[:dp_rank]=zmq_address,...` pairs (dp_rank defaults to 0) |
142
143
| `--model-name` | `default` | Model name for initial `--workers` |
| `--tenant-id` | `default` | Tenant ID for initial `--workers` |
144
| `--peers` | (none) | Comma-separated peer indexer URLs for P2P recovery on startup |
145
146
147
148
| `--dynamo-runtime` | `false` | Enable Dynamo runtime integration (requires `kv-indexer-runtime`) |
| `--namespace` | `default` | Dynamo namespace to register the indexer component under |
| `--component-name` | `kv-indexer` | Component name for this indexer in the Dynamo runtime |
| `--worker-component` | `backend` | Component name that workers register under for event-plane subscription |
149

150
151
152
153
154
155
### Shared Startup Gate

Set `DYN_ROUTER_MIN_INITIAL_WORKERS=<n>` to require at least `<n>` workers before the
standalone indexer, frontend push-router path, and KV router config-ready gate all proceed.
Leave it unset or set it to `0` to disable the startup wait.

156
157
## HTTP API

158
159
160
161
162
163
164
165
166
167
### `GET /health` — Liveness check

Returns `200 OK` unconditionally.

```bash
curl http://localhost:8090/health
```

### `GET /metrics` — Prometheus metrics

168
Returns metrics in Prometheus text exposition format. Available when the Python bindings are built with the `kv-indexer-metrics` or `kv-indexer-runtime` feature.
169
170
171
172
173
174
175
176
177
178
179
180

```bash
curl http://localhost:8090/metrics
```

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `dynamo_kvindexer_request_duration_seconds` | Histogram | `endpoint` | HTTP request latency |
| `dynamo_kvindexer_requests_total` | Counter | `endpoint`, `method` | Total HTTP requests |
| `dynamo_kvindexer_errors_total` | Counter | `endpoint`, `status_class` | HTTP error responses (4xx/5xx) |
| `dynamo_kvindexer_models` | Gauge | — | Number of active model+tenant indexers |
| `dynamo_kvindexer_workers` | Gauge | — | Number of registered worker instances |
181
| `dynamo_kvindexer_listeners` | Gauge | `status` | Number of ZMQ listeners by status (`pending`, `active`, `paused`, `failed`) |
182

183
184
### `POST /register` — Register an endpoint

185
Register a ZMQ endpoint for an instance. Each call creates or reuses the indexer for the given `(model_name, tenant_id)` pair.
186
Registration is non-blocking: if the worker is not up yet, the listener is accepted in `pending` state and transitions to `active` once the initial ZMQ connection succeeds.
187
188

```bash
189
# Single model, default tenant
190
191
curl -X POST http://localhost:8090/register \
  -H 'Content-Type: application/json' \
192
193
194
195
196
197
198
199
  -d '{
    "instance_id": 1,
    "endpoint": "tcp://127.0.0.1:5557",
    "model_name": "llama-3-8b",
    "block_size": 16
  }'

# With tenant isolation
200
201
curl -X POST http://localhost:8090/register \
  -H 'Content-Type: application/json' \
202
203
204
205
206
207
208
209
  -d '{
    "instance_id": 2,
    "endpoint": "tcp://127.0.0.1:5558",
    "model_name": "llama-3-8b",
    "tenant_id": "customer-a",
    "block_size": 16,
    "dp_rank": 0
  }'
210
211
```

212
213
214
215
216
217
218
219
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `instance_id` | yes | — | Worker instance identifier |
| `endpoint` | yes | — | ZMQ PUB address to subscribe to |
| `model_name` | yes | — | Model name (used to select the indexer) |
| `block_size` | yes | — | KV cache block size (must match the engine) |
| `tenant_id` | no | `"default"` | Tenant identifier for isolation |
| `dp_rank` | no | `0` | Data parallel rank |
220
| `replay_endpoint` | no | — | ZMQ ROUTER address for gap replay (e.g. `tcp://host:5560`) |
221

222
### `POST /unregister` — Deregister an instance
223

224
Remove an instance. Omitting `tenant_id` removes the instance from **all** tenants for the given model; providing it targets only that tenant's indexer.
225

226
```bash
227
228
229
230
231
232
# Remove from all tenants
curl -X POST http://localhost:8090/unregister \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1, "model_name": "llama-3-8b"}'

# Remove from a specific tenant
233
234
curl -X POST http://localhost:8090/unregister \
  -H 'Content-Type: application/json' \
235
  -d '{"instance_id": 1, "model_name": "llama-3-8b", "tenant_id": "customer-a"}'
236
237
238
239

# Remove a specific dp_rank
curl -X POST http://localhost:8090/unregister \
  -H 'Content-Type: application/json' \
240
  -d '{"instance_id": 1, "model_name": "llama-3-8b", "tenant_id": "default", "dp_rank": 0}'
241
242
```

243
244
245
246
247
248
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `instance_id` | yes | — | Worker instance to remove |
| `model_name` | yes | — | Model name (identifies the indexer) |
| `tenant_id` | no | — | Tenant identifier (omit to remove from all tenants) |
| `dp_rank` | no | — | Specific dp_rank to remove (omit to remove all) |
249
250

### `GET /workers` — List registered instances
251

252
253
254
255
256
257
```bash
curl http://localhost:8090/workers
```

Returns:
```json
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
[
  {
    "instance_id": 1,
    "source": "zmq",
    "status": "active",
    "endpoints": {
      "0": "tcp://127.0.0.1:5557",
      "1": "tcp://127.0.0.1:5558"
    },
    "listeners": {
      "0": {
        "endpoint": "tcp://127.0.0.1:5557",
        "status": "active"
      },
      "1": {
        "endpoint": "tcp://127.0.0.1:5558",
        "status": "active"
      }
    }
  },
  {
    "instance_id": 2,
    "source": "discovery",
    "status": "active",
    "endpoints": {},
    "listeners": {}
  }
]
286
```
287

288
289
For ZMQ-managed workers, `status` is aggregated across listeners with priority `failed > pending > active > paused`. Each listener entry may also expose a `last_error` field when the most recent startup or recv-loop attempt failed.

290
### `POST /query` — Query overlap for token IDs
291

292
Given raw token IDs, compute block hashes and return per-instance overlap scores (in matched tokens):
293

294
295
296
```bash
curl -X POST http://localhost:8090/query \
  -H 'Content-Type: application/json' \
297
  -d '{"token_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], "model_name": "llama-3-8b"}'
298
```
299

300
301
302
Returns:
```json
{
303
  "scores": {"1": {"0": 32}, "2": {"1": 0}},
304
305
306
  "frequencies": [1, 1],
  "tree_sizes": {"1": {"0": 5}, "2": {"1": 3}}
}
307
308
```

309
310
311
312
313
314
315
316
Scores are in **matched tokens** (block overlap count × block size). Nested by `instance_id` then `dp_rank`.

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `token_ids` | yes | — | Token sequence to query |
| `model_name` | yes | — | Model name (selects the indexer) |
| `tenant_id` | no | `"default"` | Tenant identifier |
| `lora_name` | no | — | LoRA adapter (overrides indexer-level lora_name for this query) |
317

318
319
320
321
322
### `POST /query_by_hash` — Query overlap for pre-computed hashes

```bash
curl -X POST http://localhost:8090/query_by_hash \
  -H 'Content-Type: application/json' \
323
  -d '{"block_hashes": [123456, 789012], "model_name": "llama-3-8b"}'
324
325
```

326
327
328
329
330
331
332
Same response format as `/query`. Scores are in matched tokens.

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `block_hashes` | yes | — | Pre-computed block hash array |
| `model_name` | yes | — | Model name (selects the indexer) |
| `tenant_id` | no | `"default"` | Tenant identifier |
333
334

### `GET /dump` — Dump all radix tree events
335

336
Returns the full radix tree state as a JSON object keyed by `model_name:tenant_id`:
337
338
339

```bash
curl http://localhost:8090/dump
340
341
```

342
343
344
Returns:
```json
{
345
346
347
348
349
350
351
352
  "llama-3-8b:default": {
    "block_size": 16,
    "events": [<RouterEvent>, ...]
  },
  "mistral-7b:customer-a": {
    "block_size": 16,
    "events": [<RouterEvent>, ...]
  }
353
354
355
}
```

356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
Each indexer is dumped concurrently. The `block_size` field lets recovering peers create indexers with the correct block size without requiring `--block-size` on every replica.

### `POST /register_peer` — Register a peer indexer

```bash
curl -X POST http://localhost:8090/register_peer \
  -H 'Content-Type: application/json' \
  -d '{"url": "http://peer:8091"}'
```

### `POST /deregister_peer` — Remove a peer indexer

```bash
curl -X POST http://localhost:8090/deregister_peer \
  -H 'Content-Type: application/json' \
  -d '{"url": "http://peer:8091"}'
```

### `GET /peers` — List registered peers

```bash
curl http://localhost:8090/peers
```

Returns:
```json
["http://peer:8091"]
```
384

385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
## DP Rank Handling

When a worker registers with the standalone KV indexer (`/register`), it provides an `instance_id`, a ZMQ `endpoint`, and an optional `dp_rank` (defaults to 0). The service spawns one ZMQ listener per registration.

Each incoming `KvEventBatch` may carry an optional `data_parallel_rank` field. If present, it **overrides** the statically-registered `dp_rank` for that batch. This allows a single ZMQ port to multiplex events from multiple DP ranks.

**Caveat**: the registry only tracks dp_ranks from explicit `/register` calls. If an engine dynamically emits batches with a dp_rank that was never registered, the indexer will store those blocks correctly (under the dynamic `WorkerWithDpRank` key), but per-dp_rank deregistration (`/unregister` with `dp_rank`) will not find them. Full-instance deregistration (`/unregister` without `dp_rank`) still cleans up all dp_ranks for a given `worker_id` in the tree via `remove_worker`.

## Gap Detection and Replay

ZMQ PUB/SUB is lossy — messages can be dropped under backpressure or brief disconnects. The indexer detects gaps by tracking the sequence number of each batch: if `seq > last_seq + 1`, a gap is detected.

When a `replay_endpoint` is provided during `/register`, the indexer connects a DEALER socket to the engine's ROUTER socket and requests the missing batches by sequence number. The engine streams back buffered `(seq, payload)` pairs from its ring buffer until an empty-payload sentinel.

If no `replay_endpoint` is configured, gaps are logged as warnings but not recovered.

The sequence counter (`last_seq`) persists across unregister/register cycles, so re-registering a worker after a gap will trigger replay on the first batch received by the new listener.

403
404
405
406
407
408
## Dynamo Runtime Mode

When started with `--dynamo-runtime`, the indexer integrates with the Dynamo distributed runtime:

### Worker Discovery

409
The indexer watches MDC (Model Discovery Catalog) for worker additions and removals. When a worker registers with MDC, the indexer automatically creates an indexer for its model and block size. Workers discovered via MDC are tracked separately from those registered via `--workers` or the `/register` HTTP API; a worker cannot be registered through both paths simultaneously.
410
411
412
413
414
415
416
417
418
419
420
421
422

### Event Plane Subscription

Instead of connecting directly to ZMQ PUB sockets on each worker, the indexer subscribes to KV events through the Dynamo event plane. The transport (NATS or ZMQ) is determined by the `DYNAMO_EVENT_TRANSPORT` environment variable. Events are routed to the appropriate indexer based on the worker ID.

### Request Plane Query Endpoint

The indexer registers a query endpoint on the Dynamo request plane, allowing frontends to send `IndexerQueryRequest` messages containing a model name, namespace, and block hashes. The indexer looks up the appropriate radix tree and returns overlap scores. This enables frontends to use a remote indexer for KV-aware routing without direct HTTP access.

### Example

```bash
# Start the indexer with runtime integration
423
python -m dynamo.indexer --dynamo-runtime \
424
425
426
427
428
429
430
431
  --namespace my-namespace \
  --component-name kv-indexer \
  --worker-component backend \
  --port 8090 --threads 4
```

The HTTP API remains fully available in runtime mode. Static workers can be added via `--workers` alongside discovered workers.

432
433
## Limitations

434
- **Standalone mode is ZMQ only**: In standalone mode, workers must publish KV events via ZMQ PUB sockets. Build with `kv-indexer-runtime` and use `--dynamo-runtime` to receive events via the event plane (NATS or ZMQ).
435
436
437
438
- **No routing logic**: The indexer only maintains the radix tree and answers queries. It does not track active blocks, manage request lifecycle, or perform worker selection.

## Architecture

439
440
### Standalone Mode

441
442
443
```mermaid
graph TD
    subgraph Workers
444
445
        W1[Worker 1<br/>ZMQ PUB]
        W2[Worker 2<br/>ZMQ PUB]
446
447
    end

448
449
450
    subgraph "Standalone Indexer (HTTP)"
        REG[Worker Registry]
        ZMQ[ZMQ SUB Listeners]
451
        IDX["Indexer Map<br/>(model, tenant) → Radix Tree"]
452
        HTTP[HTTP API<br/>/query /dump /register /health]
453
454
455
456
    end

    CLIENT[External Client]

457
458
459
460
461
462
463
    W1 -->|ZMQ events| ZMQ
    W2 -->|ZMQ events| ZMQ
    CLIENT -->|POST /register| REG
    REG -->|spawn listeners| ZMQ
    ZMQ -->|apply events| IDX
    CLIENT -->|POST /query, GET /dump| HTTP
    HTTP -->|query| IDX
464
465
466
467

    style W1 fill:#f3e5f5,stroke:#333,color:#333
    style W2 fill:#f3e5f5,stroke:#333,color:#333
    style IDX fill:#2e8b57,stroke:#333,color:#fff
468
469
470
    style ZMQ fill:#2e8b57,stroke:#333,color:#fff
    style REG fill:#2e8b57,stroke:#333,color:#fff
    style HTTP fill:#2e8b57,stroke:#333,color:#fff
471
472
473
    style CLIENT fill:#fff3e0,stroke:#333,color:#333
```

474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
### Dynamo Runtime Mode

```mermaid
graph TD
    subgraph Workers
        W1[Worker 1]
        W2[Worker 2]
    end

    subgraph "Dynamo Runtime"
        MDC[MDC Discovery]
        EP[Event Plane<br/>NATS / ZMQ]
        RP[Request Plane]
    end

    subgraph "Standalone Indexer"
        DISC[Discovery Watcher]
        SUB[Event Subscriber]
        REG[Worker Registry]
        IDX["Indexer Map<br/>(model, tenant) → Radix Tree"]
        QE[Query Endpoint]
495
        HTTP[HTTP API<br/>/query /dump /register /metrics]
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
    end

    FRONTEND[Frontend / Router]
    CLIENT[External Client]

    W1 -->|register| MDC
    W2 -->|register| MDC
    MDC -->|added/removed| DISC
    DISC -->|add/remove workers| REG
    W1 -->|KV events| EP
    W2 -->|KV events| EP
    EP -->|RouterEvent| SUB
    SUB -->|apply events| IDX
    FRONTEND -->|IndexerQueryRequest| RP
    RP --> QE
    QE -->|query| IDX
    CLIENT -->|POST /query, GET /dump| HTTP
    HTTP -->|query| IDX

    style W1 fill:#f3e5f5,stroke:#333,color:#333
    style W2 fill:#f3e5f5,stroke:#333,color:#333
    style MDC fill:#e3f2fd,stroke:#333,color:#333
    style EP fill:#e3f2fd,stroke:#333,color:#333
    style RP fill:#e3f2fd,stroke:#333,color:#333
    style IDX fill:#2e8b57,stroke:#333,color:#fff
    style SUB fill:#2e8b57,stroke:#333,color:#fff
    style DISC fill:#2e8b57,stroke:#333,color:#fff
    style REG fill:#2e8b57,stroke:#333,color:#fff
    style QE fill:#2e8b57,stroke:#333,color:#fff
    style HTTP fill:#2e8b57,stroke:#333,color:#fff
    style FRONTEND fill:#fff3e0,stroke:#333,color:#333
    style CLIENT fill:#fff3e0,stroke:#333,color:#333
```

530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
### P2P Recovery Flow

```mermaid
sequenceDiagram
    participant B as Replica B (new)
    participant A as Replica A (healthy)
    participant W as Workers (ZMQ PUB)

    B->>W: Connect ZMQ SUB sockets
    Note over B,W: 1s delay for peer tree to advance past connection point
    B->>A: GET /dump
    A-->>B: Radix tree snapshot + block sizes
    Note over B: Apply dump events
    Note over B: Unblock ZMQ listeners
    B->>W: Start draining buffered events
    Note over B: Ready to serve queries
```

548
549
## See Also

550
- **[Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403)**: Community API standardization for KV cache indexers
551
552
- **[Router Guide](router-guide.md)**: Full KV router configuration and tuning
- **[Router Design](../../design-docs/router-design.md)**: Architecture and event transport modes
553
- **[Standalone Router](../../../components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers)