---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Standalone KV Indexer
subtitle: Run the KV cache indexer as an independent HTTP service for querying block state
---

## Overview

The standalone KV indexer (`dynamo-kv-indexer`) is a lightweight HTTP binary that subscribes to ZMQ KV event streams from workers, maintains a radix tree of cached blocks, and exposes HTTP endpoints for querying and managing workers.

This is distinct from the [Standalone Router](../../../components/src/dynamo/router/README.md), which is a full routing service. The standalone indexer provides only the indexing and query layer without routing logic.

The HTTP API follows the [Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403) conventions.

## Compatibility

The standalone indexer works with any engine that publishes KV cache events over ZMQ in the expected msgpack format. This includes bare vLLM and SGLang engines, which emit ZMQ KV events natively, so no Dynamo-specific wrapper is required.

## Use Cases

- **Debugging**: Inspect the radix tree state to verify which blocks are cached on which workers.
- **State verification**: Confirm that the indexer's view of KV cache state matches the router's internal state (used in integration tests).
- **Custom routing**: Build external routing logic that queries the indexer for overlap scores and makes its own worker selection decisions.
- **Monitoring**: Observe KV cache distribution across workers without running a full router.

## Building

The binary is a feature-gated target in the `dynamo-kv-router` crate:

```bash
cargo build -p dynamo-kv-router --features indexer-bin --bin dynamo-kv-indexer
```

## CLI

```bash
dynamo-kv-indexer --block-size 16 --port 8090 [--threads 1] [--workers "1=tcp://host:5557,2=tcp://host:5558"]
```

| Flag | Default | Description |
|------|---------|-------------|
| `--block-size` | (required) | KV cache block size (must match the engine's block size) |
| `--port` | `8090` | HTTP server listen port |
| `--threads` | `1` | Number of indexer threads (1 = single-threaded, >1 = thread pool) |
| `--workers` | (none) | Initial workers as `instance_id=zmq_address,...` pairs |

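The `--workers` value is a comma-separated list of `instance_id=zmq_address` pairs. As a rough illustration of that format only (this is not the binary's own parser, and `parse_workers` is a hypothetical helper name), a script that generates or validates the flag might look like this:

```python
def parse_workers(spec: str) -> dict[int, str]:
    """Parse 'id=addr,id=addr' into {instance_id: zmq_address}.

    Hypothetical helper that mirrors the documented flag format;
    the real binary may be stricter or more lenient.
    """
    workers: dict[int, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty string or a trailing comma
        # Split on the first '=' only; the ZMQ address itself contains ':' and '/'.
        instance_id, sep, address = pair.partition("=")
        if not sep or not address:
            raise ValueError(f"expected instance_id=zmq_address, got {pair!r}")
        workers[int(instance_id)] = address
    return workers


if __name__ == "__main__":
    print(parse_workers("1=tcp://host:5557,2=tcp://host:5558"))
```

Validating the string before launching the binary can catch a malformed pair early, under the assumption that instance IDs are integers as in the examples in this document.
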
## HTTP API

### `POST /register` — Register an endpoint

Register a ZMQ endpoint for an instance. For data-parallel workers, call this once per `dp_rank`:

```bash
# Single dp_rank (dp_rank defaults to 0)
curl -X POST http://localhost:8090/register \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1, "endpoint": "tcp://127.0.0.1:5557"}'

# Multiple dp_ranks — register each separately
curl -X POST http://localhost:8090/register \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1, "endpoint": "tcp://127.0.0.1:5557", "dp_rank": 0}'
curl -X POST http://localhost:8090/register \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1, "endpoint": "tcp://127.0.0.1:5558", "dp_rank": 1}'
```

The indexer spawns a ZMQ SUB listener for each endpoint and begins consuming KV events.
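
For data-parallel workers, the per-rank registration above can be scripted. The following is a minimal sketch using only the Python standard library. The helper names `register_payloads` and `register_all` are hypothetical, and treating the list position of each endpoint as its `dp_rank` is an assumption made for the example:

```python
import json
import urllib.request


def register_payloads(instance_id: int, endpoints: list[str]) -> list[dict]:
    """Build one /register body per dp_rank.

    Assumption: the position of an endpoint in the list is its dp_rank.
    """
    return [
        {"instance_id": instance_id, "endpoint": endpoint, "dp_rank": rank}
        for rank, endpoint in enumerate(endpoints)
    ]


def register_all(base_url: str, instance_id: int, endpoints: list[str]) -> None:
    """POST each body to <base_url>/register, one request per dp_rank."""
    for body in register_payloads(instance_id, endpoints):
        request = urllib.request.Request(
            f"{base_url}/register",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
        with urllib.request.urlopen(request) as response:
            response.read()
```

With an indexer listening on port 8090, `register_all("http://localhost:8090", 1, ["tcp://127.0.0.1:5557", "tcp://127.0.0.1:5558"])` would send the same two requests as the multi-rank curl example.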

### `POST /unregister` — Deregister an instance

Remove every `dp_rank` of an instance, or only a specific `dp_rank`:

```bash
# Remove all dp_ranks
curl -X POST http://localhost:8090/unregister \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1}'

# Remove a specific dp_rank
curl -X POST http://localhost:8090/unregister \
  -H 'Content-Type: application/json' \
  -d '{"instance_id": 1, "dp_rank": 0}'
```

Unregistering cancels the ZMQ listeners and removes the instance's blocks from the radix tree.

### `GET /workers` — List registered instances

```bash
curl http://localhost:8090/workers
```

Returns:
```json
[{"instance_id": 1, "endpoints": {"0": "tcp://127.0.0.1:5557", "1": "tcp://127.0.0.1:5558"}}]
```
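
The response nests endpoints under each instance, keyed by `dp_rank` as a string. For monitoring or debugging scripts it can be convenient to flatten this into one row per listener. The sketch below assumes the response shape shown above. `flatten_workers` is a hypothetical helper, not part of the indexer:

```python
import json


def flatten_workers(workers: list[dict]) -> list[tuple[int, int, str]]:
    """Turn the /workers response into sorted (instance_id, dp_rank, endpoint) rows."""
    rows = []
    for worker in workers:
        for dp_rank, endpoint in worker["endpoints"].items():
            # JSON object keys are strings, so convert dp_rank back to an int.
            rows.append((worker["instance_id"], int(dp_rank), endpoint))
    return sorted(rows)


if __name__ == "__main__":
    body = (
        '[{"instance_id": 1, "endpoints": '
        '{"0": "tcp://127.0.0.1:5557", "1": "tcp://127.0.0.1:5558"}}]'
    )
    for row in flatten_workers(json.loads(body)):
        print(row)
```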

### `POST /query` — Query overlap for token IDs

Given raw token IDs, compute block hashes and return per-instance overlap scores:

```bash
curl -X POST http://localhost:8090/query \
  -H 'Content-Type: application/json' \
  -d '{"token_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]}'
```

Returns:
```json
{
  "scores": {"1": {"0": 2}, "2": {"1": 0}},
  "frequencies": [1, 1],
  "tree_sizes": {"1": {"0": 5}, "2": {"1": 3}}
}
```

Scores are nested by `instance_id`, then by `dp_rank`. A higher score means more cached prefix blocks on that instance.

### `POST /query_by_hash` — Query overlap for pre-computed hashes

```bash
curl -X POST http://localhost:8090/query_by_hash \
  -H 'Content-Type: application/json' \
  -d '{"block_hashes": [123456, 789012]}'
```

Same response format as `/query`.

### `GET /dump` — Dump all radix tree events

Returns the full radix tree state as a JSON array of `RouterEvent` objects:

```bash
curl http://localhost:8090/dump
```

## Limitations

- **ZMQ only**: Workers must publish KV events via ZMQ PUB sockets. The standalone indexer does not subscribe to NATS event streams.
- **No routing logic**: The indexer only maintains the radix tree and answers queries. It does not track active blocks, manage request lifecycle, or perform worker selection.

## Architecture

```mermaid
graph TD
    subgraph Workers
        W1[Worker 1<br/>ZMQ PUB]
        W2[Worker 2<br/>ZMQ PUB]
    end

    subgraph "Standalone Indexer (HTTP)"
        REG[Worker Registry]
        ZMQ[ZMQ SUB Listeners]
        IDX[Indexer / Radix Tree]
        HTTP[HTTP API<br/>/query /dump /register]
    end

    CLIENT[External Client]

    W1 -->|ZMQ events| ZMQ
    W2 -->|ZMQ events| ZMQ
    CLIENT -->|POST /register| REG
    REG -->|spawn listeners| ZMQ
    ZMQ -->|apply events| IDX
    CLIENT -->|POST /query, GET /dump| HTTP
    HTTP -->|query| IDX

    style W1 fill:#f3e5f5,stroke:#333,color:#333
    style W2 fill:#f3e5f5,stroke:#333,color:#333
    style IDX fill:#2e8b57,stroke:#333,color:#fff
    style ZMQ fill:#2e8b57,stroke:#333,color:#fff
    style REG fill:#2e8b57,stroke:#333,color:#fff
    style HTTP fill:#2e8b57,stroke:#333,color:#fff
    style CLIENT fill:#fff3e0,stroke:#333,color:#333
```

## See Also

- **[Mooncake KV Indexer RFC](https://github.com/kvcache-ai/Mooncake/issues/1403)**: Community API standardization for KV cache indexers
- **[Router Guide](router-guide.md)**: Full KV router configuration and tuning
- **[Router Design](../../design-docs/router-design.md)**: Architecture and event transport modes
- **[Standalone Router](../../../components/src/dynamo/router/README.md)**: Full routing service (routes requests to workers)