README.md 1.68 KB
Newer Older
Ryan Olson's avatar
Ryan Olson committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# kvbm-engine

Distributed coordination primitives for KV cache block management (KVBM).

This crate implements the leader/worker architecture for managing KV cache blocks across a tiered storage hierarchy:

**G1** (GPU HBM) → **G2** (Pinned DRAM) → **G3** (NVMe/SSD) → **G4** (S3/MinIO)

Leaders own block metadata and make placement decisions. Workers execute data transfers (RDMA, NVMe, object storage). Sessions coordinate multi-instance block transfers.

## Feature Flags


| Flag           | Purpose                                  |
| -------------- | ---------------------------------------- |
| `s3` (default) | S3/MinIO object storage (G4 tier)        |
| `testing`      | Test utilities and mock infrastructure   |
| `nats`         | NATS-based pub/sub transport             |
| `collectives`  | NIXL + NCCL multi-GPU collectives        |
| `nccl`         | NCCL via cudarc                          |
| `nvtx`         | NVIDIA Tools Extension profiling markers |


## Documentation

Detailed module documentation lives in `[docs/](docs/)`:

- [Architecture](docs/architecture.md) — Overall system design
- [Leader](docs/leader.md) — Block coordination and metadata management
- [Session](docs/session.md) — Distributed onboarding protocol
- [Worker](docs/worker.md) — Transfer execution
- [Worker Group](docs/worker-group.md) — SPMD parallel workers
- [Offload](docs/offload.md) — Async tier-demotion pipeline
- [Offload Developer Guide](docs/offload-developer.md) — Contributing to the offload module
- [Object Storage](docs/object.md) — S3/MinIO integration
- [Runtime](docs/runtime.md) — Runtime bundle (tokio, Velo, NIXL)
- [Testing](docs/testing.md) — Test utilities and fixtures