"examples/backends/trtllm/launch/disagg_router.sh" did not exist on "901715b5e172b64a7518cb0e02cdaf4b11e244a6"
README.md 5.69 KB
Newer Older
Ryan Olson's avatar
Ryan Olson committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
# kvbm-physical

Physical layout and transfer management for KV cache block storage.

`kvbm-physical` provides the low-level building blocks for mapping KV cache blocks to memory, registering them for RDMA transfers via NIXL, and executing transfers between heterogeneous storage tiers (GPU, host, disk, remote).

## Modules

### `layout` — Block-to-memory mapping

Abstractions for how KV cache blocks are organized in memory.

- **`Layout` trait** — Core abstraction mapping `(block_id, layer_id, outer_id)` to a `MemoryRegion`. Implementations include fully contiguous (single allocation) and layer-separate (one allocation per layer) variants.
- **`KvBlockLayout`** — Describes dimension ordering within a block. Five named formats (`UniversalTP`, `UniversalPP`, `OperationalHND`, `OperationalNHD`, `Custom`) plus `Unknown`. Provides `requires_transform()`, `is_operational()`, and `is_universal()` for kernel selection.
- **`PhysicalLayout`** — Wraps a `Layout` with its physical storage location (`StorageKind`) and NIXL registration metadata (`NixlMetadata`). Constructed via a type-state builder: Config → Layout type → Memory allocation → `build()`.
- **`LayoutConfig`** — Block dimensions: `num_blocks`, `num_layers`, `outer_dim`, `page_size`, `inner_dim`, `dtype_width_bytes`, optional `num_heads`.
- **`KvBlocks`** — Groups block IDs with a shared `PhysicalLayout` and optional `KvBlockLayout` override for cross-format transfers.

### `manager` — Layout registration and transfer orchestration

- **`TransferManager`** — Primary API. Registers layouts, exports/imports RDMA metadata between workers, and executes transfers by handle.
- **`LayoutHandle`** — Compact `u128` encoding `(worker_id, layout_id)`. Identifies a registered layout within a specific worker; not symmetric across workers.
- **`LogicalLayoutDescriptor`** — Bridges a `LayoutHandle` to a `LogicalLayoutHandle` (G1/G2/G3/G4 tier). Enables callers to say "copy from G1 to G2" while `TransferManager` resolves worker-specific physical handles.
- **`SerializedLayout`** — Wire format for RDMA metadata exchange. Packs worker address, NIXL metadata, and layout descriptors into a bincode blob.
- **`WorkerAddress`**`(worker_id, nixl_agent_name)` pair identifying a worker on the network.

### `transfer` — Transfer configuration and execution

- **`TransferConfig` / builder** — Configures event system, NIXL backends, CUDA device, capabilities, and memory pool before building a `TransferManager`.
- **`TransferOptions`** — Per-transfer configuration: `layer_range`, `nixl_write_notification`, `bounce_buffer`, caller-provided `cuda_stream`, and src/dst `kv_layout` overrides.
- **`TransferPreferences`** — Strategy hints via `NativeVsNixlPolicy` (PreferNative / PreferNixl / Automatic).
- **`TransferCompleteNotification`**`Either<Ready, EventAwaiter>` implementing `IntoFuture`. Zero-cost for synchronous completions. `aggregate()` composes multiple notifications. `could_yield()` checks if awaiting will suspend.
- **`BounceBuffer`** — Staging area for two-hop transfers (e.g., Device &rarr; Host &rarr; Remote).
- **Checksum utilities** — BLAKE3 block/layer checksums for transfer verification.
- **Fill utilities** — Constant/sequential patterns for testing and initialization.

## Quick Start

```rust,ignore
use kvbm_physical::{TransferManager, TransferOptions};
use kvbm_physical::layout::{LayoutConfig, PhysicalLayout};

// 1. Build the TransferManager (creates NIXL agent, CUDA streams, event system)
let manager = TransferManager::builder()
    .nixl_backend("ucx")
    .cuda_device_id(0)
    .build()?;

// 2. Configure a layout
let config = LayoutConfig::builder()
    .num_blocks(64)
    .num_layers(32)
    .outer_dim(2)
    .page_size(16)
    .inner_dim(128)
    .dtype_width_bytes(2)
    .build()?;

// 3. Build a physical layout (type-state builder: config -> layout type -> memory -> build)
let gpu_layout = PhysicalLayout::builder(manager.nixl_agent().clone())
    .with_config(config.clone())
    .fully_contiguous()
    .allocate_device(0)
    .build()?;

let host_layout = PhysicalLayout::builder(manager.nixl_agent().clone())
    .with_config(config)
    .fully_contiguous()
    .allocate_pinned(Some(0))
    .build()?;

// 4. Register layouts to get handles
let gpu_handle = manager.register_layout(gpu_layout)?;
let host_handle = manager.register_layout(host_layout)?;

// 5. Execute a transfer and await completion
let notification = manager.execute_transfer(
    gpu_handle,
    &[0, 1, 2, 3],        // source block IDs
    host_handle,
    &[0, 1, 2, 3],        // destination block IDs
    TransferOptions::new(),
)?;
notification.await?;
```

## Testing

All functional tests in `kvbm-physical` require a real NIXL installation and a CUDA GPU. They are gated behind two feature flags:

- **`testing-kvbm`** — enables tests requiring NIXL and CUDA (creates NixlAgent instances and allocates device memory / launches kernels)

### Running tests

```bash
# Without GPU/NIXL — only the sentinel test runs (confirms skipping)
cargo test -p kvbm-physical

# With GPU + NIXL available
cargo test -p kvbm-physical --features testing-kvbm
```

When neither feature is enabled, a single **sentinel test** runs and prints a reminder message. This ensures `cargo test` never silently passes with zero tests.

### What the sentinel test looks like

```
running 1 test
test sentinel::all_functional_tests_skipped___enable_testing_nixl_and_testing_cuda ... ok
```

The `test_version_check_on_deserialization` test in `layout::tests` is the only functional test that runs without feature flags, as it does not require NIXL or CUDA.

## Documentation

- [v1 Migration Guide](docs/v1_migration.md) — Migration from `dynamo-llm::block_manager` to `kvbm-physical`