This document provides implementation details for developers working on the offload pipeline. For high-level concepts and policy statements, see [offload.md](offload.md).
## Container-Based Architecture
### OffloadContainer
The container is the fundamental unit that flows through the pipeline:
```rust,ignore
struct OffloadContainer<T: BlockMetadata> {
    /// Source blocks to transfer
    blocks: Vec<SourceBlock<T>>,
    /// Precondition event: `Some` before PreconditionAwaiter, `None` after
    precondition: Option<EventHandle>,
    /// Cancellation token (cloned from TransferHandle)
    cancel_token: CancellationToken,
}

impl<T: BlockMetadata> OffloadContainer<T> {
    /// Check whether this container has been cancelled
    fn is_cancelled(&self) -> bool {
        self.cancel_token.is_requested()
    }

    /// Upgrade all blocks from Weak → Strong.
    /// Returns `None` if any block was evicted.
    fn upgrade(self) -> Option<UpgradedContainer<T>> {
        // Implementation upgrades each SourceBlock
        todo!()
    }
}
```
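The all-or-nothing semantics of `upgrade` can be illustrated with plain `std::sync::Weak`/`Arc`. This is a minimal sketch, assuming pre-upgrade blocks are held as `Weak<B>`. `WeakContainer` and `StrongContainer` are hypothetical stand-ins for `OffloadContainer` and `UpgradedContainer`, not the crate's actual types.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a container of `SourceBlock<T>`: weak references
/// to blocks that the source tier may evict at any time.
pub struct WeakContainer<B> {
    pub blocks: Vec<Weak<B>>,
}

/// Hypothetical stand-in for `UpgradedContainer<T>`: strong references that
/// keep every block alive for the duration of the transfer.
pub struct StrongContainer<B> {
    pub blocks: Vec<Arc<B>>,
}

impl<B> WeakContainer<B> {
    /// Upgrade every block or none of them. Collecting an iterator of
    /// `Option<Arc<B>>` into `Option<Vec<Arc<B>>>` short-circuits on the
    /// first `None` (the first evicted block); any strong references taken
    /// before that point are dropped again.
    pub fn upgrade(self) -> Option<StrongContainer<B>> {
        let blocks = self
            .blocks
            .iter()
            .map(Weak::upgrade)
            .collect::<Option<Vec<_>>>()?;
        Some(StrongContainer { blocks })
    }
}
```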
### OffloadBatch
Batches group multiple containers for efficient transfer. The `OffloadBatch` definition appears under Container Data Model below.
The offload module manages the asynchronous transfer of KV cache blocks between storage tiers. It provides a pipeline-based architecture for evaluating, batching, and executing block transfers. Transfers can be cancelled at any stage up to the point where their blocks are committed.
## Overview
Offloading moves blocks from a source tier (e.g., GPU memory) to a destination tier (e.g., host memory, remote storage, or object storage). The pipeline ensures:
- **Policy-based filtering**: Only blocks meeting criteria are transferred
- **Batched execution**: Blocks are grouped for efficient transfer
- **Cancellation support**: Transfers can be cancelled at any point before commitment
- **Precondition synchronization**: Transfers wait for forward-pass completion

The pipeline is built from four stages:

| Stage | Responsibility |
|-------|----------------|
| **PolicyEvaluator** | Filters blocks based on configured policies (frequency, presence, etc.) |
| **PreconditionAwaiter** | Waits for forward-pass completion before proceeding |
| **Batcher** | Groups containers into batches based on total block count |
| **TransferExecutor** | Upgrades blocks and executes the actual transfer |
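
One way the Batcher could group containers by total block count is greedy packing against `batch_config.max_batch_size` (see Configuration). This is a minimal sketch, assuming containers are never split because the container is the unit of cancellation. The `Container` type and `batch_containers` function are hypothetical. The time-based `flush_interval` and `min_batch_size` behavior is omitted.

```rust
/// Hypothetical container: only its id and block count matter for batching.
#[derive(Debug, Clone, PartialEq)]
pub struct Container {
    pub id: u32,
    pub num_blocks: usize,
}

/// Greedily pack containers, in arrival order, into batches whose total block
/// count does not exceed `max_batch_size`. Containers are never split; a
/// container larger than the limit ends up in a batch of its own.
pub fn batch_containers(containers: Vec<Container>, max_batch_size: usize) -> Vec<Vec<Container>> {
    let mut batches = Vec::new();
    let mut current: Vec<Container> = Vec::new();
    let mut current_blocks = 0;

    for c in containers {
        // Close the open batch if this container would push it over the limit.
        if !current.is_empty() && current_blocks + c.num_blocks > max_batch_size {
            batches.push(std::mem::take(&mut current));
            current_blocks = 0;
        }
        current_blocks += c.num_blocks;
        current.push(c);
    }
    // Flush whatever remains as a final (possibly partial) batch.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```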
## Container Data Model
The fundamental unit flowing through the pipeline is an **OffloadContainer**:
```rust,ignore
struct OffloadContainer<T: BlockMetadata> {
    /// The blocks to offload
    blocks: Vec<SourceBlock<T>>,
    /// Precondition event (forward pass completion)
    precondition: Option<EventHandle>,
    /// Cancellation token
    cancel_token: CancellationToken,
}
```
Containers are grouped into batches for efficient transfer:
```rust,ignore
struct OffloadBatch<T: BlockMetadata> {
    /// Multiple containers, each independently cancellable
    containers: Vec<OffloadContainer<T>>,
}
```
### P1: Container is the Unit of Cancellation
Individual blocks within a container are not independently cancellable. When a container is cancelled, all its blocks are cancelled together.
### P2: Token Travels with Container
Each container carries its own `CancellationToken`, cloned from the `TransferHandle` at enqueue time. The token travels with the container through all pipeline stages until upgrade.
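P1 and P2 can be sketched with a token whose clones share one atomic flag. A cancel issued through the handle is then seen by every container carrying a clone, and it covers all of that container's blocks at once. `CancelToken`, `TransferHandle`, and `Container` here are hypothetical simplifications; the real `CancellationToken` may be implemented differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical token: all clones share a single flag.
#[derive(Clone, Debug, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn is_requested(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Hypothetical caller-side handle returned at enqueue time.
#[derive(Debug, Default)]
pub struct TransferHandle {
    token: CancelToken,
}

impl TransferHandle {
    /// Clone the token into a container when it is enqueued.
    pub fn token(&self) -> CancelToken {
        self.token.clone()
    }

    /// Request cancellation. Every container holding a clone observes it.
    pub fn cancel(&self) {
        self.token.0.store(true, Ordering::Release);
    }
}

/// Hypothetical container: one token governs all of its blocks as a unit.
pub struct Container {
    pub blocks: Vec<u64>,
    pub cancel_token: CancelToken,
}
```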
### P3: Upgrade is the Commitment Boundary
The upgrade step (Weak → Strong) is the point of no return:
- **Before upgrade**: Containers can be cancelled via sweep or token check
- **After upgrade**: We own the blocks; cancellation no longer applies
### P4: Sweep Before Upgrade
The last cancellation check occurs immediately before upgrade. The `TransferExecutor` calls `batch.sweep_cancelled()` to remove cancelled containers before committing.
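A minimal sketch of the sweep uses `Vec::retain`, assuming each container's token can be read as a shared `AtomicBool`. `Container`, `Batch`, and the `usize` return value are hypothetical; the real `sweep_cancelled` may have a different signature.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical container whose cancellation token is a shared atomic flag.
pub struct Container {
    pub blocks: Vec<u64>,
    pub cancelled: Arc<AtomicBool>,
}

/// Hypothetical batch of independently cancellable containers.
pub struct Batch {
    pub containers: Vec<Container>,
}

impl Batch {
    /// Drop every container whose token has been triggered and return how
    /// many were removed. Meant to run immediately before upgrade; a cancel
    /// requested after that point is no longer honoured (P3).
    pub fn sweep_cancelled(&mut self) -> usize {
        let before = self.containers.len();
        self.containers
            .retain(|c| !c.cancelled.load(Ordering::Acquire));
        before - self.containers.len()
    }
}
```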
### P5: Flat Map After Upgrade
After upgrade, all blocks from all containers are consolidated into a single `Vec<ImmutableBlock<T>>` for efficient batch transfer. Per-container identity is lost at this point.
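The consolidation step is essentially a `flat_map` over the upgraded containers. This sketch assumes upgraded blocks are `Arc<B>`. `UpgradedContainer` here is a simplified hypothetical stand-in, and `flatten` is an illustrative name.

```rust
use std::sync::Arc;

/// Hypothetical upgraded container holding strong block references.
pub struct UpgradedContainer<B> {
    pub blocks: Vec<Arc<B>>,
}

/// Merge every upgraded container into one flat list for a single transfer
/// call. Container boundaries are discarded here; blocks keep the order of
/// their containers and their order within each container.
pub fn flatten<B>(containers: Vec<UpgradedContainer<B>>) -> Vec<Arc<B>> {
    containers.into_iter().flat_map(|c| c.blocks).collect()
}
```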
### P6: PreconditionAwaiter Uses Select
The precondition awaiter can be cancelled via `select!` on both the precondition event and the cancellation token. If cancelled while waiting, the container is dropped immediately.
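The race between the precondition and cancellation can be shown without an async runtime. This is a thread-based analogue using `std::sync::Condvar`, not the `select!`-based implementation itself. `Signals` and `await_precondition` are hypothetical. In this sketch cancellation wins if both signals are already set; an unbiased `select!` would pick either ready branch.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical shared state standing in for both the precondition event
/// and the cancellation token: (precondition_done, cancelled).
#[derive(Default)]
pub struct Signals {
    state: Mutex<(bool, bool)>,
    cv: Condvar,
}

impl Signals {
    pub fn complete_precondition(&self) {
        self.state.lock().unwrap().0 = true;
        self.cv.notify_all();
    }

    pub fn cancel(&self) {
        self.state.lock().unwrap().1 = true;
        self.cv.notify_all();
    }
}

/// Block until either signal fires. Returns `Some(container)` if the
/// precondition completed, or `None` (dropping the container) if
/// cancellation was observed.
pub fn await_precondition<C>(container: C, signals: &Signals) -> Option<C> {
    let guard = signals
        .cv
        .wait_while(signals.state.lock().unwrap(), |s| !s.0 && !s.1)
        .unwrap();
    if guard.1 {
        None
    } else {
        Some(container)
    }
}
```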
## Configuration
Pipeline behavior is controlled via `PipelineConfig`:
| Option | Default | Description |
|--------|---------|-------------|
| `batch_config.max_batch_size` | 64 | Maximum blocks per batch |
| `batch_config.min_batch_size` | 8 | Minimum blocks before flush |
| `batch_config.flush_interval` | 10ms | Time before flushing partial batch |
| `policy_timeout` | 100ms | Timeout for policy evaluation |
| `sweep_interval` | 10ms | Interval for cancel sweeper |
| `max_concurrent_transfers` | 1 | Concurrent transfer batches |
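
The defaults in the table above could be captured in a `Default` implementation like the following. This is a sketch only. The field layout mirrors the option names in the table, but `BatchConfig` and the exact shape of `PipelineConfig` are assumptions, not the crate's definitions.

```rust
use std::time::Duration;

/// Hypothetical batch settings mirroring the `batch_config.*` options.
#[derive(Debug, Clone)]
pub struct BatchConfig {
    pub max_batch_size: usize,
    pub min_batch_size: usize,
    pub flush_interval: Duration,
}

/// Hypothetical shape of `PipelineConfig`; the real type may differ.
#[derive(Debug, Clone)]
pub struct PipelineConfig {
    pub batch_config: BatchConfig,
    pub policy_timeout: Duration,
    pub sweep_interval: Duration,
    pub max_concurrent_transfers: usize,
}

impl Default for PipelineConfig {
    /// Defaults taken from the configuration table.
    fn default() -> Self {
        Self {
            batch_config: BatchConfig {
                max_batch_size: 64,
                min_batch_size: 8,
                flush_interval: Duration::from_millis(10),
            },
            policy_timeout: Duration::from_millis(100),
            sweep_interval: Duration::from_millis(10),
            max_concurrent_transfers: 1,
        }
    }
}
```

With a `Default` impl in place, struct update syntax (`PipelineConfig { max_concurrent_transfers: 4, ..PipelineConfig::default() }`) overrides a single option and keeps the rest.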
<section id="summary"><h2>Summary</h2><p class="desc">Key performance indicators across all tests</p><div class="kpi-grid" id="kpiGrid"></div></section>
<section id="concurrency"><h2>Concurrency Scaling</h2><p class="desc">Bandwidth vs concurrency: find the saturation point for each transfer type</p><div class="pills" id="concPills"></div><div class="chart-grid" id="concCharts"></div></section>
<section id="pagesize"><h2>Page Size Efficiency</h2><p class="desc">Bandwidth vs page_size: block size amortization</p><div class="pills" id="psPills"></div><div class="chart-wrap" id="psChart"></div></section>
<section id="bounce"><h2>Bounce Buffer Analysis</h2><p class="desc">Bandwidth vs bounce_blocks: double-buffering effectiveness</p><div class="chart-wrap" id="bounceChart"></div></section>
<section id="g2g3"><h2>G2/G3 Raw Bandwidth</h2><p class="desc">NVMe read vs write bandwidth at each concurrency level</p><div class="chart-wrap" id="g2g3Chart"></div></section>
<section id="gds"><h2>GDS vs Staged</h2><p class="desc">GPUDirect Storage bypass compared to the best staged transfer</p><div class="chart-wrap" id="gdsChart"></div></section>
<section id="bidir"><h2>Bidirectional Contention</h2><p class="desc">Isolated vs contended bandwidth, measuring degradation under contention</p><div class="chart-wrap" id="bidirChart"></div></section>
<section id="latency"><h2>Latency Distribution</h2><p class="desc">Horizontal box plots showing min / p50 / p95 / p99 / max per test</p><div class="chart-wrap" id="latChart"></div></section>
<section id="rawdata"><h2>Raw Data</h2><p class="desc">All loaded records; click column headers to sort</p><div class="card"><div class="table-wrap" id="tableWrap"></div></div></section>
</main>
<div id="tooltip"></div>
<footer>KVBM Bench Viewer — NVIDIA CORPORATION & AFFILIATES</footer>