DESIGN.md 14.9 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
# Token Hashing Design: From Sequences to Lineage

## Executive Summary

The `dynamo-tokens` crate provides a family of hash types that encode increasingly rich information about token sequences in LLM inference systems. The key innovation is **packing complex relationships into simple value types**:

- **SequenceHash** (u64): Content uniqueness
- **PositionalSequenceHash** (u128): Content uniqueness + position
- **PositionalLineageHash** (u128): Content uniqueness + position + parent alignment

By encoding what would traditionally require complex structures with pointers and heap allocations into simple `u128` values, we achieve zero-cost abstractions: O(1) comparisons, lock-free sharing, and cache-friendly access patterns.

---

## The Problem: Token Trees Need Structure

### LLM Inference Creates Trees

When multiple users interact with an LLM, their conversations share common prefixes. A system prompt might be identical across thousands of requests, followed by diverging user inputs:

```mermaid
graph TD
    A["System Prompt<br/>(Block 0)"] --> B["User A: Hello<br/>(Block 1)"]
    A --> C["User B: Hi there<br/>(Block 1)"]
    B --> D["Assistant: Hi!<br/>(Block 2)"]
    B --> E["Assistant: Hello!<br/>(Block 2)"]
    C --> F["Assistant: Hey!<br/>(Block 2)"]
```

### Why Traditional Hashing Falls Short

A naive content hash tells us "these tokens are identical" but loses critical information:

| Question | Flat Hash | Positional Hash | Lineage Hash |
|----------|-----------|-----------------|--------------|
| Are these the same tokens? | Yes | Yes | Yes |
| At the same position? | ? | Yes | Yes |
| Who is the parent? | ? | ? | Yes |

Without position and lineage information, caching systems cannot:
- Distinguish identical blocks at different positions
- Efficiently evict leaves while preserving shared prefixes
- Navigate parent-child relationships without external data structures

### The Struct Approach Doesn't Scale

One could imagine a structure like:

```rust
struct BlockInfo {
    content_hash: u64,
    position: u64,
    parent: Option<Arc<BlockInfo>>,  // Heap allocation, reference counting
}
```

This approach has problems:
- **Heap allocation**: Each block requires memory allocation
- **Reference counting**: Arc overhead for thread safety
- **Cache misses**: Following pointers causes cache line misses
- **Synchronization**: Reference counting requires atomic operations

---

## The Solution: Value-Encoded Hashes

### Design Principle: Everything in 128 Bits

All our hash types are simple integers that can be:
- **Copied** by value (no heap allocation)
- **Compared** with a single CPU instruction
- **Shared** across threads without locks
- **Stored** inline in data structures (cache-friendly)

```mermaid
graph LR
    subgraph "Traditional Approach"
        A[Block Info Struct] -->|pointer| B[Parent Struct]
        B -->|pointer| C[Grandparent]
    end

    subgraph "Value Approach"
        D["PositionalLineageHash<br/>u128 = position + current + parent"]
    end
```

---

## Hash Type Evolution

### 1. SequenceHash (u64) - Content Identity

The foundation: a 64-bit xxhash of token content.

```
┌────────────────────────────────────────────────────────────────┐
│                     SequenceHash (64 bits)                     │
│            xxhash of [tokens + previous_sequence_hash]         │
└────────────────────────────────────────────────────────────────┘
```

**Encodes:** "Is this content unique?"

**Limitation:** Two identical blocks at different positions have the same hash, preventing position-aware deduplication.

### 2. PositionalSequenceHash (u128) - Adding Position

Extends SequenceHash with position awareness using adaptive encoding:

```
┌───────────────────────────────────────────────────────────────────────────────┐
│                     PositionalSequenceHash (128 bits)                         │
├──────────┬─────────────────────────────┬──────────────────────────────────────┤
│  Upper   │  Mode (2) + Position (8-31) │  Local Block Hash (31-54 bits)       │
│ 64 bits  │  Adaptive based on position │  Content signature                   │
├──────────┼─────────────────────────────┴──────────────────────────────────────┤
│  Lower   │                    SequenceHash (64 bits)                          │
│ 64 bits  │                    Full content uniqueness                         │
└──────────┴────────────────────────────────────────────────────────────────────┘
```

**Mode Selection:**

| Mode | Position Bits | Local Hash Bits | Max Position |
|------|---------------|-----------------|--------------|
| 00   | 8             | 54              | 255          |
| 01   | 16            | 46              | 65,535       |
| 10   | 24            | 38              | 16,777,215   |
| 11   | 31            | 31              | 2,147,483,647|

**Encodes:** "Is this the same content at the same position?"

**Limitation:** No parent relationship - cannot traverse backward in the tree.

### 3. PositionalLineageHash (u128) - Adding Lineage

The key innovation: encode parent-child relationships in the hash itself.

```
┌───────────────────────────────────────────────────────────────────────────────┐
│                     PositionalLineageHash (128 bits)                          │
├──────┬───────────┬────────────────────────┬───────────────────────────────────┤
│ Mode │ Position  │  Parent Hash Fragment  │  Current Hash Fragment            │
│ (2)  │ (8-24)    │  (51-59 bits)          │  (51-59 bits)                     │
└──────┴───────────┴────────────────────────┴───────────────────────────────────┘
```

**Mode Selection:**

| Mode | Position Bits | Parent Bits | Current Bits | Max Position |
|------|---------------|-------------|--------------|--------------|
| 00   | 8             | 59          | 59           | 255          |
| 01   | 16            | 55          | 55           | 65,535       |
| 10   | 24            | 51          | 51           | 16,777,215   |

**Encodes:** "Position + Content + Parent Identity"

---

## The Lineage Hash: Deep Dive

### Why Parent Fragments?

Given a block at position N, we can:
1. Extract its parent fragment
2. Look up position N-1 in the radix tree
3. Find all blocks whose current fragment matches our parent fragment

This enables **backward traversal without pointers**.

```mermaid
graph TD
    subgraph "Position 0"
        A["Block A<br/>current: 0xABC..."]
    end

    subgraph "Position 1"
        B["Block B<br/>parent: 0xABC...<br/>current: 0xDEF..."]
        C["Block C<br/>parent: 0xABC...<br/>current: 0x123..."]
    end

    subgraph "Position 2"
        D["Block D<br/>parent: 0xDEF...<br/>current: 0x456..."]
    end

    A -.->|"parent fragment matches"| B
    A -.->|"parent fragment matches"| C
    B -.->|"parent fragment matches"| D
```

### Cross-Mode Boundary Alignment

**The Challenge:** When position transitions from 255 to 256, the mode changes from 0 (59-bit fragments) to 1 (55-bit fragments). How do we ensure parent-child matching works?

**The Solution:** Pre-truncate the current hash fragment to the next mode's capacity.

```mermaid
sequenceDiagram
    participant P255 as Position 255 (Mode 0)
    participant P256 as Position 256 (Mode 1)

    Note over P255: Mode 0 could use 59 bits
    Note over P255: But next mode only has 55 bits
    P255->>P255: Truncate current to 55 bits
    P255->>P256: Parent fragment = 55 bits
    Note over P256: Mode 1 stores parent in 55 bits
    Note over P256: Fragments match!
```

At position 255, we know position 256 will use Mode 1 with 55-bit fragments. So we pre-truncate our current hash to 55 bits, ensuring the child at position 256 can store and match our fragment.

---

## The PositionalHash Trait

A generic abstraction enables polymorphic data structures:

```rust
/// Trait for hashes that include position information.
pub trait PositionalHash {
    /// Returns the position associated with the hash.
    fn position(&self) -> u64;
}
```

Both `PositionalSequenceHash` and `PositionalLineageHash` implement this trait, allowing the same radix tree to work with either:

```rust
pub struct PositionalRadixTree<V, K = PositionalSequenceHash>
where
    K: PositionalHash + Hash + Eq + Clone,
{
    map: DashMap<u64, DashMap<K, V>>,
}
```

---

## PositionalRadixTree: Sparse Two-Level Indexing

The radix tree exploits position to create a sparse, efficient structure:

```mermaid
graph TD
    subgraph "Level 1: Position Index"
        P0["Position 0"]
        P1["Position 1"]
        P2["Position 2"]
        Pn["Position N"]
    end

    subgraph "Level 2: Hash Maps"
        H0["Hash → Value"]
        H1["Hash → Value"]
        H2["Hash → Value"]
        Hn["Hash → Value"]
    end

    P0 --> H0
    P1 --> H1
    P2 --> H2
    Pn --> Hn
```

**Benefits:**
- **O(1) lookup** within a position level
- **Sparse allocation**: Only populated positions have hash maps
- **Lock granularity**: Different positions can be accessed concurrently
- **Memory efficient**: No entries for empty positions

---

## Use Cases

### 1. Hierarchical Block Caching

Token blocks form trees. A caching layer can exploit this structure:

```mermaid
graph TD
    subgraph "Cache Strategy"
        A[System Prompt] -->|"shared by 1000 requests"| B[Keep in cache]
        C[User Query] -->|"unique per request"| D[Evict first]
    end
```

Using lineage hashes, the cache can:
- Identify which blocks are leaves (no children)
- Evict leaves before parents
- Preserve shared prefixes as long as possible

### 2. Lineage-Aware Eviction

When memory pressure requires eviction, evict in tree order:

```mermaid
graph TD
    A["Block 0<br/>Keep (has children)"] --> B["Block 1<br/>Keep (has children)"]
    A --> C["Block 1'<br/>Evict 2nd"]
    B --> D["Block 2<br/>Evict 1st"]
    B --> E["Block 2'<br/>Evict 1st"]

    style D fill:#fbb
    style E fill:#fbb
    style C fill:#fdd
```

**Algorithm:**
1. Find all leaf blocks (no children pointing to them)
2. Evict leaves in LRU order
3. When a block becomes a leaf, it becomes evictable
4. Parents are naturally preserved until all children are gone

### 3. Block Deduplication with Position Awareness

When a new block arrives:
1. Compute its lineage hash
2. Check if a block with the same hash exists at that position
3. If yes, reuse the existing block (deduplication)
4. Position awareness prevents false matches across positions

### 4. Parent Traversal Without Pointers

Given a block at position N:
```rust
fn find_parent(hash: PositionalLineageHash, tree: &PositionalRadixTree<Block>) -> Option<Block> {
    let parent_pos = hash.position() - 1;
    let parent_fragment = hash.parent_hash_fragment();

    // Look up by position, filter by fragment match
    tree.position(parent_pos)?
        .iter()
        .find(|entry| entry.key().current_hash_fragment() == parent_fragment)
        .map(|entry| entry.value().clone())
}
```

No pointer chasing, no Arc references, just integer comparisons.

---

## API Reference

### PositionalLineageHash

```rust
impl PositionalLineageHash {
    /// Create from sequence hashes and position
    pub fn new(
        current_seq_hash: SequenceHash,
        parent_seq_hash: Option<SequenceHash>,
        position: u64,
    ) -> Self;

    /// Get the block position
    pub fn position(&self) -> u64;

    /// Get the current block's hash fragment
    pub fn current_hash_fragment(&self) -> u64;

    /// Get the parent block's hash fragment
    pub fn parent_hash_fragment(&self) -> u64;

    /// Get the encoding mode (0, 1, or 2)
    pub fn mode(&self) -> u8;

    /// Get the raw u128 value
    pub fn as_u128(&self) -> u128;
}
```

### PositionalSequenceHash

```rust
impl PositionalSequenceHash {
    /// Create from components
    pub fn new(
        sequence_hash: SequenceHash,
        position: u64,
        local_block_hash: BlockHash,
    ) -> Self;

    /// Get the sequence hash component
    pub fn sequence_hash(&self) -> SequenceHash;

    /// Get the block position
    pub fn position(&self) -> u64;

    /// Get the local block hash
    pub fn local_block_hash(&self) -> BlockHash;
}
```

### PositionalRadixTree

```rust
impl<V, K: PositionalHash + Hash + Eq + Clone> PositionalRadixTree<V, K> {
    /// Create empty tree
    pub fn new() -> Self;

    /// Get/create the sub-map for a hash's position
    pub fn prefix(&self, key: &K) -> RefMut<u64, DashMap<K, V>>;

    /// Get the sub-map for a specific position
    pub fn position(&self, position: u64) -> Option<RefMut<u64, DashMap<K, V>>>;

    /// Count total entries
    pub fn len(&self) -> usize;
}
```

---

## Future Directions

### Compressed Lineage Chains

For very long sequences, consider compressing lineage into skip-list style hashes that encode multiple ancestors.

### Distributed Coordination

The value-semantic design enables efficient network transfer - a u128 can be serialized in 16 bytes, enabling distributed caching systems to coordinate block state.

### Integration with External Caching

The hash types can serve as keys in external caching systems (Redis, memcached) without serialization overhead.

---

## Summary

The `dynamo-tokens` crate demonstrates how careful bit-packing can replace complex data structures with simple value types. By encoding position and lineage information directly in hash values, we achieve:

1. **Zero-cost abstractions**: No heap allocation, no reference counting
2. **Thread safety without locks**: Values can be freely copied and shared
3. **Cache-friendly access**: No pointer chasing, inline storage
4. **Rich semantics**: Position awareness, parent-child relationships

The progression from SequenceHash to PositionalLineageHash shows how each layer adds capabilities while maintaining the simplicity of value semantics.