# Token Hashing Design: From Sequences to Lineage ## Executive Summary The `dynamo-tokens` crate provides a family of hash types that encode increasingly rich information about token sequences in LLM inference systems. The key innovation is **packing complex relationships into simple value types**: - **SequenceHash** (u64): Content uniqueness - **PositionalSequenceHash** (u128): Content uniqueness + position - **PositionalLineageHash** (u128): Content uniqueness + position + parent alignment By encoding what would traditionally require complex structures with pointers and heap allocations into simple `u128` values, we achieve zero-cost abstractions: O(1) comparisons, lock-free sharing, and cache-friendly access patterns. --- ## The Problem: Token Trees Need Structure ### LLM Inference Creates Trees When multiple users interact with an LLM, their conversations share common prefixes. A system prompt might be identical across thousands of requests, followed by diverging user inputs: ```mermaid graph TD A["System Prompt
(Block 0)"] --> B["User A: Hello
(Block 1)"] A --> C["User B: Hi there
(Block 1)"] B --> D["Assistant: Hi!
(Block 2)"] B --> E["Assistant: Hello!
(Block 2)"] C --> F["Assistant: Hey!
(Block 2)"] ``` ### Why Traditional Hashing Falls Short A naive content hash tells us "these tokens are identical" but loses critical information: | Question | Flat Hash | Positional Hash | Lineage Hash | |----------|-----------|-----------------|--------------| | Are these the same tokens? | Yes | Yes | Yes | | At the same position? | ? | Yes | Yes | | Who is the parent? | ? | ? | Yes | Without position and lineage information, caching systems cannot: - Distinguish identical blocks at different positions - Efficiently evict leaves while preserving shared prefixes - Navigate parent-child relationships without external data structures ### The Struct Approach Doesn't Scale One could imagine a structure like: ```rust struct BlockInfo { content_hash: u64, position: u64, parent: Option>, // Heap allocation, reference counting } ``` This approach has problems: - **Heap allocation**: Each block requires memory allocation - **Reference counting**: Arc overhead for thread safety - **Cache misses**: Following pointers causes cache line misses - **Synchronization**: Reference counting requires atomic operations --- ## The Solution: Value-Encoded Hashes ### Design Principle: Everything in 128 Bits All our hash types are simple integers that can be: - **Copied** by value (no heap allocation) - **Compared** with a single CPU instruction - **Shared** across threads without locks - **Stored** inline in data structures (cache-friendly) ```mermaid graph LR subgraph "Traditional Approach" A[Block Info Struct] -->|pointer| B[Parent Struct] B -->|pointer| C[Grandparent] end subgraph "Value Approach" D["PositionalLineageHash
u128 = position + current + parent"] end ``` --- ## Hash Type Evolution ### 1. SequenceHash (u64) - Content Identity The foundation: a 64-bit xxhash of token content. ``` ┌────────────────────────────────────────────────────────────────┐ │ SequenceHash (64 bits) │ │ xxhash of [tokens + previous_sequence_hash] │ └────────────────────────────────────────────────────────────────┘ ``` **Encodes:** "Is this content unique?" **Limitation:** Two identical blocks at different positions have the same hash, preventing position-aware deduplication. ### 2. PositionalSequenceHash (u128) - Adding Position Extends SequenceHash with position awareness using adaptive encoding: ``` ┌───────────────────────────────────────────────────────────────────────────────┐ │ PositionalSequenceHash (128 bits) │ ├──────────┬─────────────────────────────┬──────────────────────────────────────┤ │ Upper │ Mode (2) + Position (8-31) │ Local Block Hash (31-54 bits) │ │ 64 bits │ Adaptive based on position │ Content signature │ ├──────────┼─────────────────────────────┴──────────────────────────────────────┤ │ Lower │ SequenceHash (64 bits) │ │ 64 bits │ Full content uniqueness │ └──────────┴────────────────────────────────────────────────────────────────────┘ ``` **Mode Selection:** | Mode | Position Bits | Local Hash Bits | Max Position | |------|---------------|-----------------|--------------| | 00 | 8 | 54 | 255 | | 01 | 16 | 46 | 65,535 | | 10 | 24 | 38 | 16,777,215 | | 11 | 31 | 31 | 2,147,483,647| **Encodes:** "Is this the same content at the same position?" **Limitation:** No parent relationship - cannot traverse backward in the tree. ### 3. PositionalLineageHash (u128) - Adding Lineage The key innovation: encode parent-child relationships in the hash itself. ``` ┌───────────────────────────────────────────────────────────────────────────────┐ │ PositionalLineageHash (128 bits) │ ├──────┬───────────┬────────────────────────┬───────────────────────────────────┤ │ Mode │ Position │ Parent Hash Fragment │ Current Hash Fragment │ │ (2) │ (8-24) │ (51-59 bits) │ (51-59 bits) │ └──────┴───────────┴────────────────────────┴───────────────────────────────────┘ ``` **Mode Selection:** | Mode | Position Bits | Parent Bits | Current Bits | Max Position | |------|---------------|-------------|--------------|--------------| | 00 | 8 | 59 | 59 | 255 | | 01 | 16 | 55 | 55 | 65,535 | | 10 | 24 | 51 | 51 | 16,777,215 | **Encodes:** "Position + Content + Parent Identity" --- ## The Lineage Hash: Deep Dive ### Why Parent Fragments? Given a block at position N, we can: 1. Extract its parent fragment 2. Look up position N-1 in the radix tree 3. Find all blocks whose current fragment matches our parent fragment This enables **backward traversal without pointers**. ```mermaid graph TD subgraph "Position 0" A["Block A
current: 0xABC..."] end subgraph "Position 1" B["Block B
parent: 0xABC...
current: 0xDEF..."] C["Block C
parent: 0xABC...
current: 0x123..."] end subgraph "Position 2" D["Block D
parent: 0xDEF...
current: 0x456..."] end A -.->|"parent fragment matches"| B A -.->|"parent fragment matches"| C B -.->|"parent fragment matches"| D ``` ### Cross-Mode Boundary Alignment **The Challenge:** When position transitions from 255 to 256, the mode changes from 0 (59-bit fragments) to 1 (55-bit fragments). How do we ensure parent-child matching works? **The Solution:** Pre-truncate the current hash fragment to the next mode's capacity. ```mermaid sequenceDiagram participant P255 as Position 255 (Mode 0) participant P256 as Position 256 (Mode 1) Note over P255: Mode 0 could use 59 bits Note over P255: But next mode only has 55 bits P255->>P255: Truncate current to 55 bits P255->>P256: Parent fragment = 55 bits Note over P256: Mode 1 stores parent in 55 bits Note over P256: Fragments match! ``` At position 255, we know position 256 will use Mode 1 with 55-bit fragments. So we pre-truncate our current hash to 55 bits, ensuring the child at position 256 can store and match our fragment. --- ## The PositionalHash Trait A generic abstraction enables polymorphic data structures: ```rust /// Trait for hashes that include position information. pub trait PositionalHash { /// Returns the position associated with the hash. fn position(&self) -> u64; } ``` Both `PositionalSequenceHash` and `PositionalLineageHash` implement this trait, allowing the same radix tree to work with either: ```rust pub struct PositionalRadixTree where K: PositionalHash + Hash + Eq + Clone, { map: DashMap>, } ``` --- ## PositionalRadixTree: Sparse Two-Level Indexing The radix tree exploits position to create a sparse, efficient structure: ```mermaid graph TD subgraph "Level 1: Position Index" P0["Position 0"] P1["Position 1"] P2["Position 2"] Pn["Position N"] end subgraph "Level 2: Hash Maps" H0["Hash → Value"] H1["Hash → Value"] H2["Hash → Value"] Hn["Hash → Value"] end P0 --> H0 P1 --> H1 P2 --> H2 Pn --> Hn ``` **Benefits:** - **O(1) lookup** within a position level - **Sparse allocation**: Only populated positions have hash maps - **Lock granularity**: Different positions can be accessed concurrently - **Memory efficient**: No entries for empty positions --- ## Use Cases ### 1. Hierarchical Block Caching Token blocks form trees. A caching layer can exploit this structure: ```mermaid graph TD subgraph "Cache Strategy" A[System Prompt] -->|"shared by 1000 requests"| B[Keep in cache] C[User Query] -->|"unique per request"| D[Evict first] end ``` Using lineage hashes, the cache can: - Identify which blocks are leaves (no children) - Evict leaves before parents - Preserve shared prefixes as long as possible ### 2. Lineage-Aware Eviction When memory pressure requires eviction, evict in tree order: ```mermaid graph TD A["Block 0
Keep (has children)"] --> B["Block 1
Keep (has children)"] A --> C["Block 1'
Evict 2nd"] B --> D["Block 2
Evict 1st"] B --> E["Block 2'
Evict 1st"] style D fill:#fbb style E fill:#fbb style C fill:#fdd ``` **Algorithm:** 1. Find all leaf blocks (no children pointing to them) 2. Evict leaves in LRU order 3. When a block becomes a leaf, it becomes evictable 4. Parents are naturally preserved until all children are gone ### 3. Block Deduplication with Position Awareness When a new block arrives: 1. Compute its lineage hash 2. Check if a block with the same hash exists at that position 3. If yes, reuse the existing block (deduplication) 4. Position awareness prevents false matches across positions ### 4. Parent Traversal Without Pointers Given a block at position N: ```rust fn find_parent(hash: PositionalLineageHash, tree: &PositionalRadixTree) -> Option { let parent_pos = hash.position() - 1; let parent_fragment = hash.parent_hash_fragment(); // Look up by position, filter by fragment match tree.position(parent_pos)? .iter() .find(|entry| entry.key().current_hash_fragment() == parent_fragment) .map(|entry| entry.value().clone()) } ``` No pointer chasing, no Arc references, just integer comparisons. --- ## API Reference ### PositionalLineageHash ```rust impl PositionalLineageHash { /// Create from sequence hashes and position pub fn new( current_seq_hash: SequenceHash, parent_seq_hash: Option, position: u64, ) -> Self; /// Get the block position pub fn position(&self) -> u64; /// Get the current block's hash fragment pub fn current_hash_fragment(&self) -> u64; /// Get the parent block's hash fragment pub fn parent_hash_fragment(&self) -> u64; /// Get the encoding mode (0, 1, or 2) pub fn mode(&self) -> u8; /// Get the raw u128 value pub fn as_u128(&self) -> u128; } ``` ### PositionalSequenceHash ```rust impl PositionalSequenceHash { /// Create from components pub fn new( sequence_hash: SequenceHash, position: u64, local_block_hash: BlockHash, ) -> Self; /// Get the sequence hash component pub fn sequence_hash(&self) -> SequenceHash; /// Get the block position pub fn position(&self) -> u64; /// Get the local block hash pub fn local_block_hash(&self) -> BlockHash; } ``` ### PositionalRadixTree ```rust impl PositionalRadixTree { /// Create empty tree pub fn new() -> Self; /// Get/create the sub-map for a hash's position pub fn prefix(&self, key: &K) -> RefMut>; /// Get the sub-map for a specific position pub fn position(&self, position: u64) -> Option>>; /// Count total entries pub fn len(&self) -> usize; } ``` --- ## Future Directions ### Compressed Lineage Chains For very long sequences, consider compressing lineage into skip-list style hashes that encode multiple ancestors. ### Distributed Coordination The value-semantic design enables efficient network transfer - a u128 can be serialized in 16 bytes, enabling distributed caching systems to coordinate block state. ### Integration with External Caching The hash types can serve as keys in external caching systems (Redis, memcached) without serialization overhead. --- ## Summary The `dynamo-tokens` crate demonstrates how careful bit-packing can replace complex data structures with simple value types. By encoding position and lineage information directly in hash values, we achieve: 1. **Zero-cost abstractions**: No heap allocation, no reference counting 2. **Thread safety without locks**: Values can be freely copied and shared 3. **Cache-friendly access**: No pointer chasing, inline storage 4. **Rich semantics**: Position awareness, parent-child relationships The progression from SequenceHash to PositionalLineageHash shows how each layer adds capabilities while maintaining the simplicity of value semantics.