• Offloaded KV Cache (#31325) · ca59d6f7
    Nikos Karampatziakis authored
    * Initial implementation of OffloadedCache
    
    * enable usage via cache_implementation
    
    * Address feedback, add tests, remove legacy methods
    
    * Remove flash-attn, discover synchronization bugs, fix bugs
    
    * Prevent usage in CPU only mode
    
    * Add a section about offloaded KV cache to the docs
    
    * Fix typos in docs
    
    * Clarifications and better explanation of streams
generation_strategies.md 34 KB