    TurboMind 2 (#590) · ab1767cf
    Li Zhang authored
    * refresh decoder attention kernel
    
    * block-level kv cache
    
    * `BlockManager` & `SequenceManager`
    
    * update
    
    * update
    
    * update
    
    * update
    
    * rename
    
    * GQA support
    
    * fix context length
    
    * GQA dispatch
    
    * kv8
    
    * tune
    
    * async stream cb
    
    * nvtx
    
    * config parsing
    
    * debug
    
    * optimize output cost
    
    * split-k decoding
    
    * minor
    
    * truncate `session_len` by available blocks
    
    * minor
    
    * license
    
    * fix
    
    * dispatch `cp.async`
    
    * fix linking
    
    * fix
    
    * fix deadlock
    
    * guard input length
    
    * correct start offset
    
    * fix prefill chunking
    
    * fix `cache_block_seq_len` param passing
    
    * fix `block_size` fmtstr
    
    * fix output tokens
    
    * fix batch resizing
    
    * fix masking of finished sequences
    
    * add debug util
    
    * free unused block early
    
    * add ntk scaling and logn scaling
    
    * cmake flags
    
    * fix typo
    
    * w4a16 for sm75
    
    * fix msvc build
    
    * fix msvc build
    
    * fix block verification
    
    * fix msvc build
    
    * use `std::shuffle`
    
    * fix lint
    
    * fix lint
    
    * fix lint
    
    * clear incoming buffer
    
    * clear finished requests
    
    * fix batch initialization
    
    * fix typo
    
    * fix typo
    
    * fix comparison
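Several items above concern the block-level KV cache: fixed-size blocks (`cache_block_seq_len` tokens each), freeing unused blocks early, and truncating `session_len` by available blocks. The following is a minimal sketch of that bookkeeping. It assumes a simple free-list pool and is not TurboMind's `BlockManager`/`SequenceManager` code. The names `blocks_needed`, `truncate_session_len`, and `BlockPool` are hypothetical.

```python
def blocks_needed(seq_len: int, block_seq_len: int) -> int:
    # Ceiling division: a partially filled last block still occupies a whole block.
    return -(-seq_len // block_seq_len)


def truncate_session_len(session_len: int, num_blocks: int, block_seq_len: int) -> int:
    # One sequence can never hold more tokens than the whole pool can store,
    # so cap the configured session length by total block capacity.
    return min(session_len, num_blocks * block_seq_len)


class BlockPool:
    """Hypothetical free-list allocator for fixed-size KV-cache blocks (illustrative only)."""

    def __init__(self, num_blocks: int, block_seq_len: int):
        self.block_seq_len = block_seq_len
        self.free_ids = list(range(num_blocks))  # ids of currently unused blocks
        self.owned = {}  # sequence id -> list of block ids it holds

    def grow(self, seq_id: int, seq_len: int) -> bool:
        """Make seq_id own enough blocks for seq_len tokens; False if the pool is exhausted."""
        blocks = self.owned.get(seq_id, [])
        missing = blocks_needed(seq_len, self.block_seq_len) - len(blocks)
        if missing > len(self.free_ids):
            return False  # not enough free blocks; caller must wait or evict
        for _ in range(max(missing, 0)):
            blocks.append(self.free_ids.pop())
        self.owned[seq_id] = blocks
        return True

    def release(self, seq_id: int) -> None:
        # Return a finished sequence's blocks to the pool immediately.
        self.free_ids.extend(self.owned.pop(seq_id, []))
```

With 4 blocks of 64 tokens, a 100-token sequence takes 2 blocks. A 200-token request (4 blocks) then fails until the first sequence is released. Releasing blocks as soon as a sequence finishes is what keeps the pool from starving later requests.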
SequenceManager.h 3.82 KB