Commits · 72869ef89f99f3b081e00110f5ae3a4b4d7eaf01 · OpenDAS / Lmdeploy

12 Dec, 2023 1 commit
- fix cache verification (#821) · 72869ef8
  Li Zhang authored Dec 12, 2023
  
11 Dec, 2023 1 commit
- Simplify block manager (#812) · a54b16a2
  Li Zhang authored Dec 11, 2023
  * simplify block manager
  * fix lint
10 Nov, 2023 1 commit

Li Zhang authored Nov 10, 2023

* refresh decoder attention kernel

* block-level kv cache

* `BlockManager` & `SequenceManager`

* update

* update

* update

* update

* rename

* GQA support

* fix context length

* GQA dispatch

* kv8

* tune

* async stream cb

* nvtx

* config parsing

* debug

* optimize output cost

* split-k decoding

* minor

* truncate `session_len` by available blocks

* minor

* license

* fix

* dispatch `cp.async`

* fix linking

* fix

* fix deadlock

* guard input length

* correct start offset

* fix prefill chunking

* fix `cache_block_seq_len` param passing

* fix `block_size` fmtstr

* fix output tokens

* fix batch resizing

* fix masking of finished sequences

* add debug util

* free unused block early

* add ntk scaling and logn scaling

* cmake flags

* fix typo

* w4a16 for sm75

* fix msvc build

* fix msvc build

* fix block verification

* fix msvc build

* use `std::shuffle`

* fix lint

* fix lint

* fix lint

* clear incoming buffer

* clear finished requests

* fix batch initialization

* fix typo

* fix typo

* fix comparison

ab1767cf