Commits · ab1767cf035e5a6f68ae6a744c0d744326ce12ff · OpenDAS / Lmdeploy

10 Nov, 2023 1 commit

Li Zhang authored Nov 10, 2023

* refresh decoder attention kernel

* block-level kv cache

* `BlockManager` & `SequenceManager`

* update

* update

* update

* update

* rename

* GQA support

* fix context length

* GQA dispatch

* kv8

* tune

* async stream cb

* nvtx

* config parsing

* debug

* optimize output cost

* split-k decoding

* minor

* truncate `session_len` by available blocks

* minor

* license

* fix

* dispatch `cp.async`

* fix linking

* fix

* fix deadlock

* guard input length

* correct start offset

* fix prefill chunking

* fix `cache_block_seq_len` param passing

* fix `block_size` fmtstr

* fix output tokens

* fix batch resizing

* fix masking of finished sequences

* add debug util

* free unused block early

* add ntk scaling and logn scaling

* cmake flags

* fix typo

* w4a16 for sm75

* fix msvc build

* fix msvc build

* fix block verification

* fix msvc build

* use `std::shuffle`

* fix lint

* fix lint

* fix lint

* clear incoming buffer

* clear finished requests

* fix batch initialization

* fix typo

* fix typo

* fix comparison

ab1767cf

01 Jul, 2023 4 commits
- Change target tritonfastertransformerbackend to trtonturbomindbackend (#36) · 70e6ab26
  lvhan028 authored Jul 01, 2023
```
* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* changge model_dir
```
  70e6ab26
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
  35d64462
- rename src/fastertransformer to src/turbomind (#33) · 53d2e42c
  lvhan028 authored Jul 01, 2023
  
  53d2e42c
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
  fe46dac2
20 Jun, 2023 1 commit
- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```
  9efcac38