Commits · 0d21f366adeea29ef816ff137f4febc71c2416a7 · OpenDAS / Lmdeploy

21 Jul, 2023 1 commit

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias


05 Jul, 2023 3 commits

remove tokenizer_path from chat_example and move it to lmdeploy/turbomind (#55) · 61e8d2c6

q.yao authored Jul 05, 2023


[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore


Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023



* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>


04 Jul, 2023 1 commit

support 'input_tokens' in triton_example (#49) · 62e0fa9a

lvhan028 authored Jul 04, 2023

* check-in script for tokenizing a file

* use max_input_len

03 Jul, 2023 1 commit

install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060

lvhan028 authored Jul 03, 2023

01 Jul, 2023 3 commits

Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26

lvhan028 authored Jul 01, 2023

* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir


build turbomind (#35) · 35d64462

lvhan028 authored Jul 01, 2023

* build turbomind

* change namespace fastertransformer to turbomind

* change logger name


Add lint action (#32) · fe46dac2

AllentDan authored Jul 01, 2023

* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B


29 Jun, 2023 1 commit

use huggingface tokenizer (#26) · 64936449

q.yao authored Jun 29, 2023

* add hf tokenizer

* format

* fix for comment

* don't skip special tokens


28 Jun, 2023 1 commit

fix-gemm-tuning (#24) · 4d42a781

Li Zhang authored Jun 28, 2023

22 Jun, 2023 1 commit

remove duplicate model converter script (#13) · ee962784

lvhan028 authored Jun 22, 2023

* remove constraints on model name

* remove duplicate model converter

20 Jun, 2023 1 commit

check-in fastertransformer (#7) · 9efcac38

Li Zhang authored Jun 20, 2023

* add ft code

* gitignore

* fix lint

* revert fmha