Commits · 3fff964d09b067637355524d99dd4e0365c0ef10 · OpenDAS / Lmdeploy

05 Jul, 2023 1 commit

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

04 Jul, 2023 2 commits

docs(README): fix (#50) · 4c303b17

tpoisonooo authored Jul 04, 2023

docs(project): add quantization test results (#46) · 197b3ee1

tpoisonooo authored Jul 04, 2023

* docs(README): update description

* docs(project): add quantization test results

* docs(README): reorder

* docs(quantization): add more description

* docs(README): remove openmmlab badge

* docs(README): scale up image

* docs(dir): add zh_cn subdir

03 Jul, 2023 1 commit

[Doc] add persistent batch inference GIF (#43) · 9d8949bf

vansin authored Jul 03, 2023

* [Doc] add persistent batch inference

* update

* update

* Update README.md

* Update README_zh-CN.md

01 Jul, 2023 2 commits

Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26

lvhan028 authored Jul 01, 2023

* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir

build turbomind (#35) · 35d64462

lvhan028 authored Jul 01, 2023

* build turbomind

* change namespace fastertransformer to turbomind

* change logger name

30 Jun, 2023 2 commits

rename llmdeploy to lmdeploy (#30) · 46f4738c

lvhan028 authored Jun 30, 2023

* change llmdeploy to lmdeploy

* update logo

* update readme

refactor webui (#29) · 081a6e89

AllentDan authored Jun 30, 2023

29 Jun, 2023 1 commit

Add webui (#27) · 0cc48011

AllentDan authored Jun 29, 2023

* add webui

* update readme

* resolve comments

* readme

28 Jun, 2023 1 commit

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

20 Jun, 2023 2 commits

check-in dockerfile (#8) · bd2e0bf7

lvhan028 authored Jun 20, 2023

* check-in dockerfile

* check-in dockerfile

Add readme (#6) · 720fc533

lvhan028 authored Jun 20, 2023

* add logo

* update readme

18 Jun, 2023 1 commit

init commit · 7258c786

lvhan028 authored Jun 18, 2023