Commits · 3e7b6bfd38c74fd715bd89ff20887b47d645637a · OpenDAS / Lmdeploy

05 Jul, 2023 13 commits

lvhan028 authored Jul 05, 2023

* add performance

* use png

* update

* update

* update

* update

* update

3e7b6bfd

lower transformer version <4.30.0 (#66) · adfd81d3
lvhan028 authored Jul 05, 2023

adfd81d3

update internlm's chat template (#54) · 3de27ead

lvhan028 authored Jul 05, 2023

* update internlm model

* update

* update

* update

* update

* update temperature, topk and top_p

* update

* update

* loosen log level

3de27ead

Update setup for build python wheel (#61) · d2c9caa4
RunningLeon authored Jul 05, 2023

d2c9caa4
fix build w/o python ffi (#64) · 08252a83
Li Zhang authored Jul 05, 2023

08252a83
add demo gif (#63) · 9d7cd629
AllentDan authored Jul 05, 2023
```
* add demo gif

* add demo gif
```
9d7cd629

fix(kv_qparams.py): zp use min (#59) · ec53d63f

tpoisonooo authored Jul 05, 2023

* fix(kv_qparams.py): zp use min

* revert(qparams.py): revert format

* fix(kv_qparams.py): update formula

ec53d63f

remove tokenizer_path from chat_example and move it to lmdeploy/turbomind (#55) · 61e8d2c6
q.yao authored Jul 05, 2023

61e8d2c6
Update README.md (#57) · da62f428
tpoisonooo authored Jul 05, 2023

da62f428
docs(README): typo (#56) · 7396d8f6
tpoisonooo authored Jul 05, 2023

7396d8f6

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod, weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

3fff964d

docs(quantization): add more test (#53) · edb6eb86

tpoisonooo authored Jul 05, 2023

* docs(quantization): add more test

* revert(generate.sh): revert ninja

* revert(llama_config.ini): revert empty line

* fix(quantization.md): fix link error

edb6eb86

Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023



* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

4fd6e710

04 Jul, 2023 7 commits
- use format-11.1 (#38) · 5ea40abf
  AllentDan authored Jul 04, 2023
```
* format-11.1

* md-link-config
```
  5ea40abf
- fix model conversion (#51) · 9bbd39b7
  Li Zhang authored Jul 04, 2023
  
  9bbd39b7
- support 'input_tokens' in triton_example (#49) · 62e0fa9a
  lvhan028 authored Jul 04, 2023
```
* check-in script for tokenizing a file

* use max_input_len
```
  62e0fa9a
- docs(README): fix (#50) · 4c303b17
  tpoisonooo authored Jul 04, 2023
  
  4c303b17
- export attn_bias as int type into config (#48) · 0d19a95d
  lvhan028 authored Jul 04, 2023
  
  0d19a95d
- Update quantization.md (#47) · fa7cbc7a
  tpoisonooo authored Jul 04, 2023
  
  fa7cbc7a
- docs(project): add quantization test results (#46) · 197b3ee1
  tpoisonooo authored Jul 04, 2023
```
* docs(README): update description

* docs(project): add quantization test results

* docs(README): reorder

* docs(quantization): add more description

* docs(README): remove openmmlab badge

* docs(README): scale up image

* docs(dir): add zh_cn subdir
```
  197b3ee1
03 Jul, 2023 3 commits
- [Doc] add persistent batch inference GIF (#43) · 9d8949bf
  vansin authored Jul 03, 2023
```
* [Doc] add persistent batch inference

* update

* update

* Update README.md

* Update README_zh-CN.md
```
  9d8949bf
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
  bb6f8060
- fix(kernel): speed degrade (#41) · 6e58fced
  tpoisonooo authored Jul 03, 2023
```
* feat(template): remote diff

* feat(cmake): use c++17
```
  6e58fced
01 Jul, 2023 5 commits
- change FasterTransformer to TurboMind (#37) · 8aa6eb10
  lvhan028 authored Jul 01, 2023
  
  8aa6eb10
- Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26
  lvhan028 authored Jul 01, 2023
```
* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir
```
  70e6ab26
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
  35d64462
- rename src/fastertransformer to src/turbomind (#33) · 53d2e42c
  lvhan028 authored Jul 01, 2023
  
  53d2e42c
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
  fe46dac2
30 Jun, 2023 3 commits
- rename serve/fastertransformer to serve/turbomind (#31) · e8ab4ba3
  lvhan028 authored Jun 30, 2023
```
* rename lmdeploy/serve/fastertransformer to lmdeploy/serve/turbomind

* update

* update
```
  e8ab4ba3
- rename llmdeploy to lmdeploy (#30) · 46f4738c
  lvhan028 authored Jun 30, 2023
```
* change llmdeploy to lmdeploy

* update logo

* update readme
```
  46f4738c
- refactor webui (#29) · 081a6e89
  AllentDan authored Jun 30, 2023
  
  081a6e89
29 Jun, 2023 4 commits
- fix crash when conversation history out of limit (#28) · cb8ac1b0
  lvhan028 authored Jun 29, 2023
  
  cb8ac1b0
- remove cuda architecture from build option (#23) · c16b857b
  lvhan028 authored Jun 29, 2023
  
  c16b857b
- use huggingface tokenizer (#26) · 64936449
  q.yao authored Jun 29, 2023
```
* add hf tokenizer

* format

* fix for comment

* don't skip special tokens
```
  64936449
- Add webui (#27) · 0cc48011
  AllentDan authored Jun 29, 2023
```
* add webui

* update readme

* resolve comments

* readme
```
  0cc48011
28 Jun, 2023 2 commits

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

cc93136e

fix-gemm-tuning (#24) · 4d42a781
Li Zhang authored Jun 28, 2023

4d42a781

26 Jun, 2023 1 commit
- add gemm tuning (#18) · e357c71f
  Li Zhang authored Jun 26, 2023
  
  e357c71f
25 Jun, 2023 2 commits
- style(doc): README.md · 93604c3f
  tpoisonooo authored Jun 25, 2023
  
  93604c3f
- fix(deploy.py): qkv no bias assertion · e0c7f51b
  tpoisonooo authored Jun 25, 2023
  
  e0c7f51b