Commits · d3dbe1790783b98e5ad2d1a7b761bfa6fa8df169 · OpenDAS / Lmdeploy

11 Aug, 2023 1 commit

pppppM authored Aug 11, 2023

* support kv cache offload

* add dataloader docstring

* complete gitignore

* refactor collect mod fn

* add calibration

* fix lint

* add observers and quantizers

* fix lints

* add global available mixin

* fix lints

* split batch inference

* support smoothquant and awq

* update export kv scales

* fix lints

* fix some bugs

* update weight only usage

* update usage

* auto mapping and support smooth internlm

* trust remote code

* fix num head key error

* fix bias error

* align shape and pack order with llm-awq

* modified according to LZHgrla's comments.

* update gitignore

* fix kv qparams export error

* update usage

* decouple calibrate and awq

* update docstrings

* update api name

* update readme

* update readme

* update readme

* update readme

* update kv_qparams and readme

* fix typos

d3dbe179

07 Aug, 2023 1 commit

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments

4bd0b487

04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of cookie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

18c386d9

01 Aug, 2023 1 commit

Fix typo in README.md (#187) · 8f80cb5f

tpoisonooo authored Aug 01, 2023

8f80cb5f

31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log

4767b04d

28 Jul, 2023 1 commit

bump version to v0.0.2 (#177) · 7e0b75bb

lvhan028 authored Jul 28, 2023

* bump version to v0.0.2

* fix command

* update installation and inference section

7e0b75bb

27 Jul, 2023 1 commit

[Doc] add Twitter link (#175) · c1c1353d

vansin authored Jul 27, 2023

* Doc: add Twitter link

* Doc: add a space

c1c1353d

26 Jul, 2023 2 commits

[Docs] Translate the quantization.md (#166) · 3df43e8c

Xin Li authored Jul 26, 2023

* translate quantization doc

* revise

3df43e8c

docs(README): disable ECC (#159) · 63bd5916

tpoisonooo authored Jul 26, 2023

* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md

* Update README.md

* Update README_zh-CN.md

63bd5916

24 Jul, 2023 1 commit

checkin benchmark on real conversation data (#156) · 0bd1fa40

lvhan028 authored Jul 24, 2023

* checkin benchmark on real conversation data

* change resolution

* update

0bd1fa40

23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

7b470f07

21 Jul, 2023 1 commit

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

f07b697b

20 Jul, 2023 1 commit

add llama2 chat template (#140) · 406f8c9f

q.yao authored Jul 20, 2023



* add llama2 template

* update readme and fix lint

* update readme

* add bos

* add bos

* remove bos

* Update model.py

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

406f8c9f

19 Jul, 2023 1 commit

Fix tensor-parallel inference of internlm with bias (#135) · 79595cd1

q.yao authored Jul 19, 2023

* remove copy

* repetition_penalty=1

* add repetition_penalty to chat args

* update readme

* update readme

79595cd1

18 Jul, 2023 1 commit

update doc and requirements.txt (#119) · 4970d798

AllentDan authored Jul 18, 2023



* update requirements

* update transformers version

* lint

* comments

* lint

* update requirements

* remove setup_requires

---------
Co-authored-by: dongchunyu <dongchunyu@pjlab.org.cn>

4970d798

17 Jul, 2023 3 commits

docs: fix typo (#123) · 9eca5cbe

vansin authored Jul 17, 2023

* docs: fix doc

* fix: fix lint

9eca5cbe

[bugfix] Fix some docs' bug in 'serving' (#109) · 169d8c7f

Jaylin Lee authored Jul 17, 2023

* [bugfix] Fix some docs' bug in 'serving'

* [bugfix] Fix some docs' bug in 'serving'

169d8c7f

[doc] use internlm-chat-7b (#124) · b6bb8ce2

del-zhenwu authored Jul 17, 2023

* Update README.md: use internlm-chat-7b

* Update README_zh-CN.md: use internlm-chat-7b

b6bb8ce2

14 Jul, 2023 1 commit

[Doc] update discord and wechat link (#120) · db282626

vansin authored Jul 14, 2023

* Doc: update discord and wechat link

* Doc: update discord and wechat link

* [Doc] add discord and wechat link

* [Doc] add discord and wechat link

* [Doc] add discord and wechat link

* [Doc] add discord and wechat link

db282626

11 Jul, 2023 2 commits

docs(serving.md): typo (#92) · 4db08045

tpoisonooo authored Jul 11, 2023

* docs(serving.md): typo

* docs(README): quantization

4db08045

[Fix] Remaining Issues in #19 (#75) · a6ac981d

WRH authored Jul 11, 2023

* previous merged

* add chinese

* support torch<2

* add a docstring

* fix typo

* rename torch submodule

* rename to pytorch

* rename in readme

a6ac981d

06 Jul, 2023 6 commits

update benchmark image (#77) · 050a2120

pppppM authored Jul 06, 2023

* update benchmark image

* update image url

050a2120

[Feature] Add a torch client (#19) · 009075d8

WRH authored Jul 06, 2023

* draft torch client

* deal with space of tokenizer

* support tensor parallel

* fix

* fix

* move folder

* move instruction to readme

* move to torch/

* rename client to chat

* very bad response

* stash

* rename streamer

* support internlm

* change default args

* remove test

* improve instructions

* remove module docstring

* decrease header level of torch model

009075d8

Update .gitignore (#70) · b2393467

tpoisonooo authored Jul 06, 2023

* docs(README): fix script

b2393467

update image url (#72) · f4b7f9a6

pppppM authored Jul 06, 2023

f4b7f9a6

fix(project): internlm run error (#69) · 22d403f5

tpoisonooo authored Jul 06, 2023

22d403f5

add internlm url (#67) · 7c6edc83

pppppM authored Jul 06, 2023

7c6edc83

05 Jul, 2023 5 commits

improve readme (#52) · 3e7b6bfd

lvhan028 authored Jul 05, 2023

* add performance

* use png

* update

* update

* update

* update

* update

3e7b6bfd

add demo gif (#63) · 9d7cd629

AllentDan authored Jul 05, 2023

* add demo gif

* add demo gif

9d7cd629

Update README.md (#57) · da62f428

tpoisonooo authored Jul 05, 2023

da62f428

docs(README): typo (#56) · 7396d8f6

tpoisonooo authored Jul 05, 2023

7396d8f6

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod, weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

3fff964d

04 Jul, 2023 2 commits

docs(README): fix (#50) · 4c303b17

tpoisonooo authored Jul 04, 2023

4c303b17

docs(project): add quantization test results (#46) · 197b3ee1

tpoisonooo authored Jul 04, 2023

* docs(README): update description

* docs(project): add quantization test results

* docs(README): reorder

* docs(quantization): add more description

* docs(README): remove openmmlab badge

* docs(README): scale up image

* docs(dir): add zh_cn subdir

197b3ee1

03 Jul, 2023 1 commit

[Doc] add persistent batch inference GIF (#43) · 9d8949bf

vansin authored Jul 03, 2023

* [Doc] add persistent batch inference

* update

* update

* Update README.md

* Update README_zh-CN.md

9d8949bf

01 Jul, 2023 2 commits

Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26

lvhan028 authored Jul 01, 2023

* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir

70e6ab26

build turbomind (#35) · 35d64462

lvhan028 authored Jul 01, 2023

* build turbomind

* change namespace fastertransformer to turbomind

* change logger name

35d64462

30 Jun, 2023 2 commits

rename llmdeploy to lmdeploy (#30) · 46f4738c

lvhan028 authored Jun 30, 2023

* change llmdeploy to lmdeploy

* update logo

* update readme

46f4738c

refactor webui (#29) · 081a6e89

AllentDan authored Jun 30, 2023

081a6e89

29 Jun, 2023 1 commit

Add webui (#27) · 0cc48011

AllentDan authored Jun 29, 2023

* add webui

* update readme

* resolve comments

* readme

0cc48011