Commits · 7e0b75bbb856ddf062a0ee0ca491b6459726aeea · ModelZoo / Qwen_lmdeploy

28 Jul, 2023 1 commit

bump version to v0.0.2 (#177) · 7e0b75bb

lvhan028 authored Jul 28, 2023

* bump version to v0.0.2

* fix command

* update installation and inference section

7e0b75bb

27 Jul, 2023 1 commit
- add model_name param for chatbot (#174) · 7bc8d171
  MaxMatthew authored Jul 27, 2023
  
  7bc8d171
26 Jul, 2023 1 commit
- Add triton_models to whl package (#163) · e7bc11b4
  Chen Xin authored Jul 26, 2023
```
* defer symlink

* fix lint
```
  e7bc11b4
25 Jul, 2023 2 commits
- support fmha gqa (#160) · 5ed6bb59
  q.yao authored Jul 25, 2023
```
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```
  5ed6bb59
- fix getting package root path error in python3.9 (#157) · 5203c850
  lvhan028 authored Jul 25, 2023
  
  5203c850
24 Jul, 2023 1 commit
- [Feature] decode-only forward pass (#153) · 0cc9d095
  Li Zhang authored Jul 24, 2023
```
* decode only forward pass

* fix lint

* batch embedding
```
  0cc9d095
23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

7b470f07

22 Jul, 2023 1 commit

add profile throughput benchmark (#146) · 2067862d

q.yao authored Jul 22, 2023



* add profile throughput benchmark

* add output only throughput

* update req/min

* update benckmark readme

* fix lint

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

2067862d

21 Jul, 2023 3 commits

remove slicing reponse and add resume api (#154) · b728064e

MaxMatthew authored Jul 21, 2023

* Fix lmdeploy.serve.turbomind bug
* add __init__.py for turbomind
* add resume function
* fix the assignment for session.response

* Fix code style

b728064e

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

f07b697b

[Fix] Support DeepSpeed on autoTP and kernel injection (#138) · 2a475478

Kevin Wang authored Jul 21, 2023



* [Fix] fix issue 127

* 优化防止接口更改

* 如果没有deepspeed用python启动需要手动加载到GPU上

* rollback the changes about max_out_tokens and delelte torch > 2.0 if statement

* support kernel injection with customized deepspeed

* spelling error

* Update chat.py

---------
Co-authored-by: wangruohui <12756472+wangruohui@users.noreply.github.com>

2a475478

20 Jul, 2023 3 commits

add llama2 chat template (#140) · 406f8c9f

q.yao authored Jul 20, 2023



* add llama2 template

* update readme and fix lint

* update readme

* add bos

* add bos

* remove bos

* Update model.py

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

406f8c9f

return carriage cause overwriting at the same line (#143) · 8ba2d7c5
WRH authored Jul 20, 2023

8ba2d7c5

[Fix] Fix bug for issues #141 (#145) · cde17e73

humu789 authored Jul 20, 2023

* fix get_dataset error

* fix lint

* add datasets to requirements.txt

* update some msci

cde17e73

19 Jul, 2023 2 commits
- fix the offset during streaming chat (#142) · 289ffa3c
  lvhan028 authored Jul 19, 2023
  
  289ffa3c
- Fix tensor-parallel inference of internlm with bias (#135) · 79595cd1
  q.yao authored Jul 19, 2023
```
* remove copy

* repetition_penalty=1

* add repetition_penalty to chat args

* update readme

* update readme
```
  79595cd1
18 Jul, 2023 3 commits

update doc and requirements.txt (#119) · 4970d798

AllentDan authored Jul 18, 2023



* update requirements

* update transformers version

* lint

* comments

* lint

* update requirements

* remove setup_requires

---------
Co-authored-by: dongchunyu <dongchunyu@pjlab.org.cn>

4970d798

print info copy-paste error (#133) · 8664946d
Kevin Wang authored Jul 18, 2023

8664946d

Tensor Parallel python api (#82) · 7cbfe2ea

q.yao authored Jul 18, 2023

* wip

* profile disable tp

* fix profile

* lint

* fix dlpack

* remove comment

* add tp flag

* add session len check

* add eos

* remove tp and session len inputs

* warp tokenizer

* multithread load weight

* update profile

* refactor tokenizer

* remove pre/post process

* remove mpi4py requirement

* remove

* remove bind

* remove mpi requirement

* check backend_tokenizer

7cbfe2ea

17 Jul, 2023 1 commit
- [Fix] fix attempted_relative_import (#125) · db3b986b
  Kevin Wang authored Jul 17, 2023
```
* [Fix] fix attempted_relative_import

* use try...except...else
```
  db3b986b
14 Jul, 2023 3 commits
- add puyu model for internal use (#105) · 4cfb118f
  lvhan028 authored Jul 14, 2023
```
* add puyu model for internal use

* get/set session

* update

* add docstring
```
  4cfb118f
- miss <bos> of InternLM chat template (#112) · 86e70591
  lvhan028 authored Jul 14, 2023
  
  86e70591
- fix warnings.warn (#111) · ad1a8638
  A60 authored Jul 14, 2023
  
  ad1a8638
12 Jul, 2023 2 commits
- add docstring for turbomind (#97) · 955c019c
  lvhan028 authored Jul 12, 2023
```
* add docstring

* update

* update

* fix according to review results
```
  955c019c
- [Improve] Add docstrings to pytorch submodule (#93) · b6dc35fe
  WRH authored Jul 12, 2023
```
* add some docstrings.

* update docstring.

fix

* ignore magic methods
```
  b6dc35fe
11 Jul, 2023 3 commits
- set chuk_size=1 andxport tp to config.ini (#94) · 69b6eabe
  lvhan028 authored Jul 11, 2023
  
  69b6eabe
- feat(deploy.py): support w pack qkv (#83) · ac638b37
  tpoisonooo authored Jul 11, 2023
```
* feat(deploy.py): support w pack qkv
```
  ac638b37
- [Fix] Remaining Issues in #19 (#75) · a6ac981d
  WRH authored Jul 11, 2023
```
* previous merged

* add chinese

* support torch<2

* add a docstring

* fix typo

* rename torch submodule

* rename to pytorch

* rename in readme
```
  a6ac981d
06 Jul, 2023 4 commits

[Feature] Add a torch client (#19) · 009075d8

WRH authored Jul 06, 2023

* draft torch client

* deal with space of tokenizer

* support tensor parallel

* fix

* fix

* move folder

* move instruction to readme

* move to torch/

* rename client to chat

* very bad response

* stash

* rename streamer

* support internlm

* change default args

* remove test

* improve instructions

* remove module docstring

* decrease header level of torch model

009075d8

Streaming output (#71) · 74a4f3c9

q.yao authored Jul 06, 2023



* streaming-output

* fix end

* fix profile

* support chinese streaming

* lint

* update chat

* lint

* fix benchmark

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

74a4f3c9

fix(project): interlm run error (#69) · 22d403f5
tpoisonooo authored Jul 06, 2023

22d403f5
add internlm url (#67) · 7c6edc83
pppppM authored Jul 06, 2023

7c6edc83

05 Jul, 2023 5 commits

update internlm‘s chat template (#54) · 3de27ead

lvhan028 authored Jul 05, 2023

* update internlm model

* update

* update

* update

* update

* update temperature, topk and top_p

* update

* update

* loosen log level

3de27ead

fix(kv_qparams.py): zp use min (#59) · ec53d63f

tpoisonooo authored Jul 05, 2023

* fix(kv_qparams.py): zp use min

* revert(qparams.py): revert format

* fix(kv_qparams.py): update formula

ec53d63f

remove tokenizer_path from chat_example and move it to lmdeploy/turbomind (#55) · 61e8d2c6
q.yao authored Jul 05, 2023

61e8d2c6

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect funtions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

3fff964d

Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023



* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

4fd6e710

04 Jul, 2023 2 commits
- fix model conversion (#51) · 9bbd39b7
  Li Zhang authored Jul 04, 2023
  
  9bbd39b7
- export attn_bias as int type into config (#48) · 0d19a95d
  lvhan028 authored Jul 04, 2023
  
  0d19a95d
03 Jul, 2023 1 commit
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
  bb6f8060