- 10 Aug, 2023 1 commit
-
-
Lyu Han authored
* add release note template
* add 'improvement' label to the excluding list
-
- 07 Aug, 2023 5 commits
-
-
WRH authored
* add some dist utils
* add model utils
* add termio and basicstreamer
* typo
* fix world size
* refactor chat and tested llama1
* add internlm adapter and support stopping criteria
* concat with id for internlm
* update docstring
* update and support llama2
* typo
* move docs to docs
* update docstring of session manager
* update docstring
* update docs
* fix accel none in model
* fix and add test for tensor broadcast
* fix session using typing to check type
* add docstrings and comprehensive condition test
* unit test for dist
* fix session
* split unittests of utils
* typo
* update control flow of accel
* move test model
* remove main in unittest
* remove some log
* remove some comments
-
lvhan028 authored
-
lvhan028 authored
* add non-stream inference api for chatbot
* update according to reviewer's comments
-
LZHgrla authored
* add get_small_sharded_hf.py
* fix pre-commit
-
lvhan028 authored
* change to incremental decoding
* update
-
- 04 Aug, 2023 1 commit
-
-
AllentDan authored
* use local model for webui
* local model for app.py
* lint
* remove print
* add seed
* comments
* fix session_id
* support turbomind batch inference
* update app.py
* lint and docstring
* move webui to serve/gradio
* update doc
* update doc
* update docstring and remove print conversation
* log
* Update docs/zh_cn/build.md
* Update docs/en/build.md
* use latest gradio
* fix
* replace partial with InterFace
* use host ip instead of cookie

Co-authored-by: Chen Xin <xinchen.tju@gmail.com>
-
- 03 Aug, 2023 3 commits
- 01 Aug, 2023 1 commit
-
-
tpoisonooo authored
-
- 31 Jul, 2023 4 commits
-
-
q.yao authored
* works on internlm and vicuna
* support GQA
* remove comment
* update readme, add logger, default tp=1
* remove log
-
Li Zhang authored
* clean-up
* fix lint
* fix lint
-
lvhan028 authored
-
del-zhenwu authored
-
- 28 Jul, 2023 1 commit
-
-
lvhan028 authored
* bump version to v0.0.2
* fix command
* update installation and inference section
-
- 27 Jul, 2023 4 commits
-
-
Chen Xin authored
* add pypi ci
* fix build
-
vansin authored
* Doc: add Twitter link
* Doc: add a space
-
MaxMatthew authored
-
Chen Xin authored
* update builder
* remove root permission
* update readme
* update setup.py
* add install cuda 12.1 script
* use generate.sh
* add nccl to install_requires
* update README.md
* fix lint
* update setup.py

Co-authored-by: chenxin <chenxin@pjlab.org.cn>
-
- 26 Jul, 2023 3 commits
-
-
Xin Li authored
* translate quantization doc
* revise
-
tpoisonooo authored
* Update README_zh-CN.md
* Update README.md
* Update README_zh-CN.md
* Update README.md
* Update README_zh-CN.md
-
Chen Xin authored
* defer symlink
* fix lint
-
- 25 Jul, 2023 2 commits
-
-
q.yao authored
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
-
lvhan028 authored
-
- 24 Jul, 2023 2 commits
- 23 Jul, 2023 1 commit
-
-
lvhan028 authored
* refactor model.py and support baichuan-7b
* remove model_name
* remove hard session_len
* export tokenizer.py to target dir
* remove model_name from client
* remove model_name
* update
* correct throughput equation
* fix session.response
* update serving.md
* update readme
* update according to review comments
* update
* update
* update
* update
-
- 22 Jul, 2023 1 commit
-
-
q.yao authored
* add profile throughput benchmark
* add output only throughput
* update req/min
* update benchmark readme
* fix lint

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
-
- 21 Jul, 2023 4 commits
-
-
MaxMatthew authored
* Fix lmdeploy.serve.turbomind bug
* add __init__.py for turbomind
* add resume function
* fix the assignment for session.response
* Fix code style
-
Li Zhang authored
* add GQA for llama2
* fix model conversion
* fix lint & remove dev log
* update news
* minor
* fix allocation size
* fix split_dim for w_qkv.bias
-
Kevin Wang authored
* [Fix] fix issue 127
* optimize to avoid interface changes
* when launching with plain python without deepspeed, the model must be loaded onto the GPU manually
* roll back the changes about max_out_tokens and delete the torch > 2.0 if statement
* support kernel injection with customized deepspeed
* fix spelling error
* Update chat.py

Co-authored-by: wangruohui <12756472+wangruohui@users.noreply.github.com>
-
RunningLeon authored
* add docker action
* update
* fix
* update
-
- 20 Jul, 2023 3 commits
-
-
q.yao authored
* add llama2 template
* update readme and fix lint
* update readme
* add bos
* add bos
* remove bos
* Update model.py

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
-
WRH authored
-
humu789 authored
* fix get_dataset error
* fix lint
* add datasets to requirements.txt
* update some msci
-
- 19 Jul, 2023 3 commits
-
-
lvhan028 authored
-
q.yao authored
* remove copy
* repetition_penalty=1
* add repetition_penalty to chat args
* update readme
* update readme
-
rollroll90 authored
-
- 18 Jul, 2023 1 commit
-
-
AllentDan authored
* update requirements
* update transformers version
* lint
* comments
* lint
* update requirements
* remove setup_requires

Co-authored-by: dongchunyu <dongchunyu@pjlab.org.cn>
-