Commits · 169d5169fe4f805f39eef4a5b0aa2fe480190afe · wangkaixiong / Qwen_lmdeploy

25 Oct, 2023 1 commit

Add more user-friendly CLI (#541) · 169d5169

RunningLeon authored Oct 25, 2023

* add

* import fire in main

* wrap to speed up fire cli

* update

* update docs

* update docs

* fix

* resolve comments

* resolve conflict and add test for cli

169d5169

24 Oct, 2023 2 commits
- bump version to v0.0.12 (#604) · 96f1b8ef
  Lyu Han authored Oct 24, 2023
  
  96f1b8ef
- Fix crash and remove `sys_instruct` from `chat.py` and `client.py` (#591) · ffe4ba9c
  Chen Xin authored Oct 24, 2023
```
* fix crash

* update profile_generation.py

* format

* use self.bos_id

* remove sys_instruct
```
  ffe4ba9c
23 Oct, 2023 1 commit
- update solar chat template (#587) · baf1801b
  AllentDan authored Oct 23, 2023
  
  baf1801b
19 Oct, 2023 2 commits
- robust incremental decode for leading space (#581) · 186bfd2e
  AllentDan authored Oct 19, 2023
```
* robust incremental decode for leading space

* speed up lookup as prefix_space_tokens is shorter than no_prefix_space_tokens

* add UT and fix qwen stuff
```
  186bfd2e
- add solar chat template (#576) · 70a5c63a
  AllentDan authored Oct 19, 2023
  
  70a5c63a
18 Oct, 2023 2 commits
- avoid splitting Chinese characters during decoding (#566) · eb3b4dc9
  AllentDan authored Oct 18, 2023
  
  eb3b4dc9
- change 'model_format' to 'qwen' when 'model_name' starts with 'qwen' (#575) · 9c3634ec
  Lyu Han authored Oct 18, 2023
  
  9c3634ec
17 Oct, 2023 1 commit
- bump version to v0.0.11 (#567) · bb3cce9a
  Lyu Han authored Oct 17, 2023
  
  bb3cce9a
16 Oct, 2023 1 commit
- Move `tokenizer.py` to the folder of lmdeploy (#543) · c261b49d
  q.yao authored Oct 16, 2023
```
* move tokenizer

* remove Tokenizer in init

* update deploy.py
```
  c261b49d
13 Oct, 2023 2 commits
- Add tp hint for deployment (#555) · 77a26812
  Chen Xin authored Oct 13, 2023
```
* add tp hint for deploy

* fix lint

* assert tp in turbomind

* fix lint
```
  77a26812
- Fix typing of openai protocol. (#554) · 6904053f
  YiiSh authored Oct 13, 2023
  
  6904053f
12 Oct, 2023 1 commit
- support deploy qwen-14b-chat (#482) · b21239a8
  Chen Xin authored Oct 12, 2023
```
* support deploy qwen-14b-chat

* update README

* load safetensors first
```
  b21239a8
11 Oct, 2023 1 commit

make IPv6 compatible, safe run for coroutine interrupting (#487) · 759e1ddf

AllentDan authored Oct 11, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* remove print

* update doc

759e1ddf

09 Oct, 2023 2 commits
- set the default value of being 0 (#532) · fbd9770a
  Lyu Han authored Oct 10, 2023
  
  fbd9770a
- Support CORS for openai api server (#481) · 02684144
  aisensiy authored Oct 09, 2023
```
* Support CORS for openai api server

* Remove unnecessary var

* Add CORS support following the same style as vllm
```
  02684144
26 Sep, 2023 3 commits
- bump version to v0.0.10 (#474) · b58a9dff
  Lyu Han authored Sep 26, 2023
  
  b58a9dff
- Fix compatibility issues with Pydantic 2 (#465) · 22cd7d15
  aisensiy authored Sep 26, 2023
  
  22cd7d15
- expose stop words and filter eoa (#352) · 327deaee
  AllentDan authored Sep 26, 2023
```
* expose stop words

* support string

* fix

* remove eoa from chatbot

* remove eoa of turbomind

* fix ut

* suffix wheel and fix InternLM no system bug
```
  327deaee
25 Sep, 2023 2 commits
- Miss meta instruction of internlm-chat model (#470) · ce9e0756
  Lyu Han authored Sep 25, 2023
  
  ce9e0756
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` (#466) · e980377a
  Lyu Han authored Sep 25, 2023
  
  e980377a
20 Sep, 2023 2 commits

bump version to v0.0.9 (#428) · 0be9e7ab
Lyu Han authored Sep 20, 2023

0be9e7ab

Support InternLM 20B (#440) · df7955de

Lyu Han authored Sep 20, 2023

* better profiler

* wait for releasing mem

* remove fire

* remove support for multiple model benchmark

* comments

* support actual seqlen

* change chat template

* update

* fix ut

* int->size_t

* output more details

* correct tp

* rollback

* update

* update readme

* add 'internlm-chat' as the default tag for internlm chat models

* rollback tokenizer

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

df7955de

18 Sep, 2023 1 commit
- Fix token count bug (#416) · 3a7880a8
  AllentDan authored Sep 18, 2023
```
* fix token count bug

* fix error response
```
  3a7880a8
13 Sep, 2023 1 commit
- fix output[-1] when output is empty (#405) · 64c39dd8
  WRH authored Sep 13, 2023
  
  64c39dd8
11 Sep, 2023 3 commits

bump version to v0.0.8 (#401) · 450757b2
Lyu Han authored Sep 11, 2023

450757b2

[Fix] Update puyu model (#399) · cfec5bed
liukuikun authored Sep 11, 2023

cfec5bed

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

08 Sep, 2023 1 commit

Support baichuan2-chat chat template (#378) · 55764e0b

WRH authored Sep 08, 2023


* support baichuan2-chat

* update args from generation config

* update deploy.py

* update readme

* tested with tp

* step-1 when last id is eos

* add news

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

55764e0b

07 Sep, 2023 2 commits
- fix exceed session len core dump for chat and generate (#366) · ce21a318
  AllentDan authored Sep 07, 2023
  
  ce21a318
- bug-fix: always use stream mode to enable persistent batching (#346) · 57cf99b9
  fade_away authored Sep 07, 2023
```
Co-authored-by: sleepwalker <just_for_singing@foxmail.com>
```
  57cf99b9
04 Sep, 2023 2 commits
- bump version to v0.0.7 (#358) · d065f3e4
  Lyu Han authored Sep 04, 2023
  
  d065f3e4
- Fix profile_serving hung issue (#344) · edb7c6ec
  Lyu Han authored Sep 04, 2023
```
* read data after start processes

* fix hang

* fix exceptions when request_output_len is 0
```
  edb7c6ec
01 Sep, 2023 2 commits

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot

9bfe03c6

Package 'bin/llama_gemm' to wheel (#320) · 22e8b2ca
Chen Xin authored Sep 01, 2023
```
* pack llama_gemm

* update CMakeLists.txt

* remove candidate

* update MANIFEST.in
```
22e8b2ca

29 Aug, 2023 2 commits

Fix turbomind import error on windows (#316) · d4d609bd
Chen Xin authored Aug 29, 2023

d4d609bd

fix(kvint8): update doc (#315) · a48e2d27

tpoisonooo authored Aug 29, 2023


* fix(kvint8): update doc

* style(lmdeploy): format

* style(kv_qparams.py): linting

* fix lint

* Update kv_int8.md

* Update kv_int8.md

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>

a48e2d27

25 Aug, 2023 2 commits
- bump version to v0.0.6 (#283) · cfabbbd7
  Lyu Han authored Aug 25, 2023
  
  cfabbbd7
- Import turbomind in gradio server only when it is needed (#303) · 59f8e674
  AllentDan authored Aug 25, 2023
  
  59f8e674
24 Aug, 2023 1 commit

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023


* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>

4279d8ca