Commits · 759e1ddf63f5963dc10f6d5e75a64f3a0f2a0395 · guobj / Qwen_lmdeploy

11 Oct, 2023 1 commit

make IPv6 compatible, safe run for coroutine interrupting (#487) · 759e1ddf

AllentDan authored Oct 11, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* remove print

* update doc

759e1ddf

09 Oct, 2023 1 commit

Support CORS for openai api server (#481) · 02684144

aisensiy authored Oct 09, 2023

* Support CORS for openai api server

* Remove unnecessary var

* Add CORS support following the same style as vllm

02684144

26 Sep, 2023 2 commits
- Fix compatibility issues with Pydantic 2 (#465) · 22cd7d15
  aisensiy authored Sep 26, 2023
  
  22cd7d15
- expose stop words and filter eoa (#352) · 327deaee
  AllentDan authored Sep 26, 2023
```
* expose stop words

* support string

* fix

* remove eoa from chatbot

* remove eoa of turbomind

* fix ut

* suffix wheel and fix InternLM no system bug
```
  327deaee
18 Sep, 2023 1 commit
- Fix token count bug (#416) · 3a7880a8
  AllentDan authored Sep 18, 2023
```
* fix token count bug

* fix error response
```
  3a7880a8
11 Sep, 2023 1 commit

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

08 Sep, 2023 1 commit

Support baichuan2-chat chat template (#378) · 55764e0b

WRH authored Sep 08, 2023



* support baichuan2-chat

* update args from generation config

* update deploy.py

* update readme

* tested with tp

* step-1 when last id is eos

* add news

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

55764e0b

07 Sep, 2023 2 commits
- fix exceed session len core dump for chat and generate (#366) · ce21a318
  AllentDan authored Sep 07, 2023
  
  ce21a318
- bug-fix: always use stream mode to enable persistent batching (#346) · 57cf99b9
  fade_away authored Sep 07, 2023
```
Co-authored-by: sleepwalker <just_for_singing@foxmail.com>
```
  57cf99b9
04 Sep, 2023 1 commit

Fix profile_serving hung issue (#344) · edb7c6ec

Lyu Han authored Sep 04, 2023

* read data after start processes

* fix hang

* fix exceptions when request_output_len is 0

edb7c6ec

01 Sep, 2023 1 commit

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot

9bfe03c6

25 Aug, 2023 1 commit
- Import turbomind in gradio server only when it is needed (#303) · 59f8e674
  AllentDan authored Aug 25, 2023
  
  59f8e674
24 Aug, 2023 2 commits

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023



* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>

4279d8ca

Pad tok_embedding and output weights to make their shape divisible by TP (#285) · 4903d3cc

Lyu Han authored Aug 24, 2023

* Pad tok_embedding and output weights to make their shape divisible by TP

* update

* update

* update

* update

* update llamaBatch

4903d3cc

22 Aug, 2023 1 commit

Add Restful API (#223) · d5c10e7a

AllentDan authored Aug 22, 2023

* add restful api

* refine

* add simple doc

* lint

* add uvicorn requirement

* more args

* add llama2

* docstring

* update doc

* save

* refine

* lint

* better decode

* add v1/embedding

* add GenerateRequest

* add llama2 chat template

* correct profiling

* update documents

* add length judge

* add faq

* update doc and rename req_que to req_queue

* fix md link, use get_logger, fix sequence_end bug

* use another doc link for go to avoid lint error

* add api_client.py

* update doc

* update doc

* update function interface

* update FAQ

* resolve comments

d5c10e7a

21 Aug, 2023 1 commit

Pass chat template args including meta_prompt to model (#225) · 7785142d

AllentDan authored Aug 21, 2023

* pass args like meta_prompt to model

* update chatbot

* update

* rollback

* update llama2 and qwen

* refine

7785142d

18 Aug, 2023 2 commits

Support TP for w4a16 (#262) · 89f3d322
Li Zhang authored Aug 18, 2023

89f3d322

[Feature] Support Qwen-7B, dynamic NTK scaling and logN scaling in turbomind (#230) · 4a60b45d

Li Zhang authored Aug 18, 2023

* qwen support

* dynamic ntk & logn attn

* fix ntk & add chat template

* fix ntk scaling & stop words

* fix lint

* add tiktoken to requirements.txt

* fix tokenizer, set model format automatically

* update model.py

* update readme

* fix lint

4a60b45d

16 Aug, 2023 1 commit
- Adjust dependency of gradio server (#236) · 0d21f366
  AllentDan authored Aug 16, 2023
```
* import if lib directory exists

* only modify app.py
```
  0d21f366
14 Aug, 2023 2 commits

Fix TIS client got-no-space-result side effect brought by PR #197 (#222) · 68296844
Lyu Han authored Aug 14, 2023
```
* rollback

* rollback chatbot.py
```
68296844

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`

c3290cad

07 Aug, 2023 2 commits
- Add non-stream inference api for chatbot (#200) · 3de0dbb6
  lvhan028 authored Aug 07, 2023
```
* add non-stream inference api for chatbot

* update according to reviewer's comments
```
  3de0dbb6
- Improve postprocessing in TIS serving by applying Incremental de-tokenizing (#197) · 0ed1e4d4
  lvhan028 authored Aug 07, 2023
```
* change to incremental decoding

* update
```
  0ed1e4d4
04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of cookie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

18c386d9

03 Aug, 2023 1 commit
- Move lmdeploy/turbomind/utils.py to lmdeploy/utils.py (#191) · 7a2128be
  lvhan028 authored Aug 03, 2023
  
  7a2128be
31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log

4767b04d

27 Jul, 2023 1 commit
- add model_name param for chatbot (#174) · 7bc8d171
  MaxMatthew authored Jul 27, 2023
  
  7bc8d171
26 Jul, 2023 1 commit
- Add triton_models to whl package (#163) · e7bc11b4
  Chen Xin authored Jul 26, 2023
```
* defer symlink

* fix lint
```
  e7bc11b4
25 Jul, 2023 2 commits
- support fmha gqa (#160) · 5ed6bb59
  q.yao authored Jul 25, 2023
```
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```
  5ed6bb59
- fix getting package root path error in python3.9 (#157) · 5203c850
  lvhan028 authored Jul 25, 2023
  
  5203c850
23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

7b470f07

21 Jul, 2023 2 commits

remove slicing response and add resume api (#154) · b728064e

MaxMatthew authored Jul 21, 2023

* Fix lmdeploy.serve.turbomind bug
* add __init__.py for turbomind
* add resume function
* fix the assignment for session.response

* Fix code style

b728064e

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

f07b697b

19 Jul, 2023 2 commits
- fix the offset during streaming chat (#142) · 289ffa3c
  lvhan028 authored Jul 19, 2023
  
  289ffa3c
- Fix tensor-parallel inference of internlm with bias (#135) · 79595cd1
  q.yao authored Jul 19, 2023
```
* remove copy

* repetition_penalty=1

* add repetition_penalty to chat args

* update readme

* update readme
```
  79595cd1
18 Jul, 2023 2 commits

update doc and requirements.txt (#119) · 4970d798

AllentDan authored Jul 18, 2023



* update requirements

* update transformers version

* lint

* comments

* lint

* update requirements

* remove setup_requires

---------
Co-authored-by: dongchunyu <dongchunyu@pjlab.org.cn>

4970d798

print info copy-paste error (#133) · 8664946d
Kevin Wang authored Jul 18, 2023

8664946d

14 Jul, 2023 1 commit
- add puyu model for internal use (#105) · 4cfb118f
  lvhan028 authored Jul 14, 2023
```
* add puyu model for internal use

* get/set session

* update

* add docstring
```
  4cfb118f
12 Jul, 2023 1 commit
- add docstring for turbomind (#97) · 955c019c
  lvhan028 authored Jul 12, 2023
```
* add docstring

* update

* update

* fix according to review results
```
  955c019c
11 Jul, 2023 1 commit
- set chuk_size=1 and export tp to config.ini (#94) · 69b6eabe
  lvhan028 authored Jul 11, 2023
  
  69b6eabe