Commits · ffe4ba9c54be14cb23b9010619f685db59533d93 · guobj / Qwen_lmdeploy

24 Oct, 2023 1 commit
- Fix crash and remove `sys_instruct` from `chat.py` and `client.py` (#591) · ffe4ba9c
  Chen Xin authored Oct 24, 2023
```
* fix crash

* update profile_generation.py

* format

* use self.bos_id

* remove sys_instruct
```
18 Oct, 2023 1 commit
- Avoid splitting Chinese characters during decoding (#566) · eb3b4dc9
  AllentDan authored Oct 18, 2023
  
16 Oct, 2023 1 commit
- Move `tokenizer.py` to the folder of lmdeploy (#543) · c261b49d
  q.yao authored Oct 16, 2023
```
* move tokenizer

* remove Tokenizer in init

* update deploy.py
```
25 Sep, 2023 1 commit
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` (#466) · e980377a
  Lyu Han authored Sep 25, 2023
11 Sep, 2023 1 commit

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

65c662f9

07 Sep, 2023 1 commit
- fix exceed session len core dump for chat and generate (#366) · ce21a318
  AllentDan authored Sep 07, 2023
  
01 Sep, 2023 1 commit

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot

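The incremental decoding described in this commit, and the later fix for split Chinese characters, can be illustrated with a minimal sketch. It is not the lmdeploy implementation: it assumes the stream arrives as raw UTF-8 byte chunks rather than token ids, and `stream_decode` is a hypothetical helper built on the standard-library incremental decoder. The point it shows is that a multi-byte character must be held back until all of its bytes have arrived.

```python
import codecs


def stream_decode(byte_chunks):
    """Yield text pieces from UTF-8 byte chunks without emitting partial characters.

    Hypothetical helper for illustration; a real server would work on token ids.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        # Incomplete trailing bytes are buffered inside the decoder.
        piece = decoder.decode(chunk)
        if piece:
            yield piece
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


text = "你好, world"
data = text.encode("utf-8")
# Two-byte chunks cut through the three-byte Chinese characters.
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
pieces = list(stream_decode(chunks))

# Decoding the first chunk on its own fails, which is the bug class being avoided.
try:
    chunks[0].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False
```

Joining `pieces` reproduces the original string, while decoding each chunk independently would fail on the first one.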

07 Aug, 2023 1 commit
- Add non-stream inference api for chatbot (#200) · 3de0dbb6
  lvhan028 authored Aug 07, 2023
```
* add non-stream inference api for chatbot

* update according to reviewer's comments
```
31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log


23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update

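A factory pattern for chat templates, as named in this commit title, might look roughly like the sketch below. Only `get_prompt` and `sequence_start` appear elsewhere in this log; the registry class, the template classes, the prompt strings, and the registered names are all assumptions made for illustration and do not reproduce the repository's code.

```python
class TemplateRegistry:
    """Maps a model name to a chat-template class (hypothetical registry)."""

    def __init__(self):
        self._classes = {}

    def register(self, name):
        # Used as a class decorator: @TEMPLATES.register("chat")
        def decorator(cls):
            self._classes[name] = cls
            return cls
        return decorator

    def build(self, name, **kwargs):
        if name not in self._classes:
            raise KeyError(f"unknown chat template: {name!r}")
        return self._classes[name](**kwargs)


TEMPLATES = TemplateRegistry()


@TEMPLATES.register("plain")
class PlainTemplate:
    """Passes the user prompt through unchanged."""

    def get_prompt(self, prompt, sequence_start=True):
        return prompt


@TEMPLATES.register("chat")
class ChatTemplate(PlainTemplate):
    """Adds an illustrative system line on the first turn only."""

    def __init__(self, system="You are a helpful assistant."):
        self.system = system

    def get_prompt(self, prompt, sequence_start=True):
        head = f"{self.system}\n" if sequence_start else ""
        return f"{head}USER: {prompt}\nASSISTANT:"


first_turn = TEMPLATES.build("chat").get_prompt("hi")
later_turn = TEMPLATES.build("chat").get_prompt("hi", sequence_start=False)
```

With this shape, supporting a new model means registering one more class rather than adding a branch keyed on the model name.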

19 Jul, 2023 1 commit

Fix tensor-parallel inference of internlm with bias (#135) · 79595cd1

q.yao authored Jul 19, 2023

* remove copy

* repetition_penalty=1

* add repetition_penalty to chat args

* update readme

* update readme


18 Jul, 2023 1 commit

Tensor Parallel python api (#82) · 7cbfe2ea

q.yao authored Jul 18, 2023

* wip

* profile disable tp

* fix profile

* lint

* fix dlpack

* remove comment

* add tp flag

* add session len check

* add eos

* remove tp and session len inputs

* wrap tokenizer

* multithread load weight

* update profile

* refactor tokenizer

* remove pre/post process

* remove mpi4py requirement

* remove

* remove bind

* remove mpi requirement

* check backend_tokenizer


12 Jul, 2023 1 commit
- add docstring for turbomind (#97) · 955c019c
  lvhan028 authored Jul 12, 2023
```
* add docstring

* update

* update

* fix according to review results
```
06 Jul, 2023 2 commits

Streaming output (#71) · 74a4f3c9

q.yao authored Jul 06, 2023



* streaming-output

* fix end

* fix profile

* support chinese streaming

* lint

* update chat

* lint

* fix benchmark

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>


fix(project): internlm run error (#69) · 22d403f5
tpoisonooo authored Jul 06, 2023


05 Jul, 2023 4 commits

update internlm's chat template (#54) · 3de27ead

lvhan028 authored Jul 05, 2023

* update internlm model

* update

* update

* update

* update

* update temperature, topk and top_p

* update

* update

* loosen log level


remove tokenizer_path from chat_example and move it to lmdeploy/turbomind (#55) · 61e8d2c6
q.yao authored Jul 05, 2023


[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

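As a rough illustration of what collecting quantization parameters for a KV cache can involve, the sketch below derives a scale and zero point from the observed minimum and maximum of some values. The asymmetric min/max scheme, the 8-bit width, and every function name here are assumptions; the log does not say which scheme this commit uses.

```python
def kv_qparams(values, bits=8):
    """Return (scale, zero_point) from min/max statistics (assumed scheme)."""
    lo, hi = min(values), max(values)
    qmax = 2 ** bits - 1
    # Guard against a constant tensor, where the range would be zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    return scale, lo


def quantize(values, scale, zero_point, bits=8):
    """Map floats to integers in [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1
    return [min(qmax, max(0, round((v - zero_point) / scale))) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to approximate floats."""
    return [q * scale + zero_point for q in q_values]


# Illustrative values standing in for cached key or value activations.
sample = [-1.0, 0.0, 0.5, 1.0]
scale, zero_point = kv_qparams(sample)
q = quantize(sample, scale, zero_point)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(sample, restored))
```

The round-trip error stays within half a quantization step, which is the usual sanity check for statistics gathered this way.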

Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023



* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
