Commits · ce9e07562bceea2741083dca4813cd6d30b5ec4b · guobj / Qwen_lmdeploy

25 Sep, 2023 2 commits
- Miss meta instruction of internlm-chat model (#470) · ce9e0756
  Lyu Han authored Sep 25, 2023
  
- Fix side effect brought by supporting codellama: `sequence_start` is always true when calling `model.get_prompt` (#466) · e980377a
  Lyu Han authored Sep 25, 2023
20 Sep, 2023 2 commits

bump version to v0.0.9 (#428) · 0be9e7ab
Lyu Han authored Sep 20, 2023

Lyu Han authored Sep 20, 2023



* better profiler

* wait for releasing mem

* remove fire

* remove support for multiple model benchmark

* comments

* support actual seqlen

* change chat template

* update

* fix ut

* int->size_t

* output more details

* correct tp

* rollback

* update

* update readme

* add 'internlm-chat' as the default tag for internlm chat models

* rollback tokenizer

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

df7955de

18 Sep, 2023 1 commit
- Fix token count bug (#416) · 3a7880a8
  AllentDan authored Sep 18, 2023
```
* fix token count bug

* fix error response
```
13 Sep, 2023 1 commit
- fix output[-1] when output is empty (#405) · 64c39dd8
  WRH authored Sep 13, 2023
  
11 Sep, 2023 3 commits

bump version to v0.0.8 (#401) · 450757b2
Lyu Han authored Sep 11, 2023

[Fix] Update puyu model (#399) · cfec5bed
liukuikun authored Sep 11, 2023

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b

08 Sep, 2023 1 commit

Support baichuan2-chat chat template (#378) · 55764e0b

WRH authored Sep 08, 2023



* support baichuan2-chat

* update args from generation config

* update deploy.py

* update readme

* tested with tp

* step-1 when last id is eos

* add news

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

07 Sep, 2023 2 commits
- fix exceed session len core dump for chat and generate (#366) · ce21a318
  AllentDan authored Sep 07, 2023
  
- bug-fix: always use stream mode to enable persistent batching (#346) · 57cf99b9
  fade_away authored Sep 07, 2023
```
Co-authored-by: sleepwalker <just_for_singing@foxmail.com>
```
04 Sep, 2023 2 commits
- bump version to v0.0.7 (#358) · d065f3e4
  Lyu Han authored Sep 04, 2023
  
- Fix profile_serving hung issue (#344) · edb7c6ec
  Lyu Han authored Sep 04, 2023
```
* read data after start processes

* fix hang

* fix exceptions when request_output_len is 0
```
01 Sep, 2023 2 commits

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot

Package 'bin/llama_gemm' to wheel (#320) · 22e8b2ca
Chen Xin authored Sep 01, 2023
```
* pack llama_gemm

* update CMakeLists.txt

* remove candidate

* update MANIFEST.in
```

29 Aug, 2023 2 commits

Fix turbomind import error on windows (#316) · d4d609bd
Chen Xin authored Aug 29, 2023

fix(kvint8): update doc (#315) · a48e2d27

tpoisonooo authored Aug 29, 2023



* fix(kvint8): update doc

* style(lmdeploy): format

* style(kv_qparams.py): linting

* fix lint

* Update kv_int8.md

* Update kv_int8.md

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>

25 Aug, 2023 2 commits
- bump version to v0.0.6 (#283) · cfabbbd7
  Lyu Han authored Aug 25, 2023
  
- Import turbomind in gradio server only when it is needed (#303) · 59f8e674
  AllentDan authored Aug 25, 2023
  
24 Aug, 2023 4 commits

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023



* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>

[Feature] Support decode with DP in pytorch (#193) · 81f29837

WRH authored Aug 24, 2023

* support decode

* unit test and benchmark and improve

* remove some drafts

* enable numerical test

* minor

* add some benchmark data

* add more output

* update interface

* remove debugs

* format

* update docstring

* remove print and add benchmark results

* use logits & add main

* fix rb

* dump large

* update test

* update test decode

* add decimal

Pad tok_embedding and output weights to make their shape divisible by TP (#285) · 4903d3cc

Lyu Han authored Aug 24, 2023

* Pad tok_embedding and output weights to make their shape divisible by TP

* update

* update

* update

* update

* update llamaBatch

[Fix] Fix llama2 70b & qwen quantization error (#273) · d5cb0be2
pppppM authored Aug 24, 2023
```
* fix llama2 70b

* fix qwen quantization

* remove pdb

* add faq
```

22 Aug, 2023 1 commit

Add Restful API (#223) · d5c10e7a

AllentDan authored Aug 22, 2023

* add restful api

* refine

* add simple doc

* lint

* add uvicorn requirement

* more args

* add llama2

* docstring

* update doc

* save

* refine

* lint

* better decode

* add v1/embedding

* add GenerateRequest

* add llama2 chat template

* correct profiling

* update documents

* add length judge

* add faq

* update doc and rename req_que to req_queue

* fix md link, use get_logger, fix sequence_end bug

* use another doc link for go to avoid lint error

* add api_client.py

* update doc

* update doc

* update function interface

* update FAQ

* resolve comments

21 Aug, 2023 1 commit

Pass chat template args including meta_prompt to model (#225) · 7785142d

AllentDan authored Aug 21, 2023

* pass args like meta_prompt to model

* update chatbot

* update

* rollback

* update llama2 and qwen

* refine

18 Aug, 2023 2 commits

Support TP for w4a16 (#262) · 89f3d322
Li Zhang authored Aug 18, 2023

[Feature] Support Qwen-7B, dynamic NTK scaling and logN scaling in turbomind (#230) · 4a60b45d

Li Zhang authored Aug 18, 2023

* qwen support

* dynamic ntk & logn attn

* fix ntk & add chat template

* fix ntk scaling & stop words

* fix lint

* add tiktoken to requirements.txt

* fix tokenizer, set model format automatically

* update model.py

* update readme

* fix lint

16 Aug, 2023 2 commits
- Adjust dependency of gradio server (#236) · 0d21f366
  AllentDan authored Aug 16, 2023
```
* import if lib directory exists

* only modify app.py
```
- remove chat template (#252) · f06db80d
  Lyu Han authored Aug 16, 2023
  
15 Aug, 2023 1 commit
- Fix wrong RPATH using the absolute path instead of relative one (#239) · 271a19fe
  Chen Xin authored Aug 15, 2023
  
14 Aug, 2023 4 commits
- Bump version to v0.0.4 (#231) · 8cdcb2a9
  Lyu Han authored Aug 14, 2023
  
- Fix TIS client got-no-space-result side effect brought by PR #197 (#222) · 68296844
  Lyu Han authored Aug 14, 2023
```
* rollback

* rollback chatbot.py
```
- feat(quantization): kv cache use asymmetric (#218) · 902a3e16
  tpoisonooo authored Aug 14, 2023
```
* feat(quantization): kv cache use asymmetric
```
- [Feature] Blazing fast W4A16 inference (#202) · c3290cad
  Li Zhang authored Aug 14, 2023
```
* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`
```
11 Aug, 2023 1 commit

[Feature] Support AWQ (#108) · d3dbe179

pppppM authored Aug 11, 2023

* support kv cache offload

* add dataloader docstring

* complete gitignore

* refactor collect mod fn

* add calibration

* fix lint

* add observers and quantizers

* fix lints

* add global available mixin

* fix lints

* split batch inference

* support smoothquant and awq

* update export kv scales

* fix lints

* fix some bugs

* update weight only usage

* update usage

* auto mapping and support smooth internlm

* trust remote code

* fix num head key error

* fix bias error

* align shape and pack order with llm-awq

* modified according to LZHgrla's comments.

* update gitignore

* fix kv qparams export error

* update usage

* decouple calibrate and awq

* update docstrings

* update api name

* update readme

* update readme

* update readme

* update readme

* update kv_qparams and readme

* fix typos

07 Aug, 2023 4 commits

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments

bump version to v0.0.3 (#205) · c80f3e49
lvhan028 authored Aug 07, 2023

Add non-stream inference api for chatbot (#200) · 3de0dbb6
lvhan028 authored Aug 07, 2023
```
* add non-stream inference api for chatbot

* update according to reviewer's comments
```
[Feature] Add script to split HuggingFace model to the smallest sharded checkpoints (#199) · b7e7e668
LZHgrla authored Aug 07, 2023
```
* add get_small_sharded_hf.py

* fix pre-commit
```