Commits · 2ba9082289e953d39450d97f802b490a9126890a · OpenDAS / Lmdeploy

04 Dec, 2023 2 commits
- Fix missed arguments when benchmarking static inference performance (#787) · 2ba90822
  Lyu Han authored Dec 04, 2023
```
* minor fix in the profile scripts and docs

* miss arguments

* typo

* fix lint

* update
```
  2ba90822
- add chat template for Yi (#779) · 12dc3e14
  AllentDan authored Dec 04, 2023
  
  12dc3e14
02 Dec, 2023 1 commit
- Fix early exit condition in attention kernel (#788) · 816022e4
  Li Zhang authored Dec 02, 2023
  
  816022e4
29 Nov, 2023 7 commits

Update benchmark user guide (#763) · d3e2cee4

Lyu Han authored Nov 29, 2023

* user guide of benchmark generation

* update benchmark generation guide

* update profiling throughput guide

* update profiling api_server guide

* rename file names

* update profile tis user guide

* update

* fix according to review comments

* update

* update according to review comments

* update

* add an example

* update

d3e2cee4

bump version to 0.1.0a1 (#776) · 9c46b27c
Lyu Han authored Nov 29, 2023

9c46b27c
convert model with hf repo_id (#774) · 77efebbf
Chen Xin authored Nov 29, 2023

77efebbf

Report first-token-latency and token-latency percentiles (#736) · 5c9e1e28

Lyu Han authored Nov 29, 2023

* update profile scripts

* add top_p, top_k and temperature as input arguments

* fix input_ids

* update profile_throughput

* update profile_restful_api

* update profile_serving

* update

* update

* add progress bar

* remove TODO comments

* update

* remove useless profile_* argument

* remove log level

* change concurrency default value to 64

* update restful_api.md

* update according to review comments

* fix docstring

5c9e1e28

improvement(build): enable ninja and gold linker (#767) · 8add942d

tpoisonooo authored Nov 29, 2023

* feat(build): enable ninja and lld

* fix(.github): add ninja installation

* fix(CI): remove dimsize=256

* fix(CI): add option for generate.sh

* fix(docs): update

8add942d

fix turbomind build on sm<80 (#754) · 8c672a7b
q.yao authored Nov 29, 2023
```
* fix

* fix lint
```
8c672a7b

add triton server test and workflow yml (#760) · 4744b28c

RunningLeon authored Nov 29, 2023

* add triton server test and workflow yml

* update

* revert changes in dockerfile

* update prompts

4744b28c

28 Nov, 2023 1 commit
- fix typo (#769) · 2f80c556
  q.yao authored Nov 28, 2023
  
  2f80c556
27 Nov, 2023 2 commits
- Set the default value of `max_context_token_num` to 1 (#761) · 7868cea5
  Lyu Han authored Nov 27, 2023
  
  7868cea5
- [Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto (#758) · 4bcc4f11
  Lyu Han authored Nov 27, 2023
  
  4bcc4f11
24 Nov, 2023 1 commit
- [Fix] build docker image failed since `packaging` is missing (#753) · c07f60fd
  Lyu Han authored Nov 24, 2023
  
  c07f60fd
23 Nov, 2023 3 commits
- [Fix] Skip empty batch (#747) · a7c5007c
  Li Zhang authored Nov 23, 2023
  
  a7c5007c
- bump version to v0.1.0a0 (#709) · d3386351
  Lyu Han authored Nov 23, 2023
  
  d3386351
- Fix cache/output length calculation (#738) · 434961c6
  Li Zhang authored Nov 23, 2023
  
  434961c6
22 Nov, 2023 1 commit

Support loading hf model directly (#685) · 6b00f623

Chen Xin authored Nov 22, 2023

* turbomind support export model params

* fix overflow

* support turbomind.from_pretrained

* fix tp

* support AutoModel

* support load kv qparams

* update auto_awq

* update docstring

* export lmdeploy version

* update doc

* remove download_hf_repo

* LmdeployForCausalLM -> LmdeployForCausalLM

* refactor turbomind.py

* update comment

* add bfloat16 convert back

* support gradio run_local load hf

* support restful api server load hf

* add docs

* support loading previous quantized model

* adapt pr 690

* update docs

* not export turbomind config when quantize a model

* check model_name when can not get it from config.json

* update readme

* remove model_name in auto_awq

* update

* update

* update

* fix build

* absolute import

6b00f623

21 Nov, 2023 1 commit
- Replace mmengine with mmengine-lite (#715) · 42e57c8b
  Zaida Zhou authored Nov 21, 2023
  
  42e57c8b
20 Nov, 2023 3 commits

Check-in user guide about turbomind config (#680) · 73386e21

Lyu Han authored Nov 20, 2023

* update

* update config guide

* update guide

* update user guide according to review comments

73386e21

Optimize for throughput (#701) · 911c0a85

Li Zhang authored Nov 20, 2023



* tmp

* update

* update

* optimize for throughput

* update

* fix eos

* clean up

* fix serving

* fix indexed copy

* minor

* minor

---------
Co-authored-by: lvhan028 <lvhan_028@163.com>

911c0a85

Fix wrong eos_id and bos_id obtained through grpc api (#644) · 65d735ba
Lyu Han authored Nov 20, 2023
```
* Fix wrong eos_id and bos_id obtained through grpc api

* fix according to review comments

* update
```
65d735ba

19 Nov, 2023 2 commits
- Fix Tokenizer encode (#645) · 07640a3a
  AllentDan authored Nov 19, 2023
```
* same encode with HF

* sequence_start -> add_bos

* complement
```
  07640a3a
- [Doc] Update restful api doc (#662) · c02e281f
  AllentDan authored Nov 19, 2023
```
* update restful_api.md

* add a hint

* repeat 3 times
```
  c02e281f
16 Nov, 2023 1 commit
- [Fix] Fix load_checkpoint_in_model bug (#690) · 0fcc3034
  whcao authored Nov 16, 2023
```
* fix load_checkpoint_in_model bug

* fix comments

* fix comments

* fix bugs
```
  0fcc3034
15 Nov, 2023 1 commit
- fix turbomind stream canceling (#686) · 7d40d190
  q.yao authored Nov 15, 2023
```
* fix

* instance for each forward
```
  7d40d190
14 Nov, 2023 1 commit
- Fix init of batch state (#682) · 4eb8dd83
  Li Zhang authored Nov 14, 2023
```
* fix init of finished buf

* fix `finished_count`
```
  4eb8dd83
13 Nov, 2023 2 commits
- update kv8 docs (#681) · b7c88ca8
  pppppM authored Nov 13, 2023
  
  b7c88ca8
- [Docs] Update Supported Matrix (#679) · e641dd86
  pppppM authored Nov 13, 2023
```
* update supported matrix

* change the default shard size when saving quantized weights

* baichuan2 kv8
```
  e641dd86
10 Nov, 2023 2 commits

TurboMind 2 (#590) · ab1767cf

Li Zhang authored Nov 10, 2023

* refresh decoder attention kernel

* block-level kv cache

* `BlockManager` & `SequenceManager`

* update

* update

* update

* update

* rename

* GQA support

* fix context length

* GQA dispatch

* kv8

* tune

* async stream cb

* nvtx

* config parsing

* debug

* optimize output cost

* split-k decoding

* minor

* truncate `session_len` by available blocks

* minor

* license

* fix

* dispatch `cp.async`

* fix linking

* fix

* fix deadlock

* guard input length

* correct start offset

* fix prefill chunking

* fix `cache_block_seq_len` param passing

* fix `block_size` fmtstr

* fix output tokens

* fix batch resizing

* fix masking of finished sequences

* add debug util

* free unused block early

* add ntk scaling and logn scaling

* cmake flags

* fix typo

* w4a16 for sm75

* fix msvc build

* fix msvc build

* fix block verification

* fix msvc build

* use `std::shuffle`

* fix lint

* fix lint

* fix lint

* clear incoming buffer

* clear finished requests

* fix batch initialization

* fix typo

* fix typo

* fix comparison

ab1767cf

Add extra_requires to reduce dependencies (#580) · 06125966

RunningLeon authored Nov 10, 2023

* update reqs

* update docs

* resolve comments

* upgrade pydantic

* fix rebase

* update doc

* update

* update

* update readme

* update

* add flash-attn

06125966

09 Nov, 2023 3 commits
- bump version to v0.0.14 (#663) · 7b20cfdf
  Lyu Han authored Nov 09, 2023
  
  7b20cfdf
- Add UltraCM and WizardLM chat templates (#599) · 77491284
  AllentDan authored Nov 09, 2023
```
* add ultracm eval chat template

* add WizardLM chat template

* use ultrachat template instead of ultracm usecase
```
  77491284
- fix Tokenizer load error when the path of the being-converted model is not writable (#669) · 18170ee5
  Chen Xin authored Nov 09, 2023
  
  18170ee5
08 Nov, 2023 3 commits

Add check env sub command (#654) · 013000d1

RunningLeon authored Nov 08, 2023

* add check env

* update issue template

* remove some reqs from check env

* resolve comment

013000d1

fix tokenizer_info when convert the model (#661) · 9febf610
Chen Xin authored Nov 08, 2023

9febf610

fix benchmark serving computation mistake (#630) · 529e56bd

AllentDan authored Nov 08, 2023

* fix benchmark serving computation mistake

* fix timestamps computations

* remove speed up

* no mp

* mp seems faster?

* remove

* update

* remove

* fix

* update

* update print log

* typo

* print first token latency only when stream==True

* remove renew_session

* update AsyncEngine

529e56bd

06 Nov, 2023 2 commits

Manage session id using random int for gradio local mode (#553) · 11d10930

aisensiy authored Nov 06, 2023



* Use session id from gradio state

* use a new session id after reset

* rename session id like a state

* update comments

* reformat files

* init session id on block loaded

* use auto increased session id

* remove session id textbox

* apply to api_server and tritonserver

* update docstring

* add lock for safety

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>

11d10930

FIX: fix stop_session func bug (#578) · 85d2f662

yunzhongyan0 authored Nov 06, 2023



* FIX: fix stop_session func bug

* keep sequence_end = False

---------
Co-authored-by: honglei.yan <honglei.yan@nio.com>
Co-authored-by: AllentDan <AllentDan@yeah.net>

85d2f662

03 Nov, 2023 1 commit
- [Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized (#605) · c15fbf47
  pppppM authored Nov 03, 2023
```
* fix awq

* adapt new qwen code

* adapt qwen 14b and baichuan2 7b

* add docstring

* add runtime error for qwen
```
  c15fbf47