- 12 Dec, 2023 1 commit

Lyu Han authored
* simplify the header of the benchmark table
* miss comma
* fix lint

- 06 Dec, 2023 1 commit

Lyu Han authored
* update test scripts for models with different sizes
* update
* only test after tuning gemm
* chmod +x
* fix typo
* benchmark on a100
* fix typo
* fix typo
* per-token latency percentile in profile_throughput
* fix
* fix
* rename
* make the script accept parameters
* minor fix
* indent
* reformat table
* change to 3000
* minor fix

- 04 Dec, 2023 1 commit

Lyu Han authored
* minor fix in the profile scripts and docs
* miss arguments
* typo
* fix lint
* update

- 29 Nov, 2023 1 commit

Lyu Han authored
* update profile scripts
* add top_p, top_k and temperature as input arguments
* fix input_ids
* update profile_throughput
* update profile_restful_api
* update profile_serving
* update
* update
* add progress bar
* remove TODO comments
* update
* remove useless profile_* argument
* remove log level
* change concurrency default value to 64
* update restful_api.md
* update according to review comments
* fix docstring

- 08 Nov, 2023 1 commit

AllentDan authored
* fix benchmark serving computation mistake
* fix timestamps computations
* remove speed up
* no mp
* mp seems faster?
* remove
* update
* remove
* fix
* update
* update print log
* typo
* print first token latency only when stream==True
* remove renew_session
* update AsyncEngine

- 01 Nov, 2023 1 commit

AllentDan authored
* make IPv6 compatible, safe run for coroutine interrupting
* instance_id -> session_id and fix api_client.py
* update doc
* remove useless faq
* safe ip mapping
* update app.py
* WIP completion
* completion
* update doc
* disable interactive mode for /v1/chat/completions
* docstring
* docstring
* refactor gradio
* update gradio
* update
* update doc
* rename
* session_id default -1
* missed two files
* add a APIClient
* add chat func for APIClient
* refine
* add concurrent function
* sequence_start, sequence_end --> interactive_mode
* update doc
* comments
* doc
* better text completion
* remove /v1/embeddings
* comments
* deprecate generate and use /v1/interactive/completions
* /v1/interactive/completion -> /v1/chat/interactive
* embeddings
* rename
* remove wrong arg description
* docstring
* fix
* update cli
* update doc
* strict session_len limit condition
* pass model args to api_server

- 24 Oct, 2023 1 commit

Chen Xin authored
* fix crash
* update profile_generation.py
* format
* use self.bos_id
* remove sys_instruct

- 16 Oct, 2023 1 commit

q.yao authored
* move tokenizer
* remove Tokenizer in init
* update deploy.py

- 11 Oct, 2023 1 commit

AllentDan authored
* make IPv6 compatible, safe run for coroutine interrupting
* instance_id -> session_id and fix api_client.py
* update doc
* remove useless faq
* safe ip mapping
* update app.py
* remove print
* update doc

- 26 Sep, 2023 1 commit

AllentDan authored
* fix benchmark serving cannot use Qwen tokenizer
* update benchmark readme

- 18 Sep, 2023 1 commit

AllentDan authored
* better profiler
* wait for releasing mem
* remove fire
* remove support for multiple model benchmark
* comments
* output more details
* correct tp

- 04 Sep, 2023 1 commit

Lyu Han authored
* read data after start processes
* fix hang
* fix exceptions when request_output_len is 0

- 24 Aug, 2023 1 commit

WRH authored
* support decode
* unit test and benchmark and improve
* remove some drafts
* enable numerical test
* minor
* add some benchmark data
* add more output
* update interface
* remove debugs
* format
* update docstring
* remove print and add benchmark results
* use logits & add main
* fix rb
* dump large
* update test
* update test decode
* add decimal

- 22 Aug, 2023 1 commit

AllentDan authored
* add restful api
* refine
* add simple doc
* lint
* add uvicorn requirement
* more args
* add llama2
* docstring
* update doc
* save
* refine
* lint
* better decode
* add v1/embedding
* add GenerateRequest
* add llama2 chat template
* correct profiling
* update documents
* add length judge
* add faq
* update doc and rename req_que to req_queue
* fix md link, use get_logger, fix sequence_end bug
* use another doc link for go to avoid lint error
* add api_client.py
* update doc
* update doc
* update function interface
* update FAQ
* resolve comments

- 16 Aug, 2023 1 commit

WRH authored
* initial
* add docs
* noqa on docstring
* support no streamer
* typo
* add accel in model name
* fix lint
* fix CSVWRitter typo
* typo

- 14 Aug, 2023 1 commit

Lyu Han authored
* tmp
* update
* update
* update
* update
* update
* remove
* update
* update

- 07 Aug, 2023 1 commit

lvhan028 authored
* change to incremental decoding
* update

- 31 Jul, 2023 2 commits

q.yao authored
* works on internlm and vicuna
* support GQA
* remove comment
* update readme, add logger, default tp=1
* remove log

del-zhenwu authored

- 23 Jul, 2023 1 commit

lvhan028 authored
* refactor model.py and support baichuan-7b
* remove model_name
* remove hard session_len
* export tokenizer.py to target dir
* remove model_name from client
* remove model_name
* update
* correct throughput equation
* fix session.response
* update serving.md
* update readme
* update according to review comments
* update
* update
* update
* update

- 22 Jul, 2023 1 commit

q.yao authored
* add profile throughput benchmark
* add output only throughput
* update req/min
* update benchmark readme
* fix lint

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

- 19 Jul, 2023 1 commit

rollroll90 authored

- 18 Jul, 2023 1 commit

q.yao authored
* wip
* profile disable tp
* fix profile
* lint
* fix dlpack
* remove comment
* add tp flag
* add session len check
* add eos
* remove tp and session len inputs
* wrap tokenizer
* multithread load weight
* update profile
* refactor tokenizer
* remove pre/post process
* remove mpi4py requirement
* remove
* remove bind
* remove mpi requirement
* check backend_tokenizer

- 06 Jul, 2023 1 commit

q.yao authored
* streaming-output
* fix end
* fix profile
* support chinese streaming
* lint
* update chat
* lint
* fix benchmark

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

- 05 Jul, 2023 2 commits

pppppM authored
* add cal qparams
* support offload inference
* add collect functions (mod, weight)
* stats kv scales
* update init
* add user guide
* fix hints
* fix comments & support turbomind format
* update user guide
* fix slice kv cache error & support pileval dataset (used in llm-awq)
* fix wrong num heads slice
* update default dataset
* fix conflict
* fix hints
* fix hints
* add gitignore

q.yao authored
* wip
* wip
* example finish
* fix include and namespace
* wtf
* install lib
* batchize
* update cmake install
* multithread
* fix comment
* fix
* add mmengine
* bind llamamodel

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

- 03 Jul, 2023 1 commit

lvhan028 authored

- 30 Jun, 2023 2 commits

- 25 Jun, 2023 1 commit

lvhan028 authored
* remove constraints on model name
* remove duplicate model converter
* add profile
* get eos and bos from server
* update stop_words
* update sequence_length when the last generated token is eos_id
* fix
* fix
* check-in models
* validate model_name
* make stop_words as property
* debug profiling
* better stats
* fix assistant response
* update profile serving
* update
* update
