Commits · a5b67b954ea09e1a9c722d9f17a35b4af4cb9d97 · OpenDAS / Lmdeploy

12 Dec, 2023 1 commit
- simplify the header of the benchmark table (#820) · a5b67b95
  Lyu Han authored Dec 12, 2023
```
* simplify the header of the benchmark table

* miss comma

* fix lint
```
06 Dec, 2023 1 commit

Report the inference benchmark of models with different size (#794) · ebe90bc9

Lyu Han authored Dec 06, 2023

* update test scripts for models with different sizes

* update

* only test after tuning gemm

* chmod +x

* fix typo

* benchmark on a100

* fix typo

* fix typo

* per-token latency percentile in profile_throughput

* fix

* fix

* rename

* make the script accept parameters

* minor fix

* indent

* reformat table

* change to 3000

* minor fix


04 Dec, 2023 1 commit
- Fix missed arguments when benchmark static inference performance (#787) · 2ba90822
  Lyu Han authored Dec 04, 2023
```
* minor fix in the profile scripts and docs

* miss arguments

* typo

* fix lint

* update
```
29 Nov, 2023 1 commit

Report first-token-latency and token-latency percentiles (#736) · 5c9e1e28

Lyu Han authored Nov 29, 2023

* update profile scripts

* add top_p, top_k and temperature as input arguments

* fix input_ids

* update profile_throughput

* update profile_restful_api

* update profile_serving

* update

* update

* add progress bar

* remove TODO comments

* update

* remove useless profile_* argument

* remove log level

* change concurrency default value to 64

* update restful_api.md

* update according to review comments

* fix docstring


08 Nov, 2023 1 commit

fix benchmark serving computation mistake (#630) · 529e56bd

AllentDan authored Nov 08, 2023

* fix benchmark serving computation mistake

* fix timestamps computations

* remove speed up

* no mp

* mp seems faster?

* remove

* update

* remove

* fix

* update

* update print log

* typo

* print first token latency only when stream==True

* remove renew_session

* update AsyncEngine


01 Nov, 2023 1 commit

Improve api_server and webui usage (#544) · 373bd013

AllentDan authored Nov 01, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* WIP completion

* completion

* update doc

* disable interactive mode for /v1/chat/completions

* docstring

* docstring

* refactor gradio

* update gradio

* update

* update doc

* rename

* session_id default -1

* missed two files

* add an APIClient

* add chat func for APIClient

* refine

* add concurrent function

* sequence_start, sequence_end --> interactive_mode

* update doc

* comments

* doc

* better text completion

* remove /v1/embeddings

* comments

* deprecate generate and use /v1/interactive/completions

* /v1/interactive/completion -> /v1/chat/interactive

* embeddings

* rename

* remove wrong arg description

* docstring

* fix

* update cli

* update doc

* strict session_len limit condition

* pass model args to api_server


16 Oct, 2023 1 commit
- Move `tokenizer.py` to the folder of lmdeploy (#543) · c261b49d
  q.yao authored Oct 16, 2023
```
* move tokenizer

* remove Tokenizer in init

* update deploy.py
```
31 Jul, 2023 1 commit

Support Runtime tensor parallelism (#158) · 4767b04d

q.yao authored Jul 31, 2023

* works on internlm and vicuna

* support GQA

* remove comment

* update readme, add logger, default tp=1

* remove log


23 Jul, 2023 1 commit

Refactor the chat template of supported models using factory pattern (#144) · 7b470f07

lvhan028 authored Jul 23, 2023

* refactor model.py and support baichuan-7b

* remove model_name

* remove hard session_len

* export tokenizer.py to target dir

* remove model_name from client

* remove model_name

* update

* correct throughput equation

* fix session.response

* update serving.md

* update readme

* update according to review comments

* update

* update

* update

* update


22 Jul, 2023 1 commit

add profile throughput benchmark (#146) · 2067862d

q.yao authored Jul 22, 2023



* add profile throughput benchmark

* add output only throughput

* update req/min

* update benchmark readme

* fix lint

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
