Commits · a5b67b954ea09e1a9c722d9f17a35b4af4cb9d97 · OpenDAS / Lmdeploy

12 Dec, 2023 1 commit
- simplify the header of the benchmark table (#820) · a5b67b95
  Lyu Han authored Dec 12, 2023
```
* simplify the header of the benchmark table

* miss comma

* fix lint
```
06 Dec, 2023 1 commit

Report the inference benchmark of models with different sizes (#794) · ebe90bc9

Lyu Han authored Dec 06, 2023

* update test scripts for models with different sizes

* update

* only test after tuning gemm

* chmod +x

* fix typo

* benchmark on a100

* fix typo

* fix typo

* per-token latency percentile in profile_throughput

* fix

* fix

* rename

* make the script accept parameters

* minor fix

* indent

* reformat table

* change to 3000

* minor fix


05 Dec, 2023 1 commit

auto upload cuda12.1 python pkg to release when creating a new tag (#784) · 079f29bc

Chen Xin authored Dec 05, 2023

* add cuda12-whl-release ci

* enable environment

* test py310-311 windows wheel

* fix py310, py311 setup.py error on windows

* fix lint


04 Dec, 2023 1 commit
- Fix missed arguments when benchmarking static inference performance (#787) · 2ba90822
  Lyu Han authored Dec 04, 2023
```
* minor fix in the profile scripts and docs

* miss arguments

* typo

* fix lint

* update
```
29 Nov, 2023 3 commits

Update benchmark user guide (#763) · d3e2cee4

Lyu Han authored Nov 29, 2023

* user guide of benchmark generation

* update benchmark generation guide

* update profiling throughput guide

* update profiling api_server guide

* rename file names

* update profile tis user guide

* update

* fix according to review comments

* update

* update according to review comments

* update

* add an example

* update


Report first-token-latency and token-latency percentiles (#736) · 5c9e1e28

Lyu Han authored Nov 29, 2023

* update profile scripts

* add top_p, top_k and temperature as input arguments

* fix input_ids

* update profile_throughput

* update profile_restful_api

* update profile_serving

* update

* update

* add progress bar

* remove TODO comments

* update

* remove useless profile_* argument

* remove log level

* change concurrency default value to 64

* update restful_api.md

* update according to review comments

* fix docstring


improvement(build): enable ninja and gold linker (#767) · 8add942d

tpoisonooo authored Nov 29, 2023

* feat(build): enable ninja and lld

* fix(.github): add ninja installation

* fix(CI): remove dimsize=256

* fix(CI): add option for generate.sh

* fix(docs): update


27 Nov, 2023 1 commit
- Set the default value of `max_context_token_num` to 1 (#761) · 7868cea5
  Lyu Han authored Nov 27, 2023
  
22 Nov, 2023 1 commit

Support loading hf model directly (#685) · 6b00f623

Chen Xin authored Nov 22, 2023

* turbomind support export model params

* fix overflow

* support turbomind.from_pretrained

* fix tp

* support AutoModel

* support load kv qparams

* update auto_awq

* update docstring

* export lmdeploy version

* update doc

* remove download_hf_repo

* LmdeployForCausalLM -> LmdeployForCausalLM

* refactor turbomind.py

* update comment

* add bfloat16 convert back

* support gradio run_local load hf

* support restful api server load hf

* add docs

* support loading previous quantized model

* adapt pr 690

* update docs

* not export turbomind config when quantize a model

* check model_name when can not get it from config.json

* update readme

* remove model_name in auto_awq

* update

* update

* update

* fix build

* absolute import


20 Nov, 2023 1 commit

Check-in user guide about turbomind config (#680) · 73386e21

Lyu Han authored Nov 20, 2023

* update

* update config guide

* update guide

* update user guide according to review comments


19 Nov, 2023 1 commit
- [Doc] Update restful api doc (#662) · c02e281f
  AllentDan authored Nov 19, 2023
```
* update restful_api.md

* add a hint

* repeat 3 times
```
13 Nov, 2023 1 commit
- update kv8 docs (#681) · b7c88ca8
  pppppM authored Nov 13, 2023
  
10 Nov, 2023 1 commit

Add extra_requires to reduce dependencies (#580) · 06125966

RunningLeon authored Nov 10, 2023

* update reqs

* update docs

* resolve comments

* upgrade pydantic

* fix rebase

* update doc

* update

* update

* update readme

* update

* add flash-attn


01 Nov, 2023 1 commit

Improve api_server and webui usage (#544) · 373bd013

AllentDan authored Nov 01, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* WIP completion

* completion

* update doc

* disable interactive mode for /v1/chat/completions

* docstring

* docstring

* refactor gradio

* update gradio

* update

* update doc

* rename

* session_id default -1

* missed two files

* add an APIClient

* add chat func for APIClient

* refine

* add concurrent function

* sequence_start, sequence_end --> interactive_mode

* update doc

* comments

* doc

* better text completion

* remove /v1/embeddings

* comments

* deprecate generate and use /v1/interactive/completions

* /v1/interactive/completion -> /v1/chat/interactive

* embeddings

* rename

* remove wrong arg description

* docstring

* fix

* update cli

* update doc

* strict session_len limit condition

* pass model args to api_server


25 Oct, 2023 2 commits

Add more user-friendly CLI (#541) · 169d5169

RunningLeon authored Oct 25, 2023

* add

* import fire in main

* wrap to speed up fire cli

* update

* update docs

* update docs

* fix

* resolve comments

* resolve conflict and add test for cli


Add "build from docker" section (#602) · 7283781e

Lyu Han authored Oct 25, 2023

* add build from docker section

* update

* install python package

* update

* update

* update


23 Oct, 2023 1 commit
- Revert "[Docs] Simplify `build.md` (#370)" (#586) · af2f072e
  pppppM authored Oct 23, 2023
```
This reverts commit 4b5c2bda.
```
13 Oct, 2023 1 commit

[doc] Update benchmark command in w4a16.md (#500) · 0b861c48

del-zhenwu authored Oct 13, 2023



* [doc] Update benchmark command in w4a16.md

* Update w4a16.md

* Update w4a16.md

add pip install nvidia-ml-py

* [doc] Update w4a16.md

* fix lint error
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [doc] update model_path & prompt_tokens
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

---------
Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>


12 Oct, 2023 1 commit
- update huggingface internlm-chat-7b model url (#546) · 27e12477
  AllentDan authored Oct 12, 2023
  
11 Oct, 2023 2 commits

Fix typo in `docs/en/pytorch.md` (#539) · 169d088a
Shahrukh Khan authored Oct 11, 2023


make IPv6 compatible, safe run for coroutine interrupting (#487) · 759e1ddf

AllentDan authored Oct 11, 2023

* make IPv6 compatible, safe run for coroutine interrupting

* instance_id -> session_id and fix api_client.py

* update doc

* remove useless faq

* safe ip mapping

* update app.py

* remove print

* update doc


14 Sep, 2023 1 commit
- Fix build.md (#411) · ec034c15
  nlp-pang authored Sep 14, 2023
```
* fix the build step

* Fix the build step
```
11 Sep, 2023 1 commit

Support codellama (#359) · 65c662f9

Lyu Han authored Sep 11, 2023

* tmp

* add demo for codellama inference

* update

* update

* update

* update codellama.md

* export rope_theta

* update

* update doc

* fix client.py

* define SamplingParam

* rollback 'end'

* rotary_emb_base to rotary_embedding_base

* change to baichuan2-7b


05 Sep, 2023 1 commit
- [Docs] Simplify `build.md` (#370) · 4b5c2bda
  pppppM authored Sep 05, 2023
```
* use conda install nccl, openmpi and rapidjson

* update en doc
```
01 Sep, 2023 1 commit

Decode generated token_ids incrementally (#309) · 9bfe03c6

AllentDan authored Sep 01, 2023

* add incremental decoding for turbomind

* update TIS

* fix triton post processing

* update doc

* fix typo

* SentencePieceTokenizer incremental decode, add qwen message prompt

* docstring

* update bot


30 Aug, 2023 1 commit
- Update FAQ for restful api (#319) · eaccbc0a
  AllentDan authored Aug 30, 2023
```
* update FAQ for restful api

* refine
```
29 Aug, 2023 1 commit

fix(kvint8): update doc (#315) · a48e2d27

tpoisonooo authored Aug 29, 2023



* fix(kvint8): update doc

* style(lmdeploy): format

* style(kv_qparams.py): linting

* fix lint

* Update kv_int8.md

* Update kv_int8.md

---------
Co-authored-by: AllentDan <AllentDan@yeah.net>


24 Aug, 2023 2 commits

Enable the Gradio server to call inference services through the RESTful API (#287) · 4279d8ca

AllentDan authored Aug 24, 2023



* app use async engine

* add stop logic

* app update cancel

* app support restful-api

* update doc and use the right model name

* set doc url root

* add comments

* add an example

* renew_session

* update readme.md

* resolve comments

* Update restful_api.md

* Update restful_api.md

* Update restful_api.md

---------
Co-authored-by: tpoisonooo <khj.application@aliyun.com>


[Fix] Fix llama2 70b & qwen quantization error (#273) · d5cb0be2
pppppM authored Aug 24, 2023
```
* fix llama2 70b

* fix qwen quantization

* remove pdb

* add faq
```

22 Aug, 2023 1 commit

Add Restful API (#223) · d5c10e7a

AllentDan authored Aug 22, 2023

* add restful api

* refine

* add simple doc

* lint

* add uvicorn requirement

* more args

* add llama2

* docstring

* update doc

* save

* refine

* lint

* better decode

* add v1/embedding

* add GenerateRequest

* add llama2 chat template

* correct profiling

* update documents

* add length judge

* add faq

* update doc and rename req_que to req_queue

* fix md link, use get_logger, fix sequence_end bug

* use another doc link for go to avoid lint error

* add api_client.py

* update doc

* update doc

* update function interface

* update FAQ

* resolve comments


21 Aug, 2023 3 commits
- docs(quantization): update description (#272) · f44ef17c
  tpoisonooo authored Aug 21, 2023
  
- add readthedocs (#208) · c238f1cd
  RunningLeon authored Aug 21, 2023
```
* add readthedocs configs

* update readme

* fix link

* update

* remove turbomind in api

* update

* fix comment and remove api
```
- Check-in FAQ (#256) · 2f29a3c7
  Lyu Han authored Aug 21, 2023
```
* Check-in FAQ

* update

* update
```
17 Aug, 2023 1 commit

docs(quantization): update description (#253) · 903707b5

tpoisonooo authored Aug 17, 2023

* Update quantization.md

* docs(quantization): update description

* docs(README): rename quantization files


15 Aug, 2023 1 commit
- Remove specified version in user guide (#241) · e68a1d00
  Lyu Han authored Aug 15, 2023
  
14 Aug, 2023 3 commits

Check-in user guide for w4a16 LLM deployment (#224) · 8e8629de
Lyu Han authored Aug 14, 2023
```
* tmp

* update

* update

* update

* update

* update

* remove

* update

* update
```
feat(quantization): kv cache use asymmetric (#218) · 902a3e16
tpoisonooo authored Aug 14, 2023
```
* feat(quantization): kv cache use asymmetric
```

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`


07 Aug, 2023 1 commit

[Refactor] Support multi-session chat (#178) · 4bd0b487

WRH authored Aug 07, 2023

* add some dist utils

* add model utils

* add termio and basicstreamer

* typo

* fix world size

* refactor chat and tested llama1

* add internlm adapter and support stopping criteria

* concat with id for internlm

* update docstring

* update and support llama2

* typo

* move docs to docs

* update docstring of session manager

* update docstring

* update docs

* fix accel none in model

* fix and add test for tensor broadcast

* fix session using typing to check type

* add docstrings and comprehensive condition test

* unit test for dist

* fix session

* split unittests of utils

* typo

* update control flow of accel

* move test model

* remove main in unittest

* remove some log

* remove some comments


04 Aug, 2023 1 commit

Support serving with gradio without communicating to TIS (#162) · 18c386d9

AllentDan authored Aug 04, 2023



* use local model for webui

* local model for app.py

* lint

* remove print

* add seed

* comments

* fixed session_id

* support turbomind batch inference

* update app.py

* lint and docstring

* move webui to serve/gradio

* update doc

* update doc

* update docstring and remove print conversation

* log

* Update docs/zh_cn/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* Update docs/en/build.md
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>

* use latest gradio

* fix

* replace partial with InterFace

* use host ip instead of coolie

---------
Co-authored-by: Chen Xin <xinchen.tju@gmail.com>
