Commits · b190521b6028759ce27e12d2d85f35bd7ab73d62 · OpenDAS / Lmdeploy

15 Dec, 2023 1 commit

support image_embs input (#799) · b190521b

Chen Xin authored Dec 15, 2023

* support image_embs input

* add some checks

* update interactive/config.pbtxt && TurbomindModelConfig

* update docstring

* refactor

* support convert embeddings to bf16

* update interactive/config.pbtxt

* embeddings -> input_embeddings

* use input_embedding_ranges

* remove embedding_begins/ends


14 Dec, 2023 1 commit
- fix api_server `stop` and `end_session` (#835) · af5a3edb
  AllentDan authored Dec 14, 2023
  
13 Dec, 2023 3 commits

add encode for opencompass (#828) · 558029b6
AllentDan authored Dec 13, 2023
```
* add encode for opencompass

* doc

* remove **kwargs
```
Support building docker image manually in CI (#825) · 872701e3
RunningLeon authored Dec 13, 2023
```
* update dockerfile and ci

* remove unused env

* ignore link check of reddit website urls
```

Add api.py (#805) · 5c9aa51a

AllentDan authored Dec 13, 2023

* add api.py

* update serve function

* add model_name arg and provide examples

* docstring

* remove service_available

* type hint


12 Dec, 2023 3 commits
- fix `finish_reason` (#816) · 16b4b823
  AllentDan authored Dec 12, 2023
  
- simplify the header of the benchmark table (#820) · a5b67b95
  Lyu Han authored Dec 12, 2023
```
* simplify the header of the benchmark table

* miss comma

* fix lint
```
- fix cache verification (#821) · 72869ef8
  Li Zhang authored Dec 12, 2023
  
11 Dec, 2023 4 commits
- FIFO pipe strategy for api_server (#795) · cfa80974
  AllentDan authored Dec 11, 2023
```
* FIFO pipe for api_server

* asyncio sleep 0

* remove unwanted import

* rename symbols

* speed up the benchmark by disabling preprocessing for string input

* replace Queue with set

* comment
```
- Disable attention mask when it is not needed (#813) · b8354dae
  Li Zhang authored Dec 11, 2023
```
* disable attention mask when not needed

* fix for sm<80 and float data type
```
- set smem size for repetition penalty kernel (#818) · d5a89465
  Li Zhang authored Dec 11, 2023
  
- Simplify block manager (#812) · a54b16a2
  Li Zhang authored Dec 11, 2023
```
* simplify block manager

* fix lint
```
07 Dec, 2023 1 commit
- fix out of bounds access (#809) · 2d5f5b30
  Li Zhang authored Dec 07, 2023
  
06 Dec, 2023 3 commits

bump version to v0.1.0a2 (#807) · fddad305
Lyu Han authored Dec 06, 2023


Report the inference benchmark of models with different sizes (#794) · ebe90bc9

Lyu Han authored Dec 06, 2023

* update test scripts for models with different sizes

* update

* only test after tuning gemm

* chmod +x

* fix typo

* benchmark on a100

* fix typo

* fix typo

* per-token latency percentile in profile_throughput

* fix

* fix

* rename

* make the script accept parameters

* minor fix

* indent

* reformat table

* change to 3000

* minor fix


fix local kv head num (#806) · 5b9e454a
Lyu Han authored Dec 06, 2023


05 Dec, 2023 2 commits
- fix extra colon in InternLMChat7B (#796) · bd7c4e39
  Qian Zhao authored Dec 05, 2023
  
- auto upload cuda12.1 python pkg to release when creating a new tag (#784) · 079f29bc
  Chen Xin authored Dec 05, 2023
```
* add cuda12-whl-release ci

* enable environment

* test py310-311 windows wheel

* fix py310, py311 setup.py error on windows

* fix lint
```
04 Dec, 2023 4 commits

add cuda12.1 build check ci (#782) · 7990d252
Chen Xin authored Dec 04, 2023
```
* update cuda12.1 build check ci

* use matrix
```

Unify prefill & decode passes (#775) · 7f943a26

Li Zhang authored Dec 04, 2023

* Unify prefill and decode passes

* dynamic split-fuse

* refactor

* correct input count calculation

* remove unused

* lint

* lint

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build

* fix msvc build


Fix missing arguments when benchmarking static inference performance (#787) · 2ba90822
Lyu Han authored Dec 04, 2023
```
* minor fix in the profile scripts and docs

* miss arguments

* typo

* fix lint

* update
```
add chat template for Yi (#779) · 12dc3e14
AllentDan authored Dec 04, 2023


02 Dec, 2023 1 commit
- Fix early exit condition in attention kernel (#788) · 816022e4
  Li Zhang authored Dec 02, 2023
  
29 Nov, 2023 7 commits

Update benchmark user guide (#763) · d3e2cee4

Lyu Han authored Nov 29, 2023

* user guide of benchmark generation

* update benchmark generation guide

* update profiling throughput guide

* update profiling api_server guide

* rename file names

* update profile tis user guide

* update

* fix according to review comments

* update

* update according to review comments

* update

* add an example

* update


bump version to 0.1.0a1 (#776) · 9c46b27c
Lyu Han authored Nov 29, 2023

convert model with hf repo_id (#774) · 77efebbf
Chen Xin authored Nov 29, 2023


Report first-token-latency and token-latency percentiles (#736) · 5c9e1e28

Lyu Han authored Nov 29, 2023

* update profile scripts

* add top_p, top_k and temperature as input arguments

* fix input_ids

* update profile_throughput

* update profile_restful_api

* update profile_serving

* update

* update

* add progress bar

* remove TODO comments

* update

* remove useless profile_* argument

* remove log level

* change concurrency default value to 64

* update restful_api.md

* update according to review comments

* fix docstring


improvement(build): enable ninja and gold linker (#767) · 8add942d

tpoisonooo authored Nov 29, 2023

* feat(build): enable ninja and lld

* fix(.github): add ninja installation

* fix(CI): remove dimsize=256

* fix(CI): add option for generate.sh

* fix(docs): update


fix turbomind build on sm<80 (#754) · 8c672a7b
q.yao authored Nov 29, 2023
```
* fix

* fix lint
```

add triton server test and workflow yml (#760) · 4744b28c

RunningLeon authored Nov 29, 2023

* add triton server test and workflow yml

* update

* revert changes in dockerfile

* update prompts


28 Nov, 2023 1 commit
- fix typo (#769) · 2f80c556
  q.yao authored Nov 28, 2023
  
27 Nov, 2023 2 commits
- Set the default value of `max_context_token_num` to 1 (#761) · 7868cea5
  Lyu Han authored Nov 27, 2023
  
- [Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto (#758) · 4bcc4f11
  Lyu Han authored Nov 27, 2023
  
24 Nov, 2023 1 commit
- [Fix] build docker image failed since `packaging` is missing (#753) · c07f60fd
  Lyu Han authored Nov 24, 2023
  
23 Nov, 2023 3 commits
- [Fix] Skip empty batch (#747) · a7c5007c
  Li Zhang authored Nov 23, 2023
  
- bump version to v0.1.0a0 (#709) · d3386351
  Lyu Han authored Nov 23, 2023
  
- Fix cache/output length calculation (#738) · 434961c6
  Li Zhang authored Nov 23, 2023
  
22 Nov, 2023 1 commit

Support loading hf model directly (#685) · 6b00f623

Chen Xin authored Nov 22, 2023

* turbomind support export model params

* fix overflow

* support turbomind.from_pretrained

* fix tp

* support AutoModel

* support load kv qparams

* update auto_awq

* update docstring

* export lmdeploy version

* update doc

* remove download_hf_repo

* LmdeployForCausalLM -> LmdeployForCausalLM

* refactor turbomind.py

* update comment

* add bfloat16 convert back

* support gradio run_local load hf

* support restful api server load hf

* add docs

* support loading previous quantized model

* adapt pr 690

* update docs

* do not export turbomind config when quantizing a model

* check model_name when it cannot be obtained from config.json

* update readme

* remove model_name in auto_awq

* update

* update

* update

* fix build

* absolute import


21 Nov, 2023 1 commit
- Replace mmengine with mmengine-lite (#715) · 42e57c8b
  Zaida Zhou authored Nov 21, 2023
  
20 Nov, 2023 1 commit

Check-in user guide about turbomind config (#680) · 73386e21

Lyu Han authored Nov 20, 2023

* update

* update config guide

* update guide

* update user guide according to review comments
