Commits · 4db08045baa250148b1e176e9ac1d5797affcd75 · OpenDAS / Lmdeploy

11 Jul, 2023 5 commits
- docs(serving.md): typo (#92) · 4db08045
  tpoisonooo authored Jul 11, 2023
```
* docs(serving.md): typo

* docs(README): quantization
```
- feat(deploy.py): support w pack qkv (#83) · ac638b37
  tpoisonooo authored Jul 11, 2023
```
* feat(deploy.py): support w pack qkv
```
- update contribution.md (#86) · e7d5e062
  q.yao authored Jul 11, 2023
```
* update contrib

* update links
```
- remove unused cmake flags (#73) · 54e0bbac
  Li Zhang authored Jul 11, 2023
  
- [Fix] Remaining Issues in #19 (#75) · a6ac981d
  WRH authored Jul 11, 2023
```
* previous merged

* add chinese

* support torch<2

* add a docstring

* fix typo

* rename torch submodule

* rename to pytorch

* rename in readme
```
10 Jul, 2023 1 commit
- Update lint.yml (#76) · cfb3b75d
  WRH authored Jul 10, 2023
  
06 Jul, 2023 10 commits

- update benchmark image (#77) · 050a2120
  pppppM authored Jul 06, 2023
```
* update benchmark image

* update image url
```
- [Feature] Add a torch client (#19) · 009075d8
  WRH authored Jul 06, 2023
```
* draft torch client

* deal with space of tokenizer

* support tensor parallel

* fix

* fix

* move folder

* move instruction to readme

* move to torch/

* rename client to chat

* very bad response

* stash

* rename streamer

* support internlm

* change default args

* remove test

* improve instructions

* remove module docstring

* decrease header level of torch model
```
- update zh readme (#74) · 76ae8627
  pppppM authored Jul 06, 2023
- Streaming output (#71) · 74a4f3c9
  q.yao authored Jul 06, 2023
```
* streaming-output

* fix end

* fix profile

* support chinese streaming

* lint

* update chat

* lint

* fix benchmark

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```
- fix clang-format (#68) · 208b6841
  AllentDan authored Jul 06, 2023
- Update .gitignore (#70) · b2393467
  tpoisonooo authored Jul 06, 2023
```
* docs(README): fix script
```
- update image url (#72) · f4b7f9a6
  pppppM authored Jul 06, 2023
- fix(project): interlm run error (#69) · 22d403f5
  tpoisonooo authored Jul 06, 2023
- add internlm url (#67) · 7c6edc83
  pppppM authored Jul 06, 2023
- add build-lmdeploy command in dockerfile (#40) · f56f3d87
  lvhan028 authored Jul 06, 2023
```
* build lmdeploy in dockerfile

* rename file

* update dockerfile

---------
Co-authored-by: grimoire <streetyao@live.com>
```

05 Jul, 2023 13 commits

- improve readme (#52) · 3e7b6bfd
  lvhan028 authored Jul 05, 2023
```
* add performance

* use png

* update

* update

* update

* update

* update
```
- lower transformer version <4.30.0 (#66) · adfd81d3
  lvhan028 authored Jul 05, 2023
- update internlm‘s chat template (#54) · 3de27ead
  lvhan028 authored Jul 05, 2023
```
* update internlm model

* update

* update

* update

* update

* update temperature, topk and top_p

* update

* update

* loosen log level
```
- Update setup for build python wheel (#61) · d2c9caa4
  RunningLeon authored Jul 05, 2023
- fix build w/o python ffi (#64) · 08252a83
  Li Zhang authored Jul 05, 2023
- add demo gif (#63) · 9d7cd629
  AllentDan authored Jul 05, 2023
```
* add demo gif

* add demo gif
```
- fix(kv_qparams.py): zp use min (#59) · ec53d63f
  tpoisonooo authored Jul 05, 2023
```
* fix(kv_qparams.py): zp use min

* revert(qparams.py): revert format

* fix(kv_qparams.py): update formula
```
- remove tokenizer_path from chat_example and move it to lmdeploy/turbomind (#55) · 61e8d2c6
  q.yao authored Jul 05, 2023
- Update README.md (#57) · da62f428
  tpoisonooo authored Jul 05, 2023
- docs(README): typo (#56) · 7396d8f6
  tpoisonooo authored Jul 05, 2023
- [Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d
  pppppM authored Jul 05, 2023
```
* add cal qparams

* support offload inference

* add collect funtions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore
```
- docs(quantization): add more test (#53) · edb6eb86
  tpoisonooo authored Jul 05, 2023
```
* docs(quantization): add more test

* revert(generate.sh): revert ninja

* revert(llama_config.ini): revert empty line

* fix(quantization.md): fix link error
```
- Python ffi (#34) · 4fd6e710
  q.yao authored Jul 05, 2023
```
* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```

04 Jul, 2023 7 commits
- use format-11.1 (#38) · 5ea40abf
  AllentDan authored Jul 04, 2023
```
* format-11.1

* md-link-config
```
- fix model conversion (#51) · 9bbd39b7
  Li Zhang authored Jul 04, 2023
  
- support 'input_tokens' in triton_example (#49) · 62e0fa9a
  lvhan028 authored Jul 04, 2023
```
* check-in script for tokenizing a file

* use max_input_len
```
- docs(README): fix (#50) · 4c303b17
  tpoisonooo authored Jul 04, 2023
  
- export attn_bias as int type into config (#48) · 0d19a95d
  lvhan028 authored Jul 04, 2023
  
- Update quantization.md (#47) · fa7cbc7a
  tpoisonooo authored Jul 04, 2023
  
- docs(project): add quantization test results (#46) · 197b3ee1
  tpoisonooo authored Jul 04, 2023
```
* docs(README): update description

* docs(project): add quantization test results

* docs(README): reorder

* docs(quantization): add more description

* docs(README): remove openmmlab badge

* docs(README): scale up image

* docs(dir): add zh_cn subdir
```
03 Jul, 2023 3 commits
- [Doc] add persistent batch inference GIF (#43) · 9d8949bf
  vansin authored Jul 03, 2023
```
* [Doc] add persistent batch inference

* update

* update

* Update README.md

* Update README_zh-CN.md
```
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
- fix(kernel): speed degrade (#41) · 6e58fced
  tpoisonooo authored Jul 03, 2023
```
* feat(template): remote diff

* feat(cmake): use c++17
```
01 Jul, 2023 1 commit
- change FasterTransformer to TurboMind (#37) · 8aa6eb10
  lvhan028 authored Jul 01, 2023
  