Commits · b9004712bcbd6963da40d48a40f9e1d01e7865cf · OpenDAS / Lmdeploy

27 Jul, 2023 1 commit

Chen Xin authored Jul 27, 2023



* update builder

* remove root permission

* update readme

* update setup.py

* add install cuda 12.1 script

* use generate.sh

* add nccl to install_requires

* update README.md

* fix lint

* update setup.py

---------
Co-authored-by: chenxin <chenxin@pjlab.org.cn>

b9004712

25 Jul, 2023 1 commit
- support fmha gqa (#160) · 5ed6bb59
  q.yao authored Jul 25, 2023
```
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
```
24 Jul, 2023 1 commit
- [Feature] decode-only forward pass (#153) · 0cc9d095
  Li Zhang authored Jul 24, 2023
```
* decode only forward pass

* fix lint

* batch embedding
```
21 Jul, 2023 1 commit

[Feature] Support Llama-2 with GQA (#147) · f07b697b

Li Zhang authored Jul 21, 2023

* add GQA for llama2

* fix model conversion

* fix lint & remove dev log

* update news

* minor

* fix allocation size

* fix split_dim for w_qkv.bias

18 Jul, 2023 1 commit

Tensor Parallel python api (#82) · 7cbfe2ea

q.yao authored Jul 18, 2023

* wip

* profile disable tp

* fix profile

* lint

* fix dlpack

* remove comment

* add tp flag

* add session len check

* add eos

* remove tp and session len inputs

* wrap tokenizer

* multithread load weight

* update profile

* refactor tokenizer

* remove pre/post process

* remove mpi4py requirement

* remove

* remove bind

* remove mpi requirement

* check backend_tokenizer

17 Jul, 2023 1 commit
- update log info (#131) · 1f88baa5
  q.yao authored Jul 17, 2023
```
* update log info

* format cuda utils
```
06 Jul, 2023 2 commits

Streaming output (#71) · 74a4f3c9

q.yao authored Jul 06, 2023



* streaming-output

* fix end

* fix profile

* support chinese streaming

* lint

* update chat

* lint

* fix benchmark

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

fix clang-format (#68) · 208b6841
AllentDan authored Jul 06, 2023

05 Jul, 2023 2 commits

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod, weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023



* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

04 Jul, 2023 1 commit
- use format-11.1 (#38) · 5ea40abf
  AllentDan authored Jul 04, 2023
```
* format-11.1

* md-link-config
```
03 Jul, 2023 2 commits
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
- fix(kernel): speed degrade (#41) · 6e58fced
  tpoisonooo authored Jul 03, 2023
```
* feat(template): remote diff

* feat(cmake): use c++17
```
01 Jul, 2023 4 commits
- Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26
  lvhan028 authored Jul 01, 2023
```
* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir
```
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
- rename src/fastertransformer to src/turbomind (#33) · 53d2e42c
  lvhan028 authored Jul 01, 2023
  
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
28 Jun, 2023 2 commits

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

fix-gemm-tuning (#24) · 4d42a781
Li Zhang authored Jun 28, 2023

26 Jun, 2023 1 commit
- add gemm tuning (#18) · e357c71f
  Li Zhang authored Jun 26, 2023
  
24 Jun, 2023 1 commit
- Support attention bias (#14) · 2700abb3
  Li Zhang authored Jun 24, 2023
```
* support attention bias

* fix conflict
```
22 Jun, 2023 1 commit

Fix fmha on sm 70 (#12) · 4b121180

q.yao authored Jun 22, 2023



* update arch

* clang-format

* remove comment

---------
Co-authored-by: yaoqian <yaoqian@localhost.localdomain>

21 Jun, 2023 1 commit

support fmha (#9) · 6c7d9992

q.yao authored Jun 21, 2023

* support fmha

* update sm by cudaarch

* update ldscript path

* clang-format

* clang-format

20 Jun, 2023 1 commit
- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```