Commits · 0d21f366adeea29ef816ff137f4febc71c2416a7 · OpenDAS / Lmdeploy

14 Aug, 2023 1 commit

[Feature] Blazing fast W4A16 inference (#202) · c3290cad

Li Zhang authored Aug 14, 2023

* add w4a16

* fix `deploy.py`

* add doc

* add w4a16 kernels

* fuse w1/w3 & bugfixes

* fix typo

* python

* guard sm75/80 features

* add missing header

* refactor

* qkvo bias

* update cost model

* fix lint

* update `deploy.py`

c3290cad

03 Aug, 2023 1 commit
- Fix build test error and move turbomind csrc test cases to `tests/csrc` (#188) · 44a85546
  lvhan028 authored Aug 03, 2023
```
* fix build tests failure

* move src test cases to tests/csrc
```
  44a85546
31 Jul, 2023 1 commit
- [Fix] Remove unused code to reduce binary size (#181) · 981a4610
  Li Zhang authored Jul 31, 2023
```
* clean-up

* fix lint

* fix lint
```
  981a4610
05 Jul, 2023 3 commits

Update setup for build python wheel (#61) · d2c9caa4

RunningLeon authored Jul 05, 2023

d2c9caa4

fix build w/o python ffi (#64) · 08252a83

Li Zhang authored Jul 05, 2023

08252a83

Python ffi (#34) · 4fd6e710

q.yao authored Jul 05, 2023

* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

4fd6e710

03 Jul, 2023 2 commits
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
  bb6f8060
- fix(kernel): speed degrade (#41) · 6e58fced
  tpoisonooo authored Jul 03, 2023
```
* feat(template): remote diff

* feat(cmake): use c++17
```
  6e58fced
01 Jul, 2023 4 commits
- change FasterTransformer to TurboMind (#37) · 8aa6eb10
  lvhan028 authored Jul 01, 2023
  
  8aa6eb10
- Change target tritonfastertransformerbackend to tritonturbomindbackend (#36) · 70e6ab26
  lvhan028 authored Jul 01, 2023
```
* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* change model_dir
```
  70e6ab26
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
  35d64462
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
  fe46dac2
28 Jun, 2023 1 commit

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

cc93136e

20 Jun, 2023 1 commit
- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```
  9efcac38