Commits · 9d8949bf22e64c6d852a8256d964c551ccebbae7 · OpenDAS / Lmdeploy

03 Jul, 2023 3 commits
- [Doc] add persistent batch inference GIF (#43) · 9d8949bf
  vansin authored Jul 03, 2023
```
* [Doc] add persistent batch inference

* update

* update

* Update README.md

* Update README_zh-CN.md
```
  9d8949bf
- install triton_example and TransformerTritonBackend to runtime and lib respectively (#39) · bb6f8060
  lvhan028 authored Jul 03, 2023
  
  bb6f8060
- fix(kernel): speed degrade (#41) · 6e58fced
  tpoisonooo authored Jul 03, 2023
```
* feat(template): remote diff

* feat(cmake): use c++17
```
  6e58fced
01 Jul, 2023 5 commits
- change FasterTransformer to TurboMind (#37) · 8aa6eb10
  lvhan028 authored Jul 01, 2023
  
  8aa6eb10
- Change target tritonfastertransformerbackend to trtonturbomindbackend (#36) · 70e6ab26
  lvhan028 authored Jul 01, 2023
```
* change target tritonfastertransformerbackend to tritonturbomindbackend

* install targets to backends/turbomind

* changge model_dir
```
  70e6ab26
- build turbomind (#35) · 35d64462
  lvhan028 authored Jul 01, 2023
```
* build turbomind

* change namespace fastertransformer to turbomind

* change logger name
```
  35d64462
- rename src/fastertransformer to src/turbomind (#33) · 53d2e42c
  lvhan028 authored Jul 01, 2023
  
  53d2e42c
- Add lint action (#32) · fe46dac2
  AllentDan authored Jul 01, 2023
```
* temp

* fix lint

* csrc->src

* remove clang-format

* skip .rst

* skip doc

* clang-format

version

version

* mat_B
```
  fe46dac2
30 Jun, 2023 3 commits
- rename serve/fastertransformer to serve/turbomind (#31) · e8ab4ba3
  lvhan028 authored Jun 30, 2023
```
* rename lmdeploy/serve/fastertransformer to lmdeploy/serve/turbomind

* update

* update
```
  e8ab4ba3
- rename llmdeploy to lmdeploy (#30) · 46f4738c
  lvhan028 authored Jun 30, 2023
```
* change llmdeploy to lmdeploy

* update logo

* update readme
```
  46f4738c
- refactor webui (#29) · 081a6e89
  AllentDan authored Jun 30, 2023
  
  081a6e89
29 Jun, 2023 4 commits
- fix crash when conversation history out of limit (#28) · cb8ac1b0
  lvhan028 authored Jun 29, 2023
  
  cb8ac1b0
- remove cuda architecture from build option (#23) · c16b857b
  lvhan028 authored Jun 29, 2023
  
  c16b857b
- use huggingface tokenizer (#26) · 64936449
  q.yao authored Jun 29, 2023
```
* add hf tokenizer

* format

* fix for comment

* don't skip speical tokens
```
  64936449
- Add webui (#27) · 0cc48011
  AllentDan authored Jun 29, 2023
```
* add webui

* update readme

* resolve comments

* readme
```
  0cc48011
28 Jun, 2023 2 commits

- feat(src): add kv cache int8 quantization (#22) · cc93136e
  tpoisonooo authored Jun 28, 2023
  * feat(src): add int8 and compile passed
  * feat(kernels): fix
  * feat(llama): update kernel
  * feat(src): add debug
  * fix(kernel): k_cache use int8_t pointer
  * style(llama): clean code
  * feat(deploy.py): revert to enable fmha
  * style(LlamaV2): clean code
  * feat(deploy.py): add default quant policy

  cc93136e
- fix-gemm-tuning (#24) · 4d42a781
  Li Zhang authored Jun 28, 2023
  
  4d42a781

26 Jun, 2023 1 commit
- add gemm tuning (#18) · e357c71f
  Li Zhang authored Jun 26, 2023
  
  e357c71f
25 Jun, 2023 4 commits

- style(doc): README.md · 93604c3f
  tpoisonooo authored Jun 25, 2023
  
  93604c3f
- fix(deploy.py): qkv no bias assertion · e0c7f51b
  tpoisonooo authored Jun 25, 2023
  
  e0c7f51b
- Update requirements.txt · 1b7151c1
  tpoisonooo authored Jun 25, 2023
  
  1b7151c1
- Add profile (#15) · 23c05372
  lvhan028 authored Jun 25, 2023
  * remove constraints on model name
  * remove duplicate model converter
  * add profile
  * get eos and bos from server
  * update stop_words
  * update sequence_length when the last generated token is eos_id
  * fix
  * fix
  * check-in models
  * valicate model_name
  * make stop_words as property
  * debug profiling
  * better stats
  * fix assistant reponse
  * update profile serving
  * update
  * update

  23c05372

24 Jun, 2023 1 commit
- Support attention bias (#14) · 2700abb3
  Li Zhang authored Jun 24, 2023
```
* support attention bias

* fix conflict
```
  2700abb3
22 Jun, 2023 2 commits

- remove duplicate model converter script (#13) · ee962784
  lvhan028 authored Jun 22, 2023
```
* remove constraints on model name

* remove duplicate model converter
```
  ee962784
- Fix fmha on sm 70 (#12) · 4b121180
  q.yao authored Jun 22, 2023
  * update arch
  * clang-format
  * remove comment

  Co-authored-by: yaoqian <yaoqian@localhost.localdomain>

  4b121180

21 Jun, 2023 3 commits
- check-in build script (#11) · 102aefda
  lvhan028 authored Jun 21, 2023
  
  102aefda
- support fmha (#9) · 6c7d9992
  q.yao authored Jun 21, 2023
```
* support fmha

* update sm by cudaarch

* update ldscript path

* clang-format

* clang-format

---------
```
  6c7d9992
- check-in `.clang-format` (#10) · 62c60806
  Li Zhang authored Jun 21, 2023
  
  62c60806
20 Jun, 2023 4 commits

- check-in dockerfile (#8) · bd2e0bf7
  lvhan028 authored Jun 20, 2023
```
* check-in dockerfile

* check-in dockerfile
```
  bd2e0bf7
- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```
  9efcac38
- Add readme (#6) · 720fc533
  lvhan028 authored Jun 20, 2023
```
* add logo

* update readme
```
  720fc533

- update scripts for deploying llama family model to fastertransformer triton models () · 2bf481fb
  lvhan028 authored Jun 20, 2023
  * add scripts for deploying llama family models via fastertransformer
  * fix
  * fix
  * set symlinks True when copying triton models templates
  * pack model repository for triton inference server
  * add exception
  * fix
  * update config.pbtxt and launching scripts

  2bf481fb

18 Jun, 2023 4 commits
- check-in fastertransformer's triton models (#3) · 4f47f78c
  lvhan028 authored Jun 18, 2023
  
  4f47f78c
- add chatbot (#2) · ef2adb04
  lvhan028 authored Jun 18, 2023
  
  ef2adb04
- check-in license (#1) · a75c0a47
  lvhan028 authored Jun 18, 2023
  
  a75c0a47
- init commit · 7258c786
  lvhan028 authored Jun 18, 2023
  
  7258c786