Commits · b239346701bd8d9cbc2ba1a2f5053cb1e1d671b5 · OpenDAS / Lmdeploy

06 Jul, 2023 2 commits
- Update .gitignore (#70) · b2393467
  tpoisonooo authored Jul 06, 2023
```
* docs(README): fix script
```
- fix(project): interlm run error (#69) · 22d403f5
  tpoisonooo authored Jul 06, 2023

05 Jul, 2023 2 commits

- [Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d
  pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect funtions (mod,weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore

- Python ffi (#34) · 4fd6e710
  q.yao authored Jul 05, 2023

* wip

* wip

* example finish

* fix include and namespace

* wtf

* install lib

* batchize

* update cmake install

* multithread

* fix comment

* fix

* add mmengine

* bind llamamodel

---------
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>

04 Jul, 2023 1 commit
- support 'input_tokens' in triton_example (#49) · 62e0fa9a
  lvhan028 authored Jul 04, 2023
```
* check-in script for tokenizing a file

* use max_input_len
```

20 Jun, 2023 2 commits

- check-in fastertransformer (#7) · 9efcac38
  Li Zhang authored Jun 20, 2023
```
* add ft code

* gitignore

* fix lint

* revert fmha
```

- update scripts for deploying llama family model to fastertransformer triton models (#4) · 2bf481fb
  lvhan028 authored Jun 20, 2023

* add scripts for deploying llama family models via fastertransformer

* fix

* fix

* set symlinks True when copying triton models templates

* pack model repository for triton inference server

* add exception

* fix

* update config.pbtxt and launching scripts

18 Jun, 2023 1 commit
- check-in license (#1 ) · a75c0a47
  lvhan028 authored Jun 18, 2023