- 01 Sep, 2023 1 commit

Chen Xin authored
* pack llama_gemm
* update CMakeLists.txt
* remove candidate
* update MANIFEST.in
- 24 Aug, 2023 1 commit

WRH authored
* support decode
* unit test and benchmark and improve
* remove some drafts
* enable numerical test
* minor
* add some benchmark data
* add more output
* update interface
* remove debugs
* format
* update docstring
* remove print and add benchmark results
* use logits & add main
* fix rb
* dump large
* update test
* update test decode
* add decimal
- 21 Aug, 2023 1 commit

RunningLeon authored
* add readthedocs configs
* update readme
* fix link
* update
* remove turbomind in api
* update
* fix comment and remove api
- 14 Aug, 2023 1 commit

Lyu Han authored
- 11 Aug, 2023 1 commit

pppppM authored
* support kv cache offload
* add dataloader docstring
* complete gitignore
* refactor collect mod fn
* add calibration
* fix lint
* add observers and quantizers
* fix lints
* add global available mixin
* fix lints
* split batch inference
* support smoothquant and awq
* update export kv scales
* fix lints
* fix some bugs
* update weight only usage
* update usage
* auto mapping and support smooth internlm
* trust remote code
* fix num head key error
* fix bias error
* align shape and pack order with llm-awq
* modified according to LZHgrla's comments.
* update gitignore
* fix kv qparams export error
* update usage
* decouple calibrate and awq
* update docstrings
* update api name
* update readme
* update readme
* update readme
* update readme
* update kv_qparams and readme
* fix typos
- 27 Jul, 2023 1 commit

Chen Xin authored
* update builder
* remove root permission
* update readme
* update setup.py
* add install cuda 12.1 script
* use generate.sh
* add nccl to install_requires
* update README.md
* fix lint
* update setup.py

Co-authored-by: chenxin <chenxin@pjlab.org.cn>
- 06 Jul, 2023 3 commits

WRH authored
* draft torch client
* deal with space of tokenizer
* support tensor parallel
* fix
* fix
* move folder
* move instruction to readme
* move to torch/
* rename client to chat
* very bad response
* stash
* rename streamer
* support internlm
* change default args
* remove test
* improve instructions
* remove module docstring
* decrease header level of torch model

tpoisonooo authored
* docs(README): fix script

tpoisonooo authored
- 05 Jul, 2023 2 commits

pppppM authored
* add cal qparams
* support offload inference
* add collect functions (mod, weight)
* stats kv scales
* update init
* add user guide
* fix hints
* fix comments & support turbomind format
* update user guide
* fix slice kv cache error & support pileval dataset (used in llm-awq)
* fix wrong num heads slice
* update default dataset
* fix conflict
* fix hints
* fix hints
* add gitignore

q.yao authored
* wip
* wip
* example finish
* fix include and namespace
* wtf
* install lib
* batchize
* update cmake install
* multithread
* fix comment
* fix
* add mmengine
* bind llamamodel

Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
- 04 Jul, 2023 1 commit

lvhan028 authored
* check-in script for tokenizing a file
* use max_input_len
- 20 Jun, 2023 2 commits

Li Zhang authored
* add ft code
* gitignore
* fix lint
* revert fmha

lvhan028 authored
* add scripts for deploying llama family models via fastertransformer
* fix
* fix
* set symlinks True when copying triton models templates
* pack model repository for triton inference server
* add exception
* fix
* update config.pbtxt and launching scripts
- 18 Jun, 2023 1 commit

lvhan028 authored