Commits · d5cb0be2cd16e6c5eefd4d266a38357fde83a660 · ModelZoo / Qwen_lmdeploy

24 Aug, 2023 1 commit
- [Fix] Fix llama2 70b & qwen quantization error (#273) · d5cb0be2
  pppppM authored Aug 24, 2023
```
* fix llama2 70b

* fix qwen quantization

* remove pdb

* add faq
```
14 Aug, 2023 1 commit
- feat(quantization): kv cache use asymmetric (#218) · 902a3e16
  tpoisonooo authored Aug 14, 2023
```
* feat(quantization): kv cache use asymmetric
```
11 Aug, 2023 1 commit

pppppM authored Aug 11, 2023

* support kv cache offload

* add dataloader docstring

* complete gitignore

* refactor collect mod fn

* add calibration

* fix lint

* add observers and quantizers

* fix lints

* add global available mixin

* fix lints

* split batch inference

* support smoothquant and awq

* update export kv scales

* fix lints

* fix some bugs

* update weight only usage

* update usage

* auto mapping and support smooth internlm

* trust remote code

* fix num head key error

* fix bias error

* align shape and pack order with llm-awq

* modified according to LZHgrla's comments.

* update gitignore

* fix kv qparams export error

* update usage

* decouple calibrate and awq

* update docstrings

* update api name

* update readme

* update readme

* update readme

* update readme

* update kv_qparams and readme

* fix typos

d3dbe179

07 Aug, 2023 1 commit
- [Feature] Add script to split HuggingFace model to the smallest sharded checkpoints (#199) · b7e7e668
  LZHgrla authored Aug 07, 2023
```
* add get_small_sharded_hf.py

* fix pre-commit
```
20 Jul, 2023 1 commit

[Fix] Fix bug for issues #141 (#145) · cde17e73

humu789 authored Jul 20, 2023

* fix get_dataset error

* fix lint

* add datasets to requirements.txt

* update some misc

06 Jul, 2023 1 commit
- add internlm url (#67) · 7c6edc83
  pppppM authored Jul 06, 2023
  
05 Jul, 2023 2 commits

fix(kv_qparams.py): zp use min (#59) · ec53d63f

tpoisonooo authored Jul 05, 2023

* fix(kv_qparams.py): zp use min

* revert(qparams.py): revert format

* fix(kv_qparams.py): update formula

[Feature] Stats Quantization Parameters for KV Cache (#45) · 3fff964d

pppppM authored Jul 05, 2023

* add cal qparams

* support offload inference

* add collect functions (mod, weight)

* stats kv scales

* update init

* add user guide

* fix hints

* fix comments & support turbomind format

* update user guide

* fix slice kv cache error & support pileval dataset (used in llm-awq)

* fix wrong num heads slice

* update default dataset

* fix conflict

* fix hints

* fix hints

* add gitignore
