- 14 Aug, 2023 2 commits
-
tpoisonooo authored
* feat(quantization): kv cache use asymmetric
-
Li Zhang authored
* add w4a16 * fix `deploy.py` * add doc * add w4a16 kernels * fuse w1/w3 & bugfixes * fix typo * python * guard sm75/80 features * add missing header * refactor * qkvo bias * update cost model * fix lint * update `deploy.py`
-
- 31 Jul, 2023 2 commits
- 25 Jul, 2023 1 commit
-
q.yao authored
Co-authored-by: grimoire <yaoqian@pjlab.org.cn>
-
- 24 Jul, 2023 1 commit
-
Li Zhang authored
* decode only forward pass * fix lint * batch embedding
-
- 21 Jul, 2023 1 commit
-
Li Zhang authored
* add GQA for llama2 * fix model conversion * fix lint & remove dev log * update news * minor * fix allocation size * fix split_dim for w_qkv.bias
-
- 04 Jul, 2023 1 commit
-
AllentDan authored
* format-11.1 * md-link-config
-
- 01 Jul, 2023 3 commits