- 27 May, 2024 1 commit
gaoqiong authored
- 15 Dec, 2023 1 commit
q.yao authored
* Add bf16 template sp
* prepare merge
* add enable bf
* add bf16 decode attention support
* fix python lint
* fix yapf
* fix c format
* c format11
* fix cast
* fix on sm<80
* fix linux bf162 cast
* fix type cast
* fix lint
* support from hf pretrained
* fix pybind
* fix converter
* add trust remote code
* fix comment
* fix convert qwen
* fix lint
* fix baichuan
* update weight map
- 14 Aug, 2023 1 commit
Li Zhang authored
* add w4a16
* fix `deploy.py`
* add doc
* add w4a16 kernels
* fuse w1/w3 & bugfixes
* fix typo
* python
* guard sm75/80 features
* add missing header
* refactor
* qkvo bias
* update cost model
* fix lint
* update `deploy.py`
- 04 Jul, 2023 1 commit
AllentDan authored
* format-11.1
* md-link-config
- 01 Jul, 2023 3 commits
- 28 Jun, 2023 1 commit
tpoisonooo authored
* feat(src): add int8 and compile passed
* feat(kernels): fix
* feat(llama): update kernel
* feat(src): add debug
* fix(kernel): k_cache use int8_t pointer
* style(llama): clean code
* feat(deploy.py): revert to enable fmha
* style(LlamaV2): clean code
* feat(deploy.py): add default quant policy
- 20 Jun, 2023 1 commit
Li Zhang authored
* add ft code
* gitignore
* fix lint
* revert fmha