CMakeLists.txt · cc93136e6a166566fc6f0502c67aa99a94673db3 · ModelZoo / Qwen_lmdeploy

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

CMakeLists.txt 13.8 KB
