src/fastertransformer/utils/memory_utils.h · cc93136e6a166566fc6f0502c67aa99a94673db3 · OpenDAS / Lmdeploy

feat(src): add kv cache int8 quantization (#22) · cc93136e

tpoisonooo authored Jun 28, 2023

* feat(src): add int8 and compile passed

* feat(kernels): fix

* feat(llama): update kernel

* feat(src): add debug

* fix(kernel): k_cache use int8_t pointer

* style(llama): clean code

* feat(deploy.py): revert to enable fmha

* style(LlamaV2): clean code

* feat(deploy.py): add default quant policy

memory_utils.h 5.47 KB
