feat(src): add kv cache int8 quantization (#22)
* feat(src): add int8 and compile passed
* feat(kernels): fix
* feat(llama): update kernel
* feat(src): add debug
* fix(kernel): k_cache use int8_t pointer
* style(llama): clean code
* feat(deploy.py): revert to enable fmha
* style(LlamaV2): clean code
* feat(deploy.py): add default quant policy