"megatron/inference/gpt/__init__.py" did not exist on "3aca141586a4b8cdc983c3ecf5f7baf60506c7f8"
    feat(src): add kv cache int8 quantization (#22) · cc93136e
    tpoisonooo authored
    * feat(src): add int8 and compile passed
    
    * feat(kernels): fix
    
    * feat(llama): update kernel
    
    * feat(src): add debug
    
    * fix(kernel): k_cache use int8_t pointer
    
    * style(llama): clean code
    
    * feat(deploy.py): revert to enable fmha
    
    * style(LlamaV2): clean code
    
    * feat(deploy.py): add default quant policy
CMakeLists.txt 13.8 KB