1. 14 Aug, 2023 1 commit
    • [Feature] Blazing fast W4A16 inference (#202) · c3290cad
      Li Zhang authored
      * add w4a16
      
      * fix `deploy.py`
      
      * add doc
      
      * add w4a16 kernels
      
      * fuse w1/w3 & bugfixes
      
      * fix typo
      
      * python
      
      * guard sm75/80 features
      
      * add missing header
      
      * refactor
      
      * qkvo bias
      
      * update cost model
      
      * fix lint
      
      * update `deploy.py`
  2. 03 Aug, 2023 1 commit
  3. 31 Jul, 2023 1 commit
  4. 05 Jul, 2023 3 commits
  5. 03 Jul, 2023 2 commits
  6. 01 Jul, 2023 4 commits
  7. 28 Jun, 2023 1 commit
    • feat(src): add kv cache int8 quantization (#22) · cc93136e
      tpoisonooo authored
      * feat(src): add int8 and compile passed
      
      * feat(kernels): fix
      
      * feat(llama): update kernel
      
      * feat(src): add debug
      
      * fix(kernel): k_cache use int8_t pointer
      
      * style(llama): clean code
      
      * feat(deploy.py): revert to enable fmha
      
      * style(LlamaV2): clean code
      
      * feat(deploy.py): add default quant policy
  8. 20 Jun, 2023 1 commit