1. 14 Aug, 2023 1 commit
    • [Feature] Blazing fast W4A16 inference (#202) · c3290cad
      Li Zhang authored
      * add w4a16
      
      * fix `deploy.py`
      
      * add doc
      
      * add w4a16 kernels
      
      * fuse w1/w3 & bugfixes
      
      * fix typo
      
      * python
      
      * guard sm75/80 features
      
      * add missing header
      
      * refactor
      
      * qkvo bias
      
      * update cost model
      
      * fix lint
      
      * update `deploy.py`
  2. 23 Jul, 2023 1 commit
    • Refactor the chat template of supported models using factory pattern (#144) · 7b470f07
      lvhan028 authored
      * refactor model.py and support baichuan-7b
      
      * remove model_name
      
      * remove hard-coded session_len
      
      * export tokenizer.py to target dir
      
      * remove model_name from client
      
      * remove model_name
      
      * update
      
      * correct throughput equation
      
      * fix session.response
      
      * update serving.md
      
      * update readme
      
      * update according to review comments
      
      * update
      
      * update
      
      * update
      
      * update
  3. 17 Jul, 2023 1 commit
  4. 13 Jul, 2023 1 commit
  5. 11 Jul, 2023 2 commits
  6. 05 Jul, 2023 1 commit
    • improve readme (#52) · 3e7b6bfd
      lvhan028 authored
      * add performance
      
      * use png
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update