  1. 28 May, 2024 2 commits
  2. 27 May, 2024 1 commit
  3. 20 Dec, 2023 1 commit
  4. 15 Dec, 2023 1 commit
    • Support turbomind bf16 (#803) · 3295eac3
      q.yao authored
      * Add bf16 template sp
      
      * prepare merge
      
      * add enable bf
      
      * add bf16 decode attention support
      
      * fix python lint
      
      * fix yapf
      
      * fix c format
      
      * c format11
      
      * fix cast
      
      * fix on sm<80
      
      * fix linux bf162 cast
      
      * fix type cast
      
      * fix lint
      
      * support from hf pretrained
      
      * fix pybind
      
      * fix converter
      
      * add trust remote code
      
      * fix comment
      
      * fix convert qwen
      
      * fix lint
      
      * fix baichuan
      
      * update weight map
  5. 14 Aug, 2023 1 commit
    • [Feature] Blazing fast W4A16 inference (#202) · c3290cad
      Li Zhang authored
      * add w4a16
      
      * fix `deploy.py`
      
      * add doc
      
      * add w4a16 kernels
      
      * fuse w1/w3 & bugfixes
      
      * fix typo
      
      * python
      
      * guard sm75/80 features
      
      * add missing header
      
      * refactor
      
      * qkvo bias
      
      * update cost model
      
      * fix lint
      
      * update `deploy.py`
  6. 01 Jul, 2023 3 commits
  7. 20 Jun, 2023 1 commit