1. 22 Sep, 2023 1 commit
    • [feature] add gptq for inference (#4754) · 946ab56c
      Xu Kai authored
      * [gptq] add gptq kernel (#4416)
      
      * add gptq
      
      * refactor code
      
      * fix tests
      
      * replace auto-gptq
      
      * rename inference/quant
      
      * refactor test
      
      * add auto-gptq as an option
      
      * reset requirements
      
      * change assert and check auto-gptq
      
      * add import warnings
      
      * change test flash attn version
      
      * remove example
      
      * change requirements of flash_attn
      
      * modify tests
      
      * [skip ci] change requirements-test
      
      * [gptq] faster gptq cuda kernel (#4494)
      
      * [skip ci] add cuda kernels
      
      * add license
      
      * [skip ci] fix max_input_len
      
      * format files & change test size
      
      * [skip ci]
      
      * [gptq] add gptq tensor parallel (#4538)
      
      * add gptq tensor parallel
      
      * add gptq tp
      
      * delete print
      
      * add test gptq check
      
      * add test auto gptq check
      
      * [gptq] combine gptq and kv cache manager (#4706)
      
      * combine gptq and kv cache manager
      
      * add init bits
      
      * delete useless code
      
      * add model path
      
      * delete useless print and update test
      
      * delete useless import
      
      * move option gptq to shard config
      
      * change replace linear to shardformer
      
      * update bloom policy
      
      * delete useless code
      
      * fix import bug and delete useless code
      
      * change colossalai/gptq to colossalai/quant/gptq
      
      * update import linear for tests
      
      * delete useless code and mv gptq_kernel to kernel directory
      
      * fix triton kernel
      
      * add triton import