    [feature] add gptq for inference (#4754) · 946ab56c
    Xu Kai authored
    * [gptq] add gptq kernel (#4416)
    
    * add gptq
    
    * refactor code
    
    * fix tests
    
    * replace auto-gptq
    
    * rename inference/quant
    
    * refactor test
    
    * add auto-gptq as an option
    
    * reset requirements
    
    * change assert and check auto-gptq
    
    * add import warnings
    
    * change test flash attn version
    
    * remove example
    
    * change requirements of flash_attn
    
    * modify tests
    
    * [skip ci] change requirements-test
    
    * [gptq] faster gptq cuda kernel (#4494)
    
    * [skip ci] add cuda kernels
    
    * add license
    
    * [skip ci] fix max_input_len
    
    * format files & change test size
    
    * [skip ci]
    
    * [gptq] add gptq tensor parallel (#4538)
    
    * add gptq tensor parallel
    
    * add gptq tp
    
    * delete print
    
    * add test gptq check
    
    * add test auto gptq check
    
    * [gptq] combine gptq and kv cache manager (#4706)
    
    * combine gptq and kv cache manager
    
    * add init bits
    
    * delete useless code
    
    * add model path
    
    * delete useless print and update test
    
    * delete useless import
    
    * move option gptq to shard config
    
    * change replace linear to shardformer
    
    * update bloom policy
    
    * delete useless code
    
    * fix import bug and delete useless code
    
    * change colossalai/gptq to colossalai/quant/gptq
    
    * update import linear for tests
    
    * delete useless code and mv gptq_kernel to kernel directory
    
    * fix triton kernel
    
    * add triton import
This project is licensed under the Apache License 2.0.
LICENSE 25.3 KB