• [Feature] Blazing fast W4A16 inference (#202) · c3290cad
    Li Zhang authored
    * add w4a16
    
    * fix `deploy.py`
    
    * add doc
    
    * add w4a16 kernels
    
    * fuse w1/w3 & bugfixes
    
    * fix typo
    
    * python
    
    * guard sm75/80 features
    
    * add missing header
    
    * refactor
    
    * qkvo bias
    
    * update cost model
    
    * fix lint
    
    * update `deploy.py`
format.cu 4.51 KB