  • [Feature] Blazing fast W4A16 inference (#202) · c3290cad
    Li Zhang authored
    * add w4a16
    
    * fix `deploy.py`
    
    * add doc
    
    * add w4a16 kernels
    
    * fuse w1/w3 & bugfixes
    
    * fix typo
    
    * python
    
    * guard sm75/80 features
    
    * add missing header
    
    * refactor
    
    * qkvo bias
    
    * update cost model
    
    * fix lint
    
    * update `deploy.py`
LlamaWeight.h 2.17 KB