  • [Feature] Blazing fast W4A16 inference (#202) · c3290cad
    Li Zhang authored
    * add w4a16
    
    * fix `deploy.py`
    
    * add doc
    
    * add w4a16 kernels
    
    * fuse w1/w3 & bugfixes
    
    * fix typo
    
    * python
    
    * guard sm75/80 features
    
    * add missing header
    
    * refactor
    
    * qkvo bias
    
    * update cost model
    
    * fix lint
    
    * update `deploy.py`
LlamaTritonModel.cc 17.4 KB