• [CK_TILE] layernorm support fused-quant/fused-add (#1604) · c3a4800c
    carlushuang authored
    * add prenorm/postnorm support, refactor using generate.py
    
    * update README
    
    * update README
    
    * fix format
    
    * update some description and fix format
    
    * update format
    
    * format
    
    * use non-raw for loading
    
    * format and update n4096
    
    * dynamic-quant ready
    
    * update readme
    
    * support fused dynamic-quant
    
    * update fused-quant, with smooth
    
    * update README
    
    * update args
    
    * update some based on comment
permute.hpp 363 Bytes