    [inference] Add smoothquant for llama (#4904) · 611a5a80
    Xu Kai authored
    * [inference] add int8 rotary embedding kernel for smoothquant (#4843)
    
    * [inference] add smoothquant llama attention (#4850)
    
    * add smoothquant llama attention
    
    * remove useless code
    
    * remove useless code
    
    * fix import error
    
    * rename file name
    
    * [inference] add silu linear fusion for smoothquant llama mlp (#4853)
    
    * add silu linear
    
    * update skip condition
    
    * catch smoothquant cuda lib exception
    
    * process exception for tests
    
    * [inference] add llama mlp for smoothquant (#4854)
    
    * add llama mlp for smoothquant
    
    * fix down out scale
    
    * remove duplicate lines
    
    * add llama mlp check
    
    * delete useless code
    
    * [inference] add smoothquant llama (#4861)
    
    * add smoothquant llama
    
    * fix attention accuracy
    
    * fix accuracy
    
    * add kv cache and save pretrained
    
    * refactor example
    
    * delete smooth
    
    * refactor code
    
    * [inference] add smooth function and delete useless code for smoothquant (#4895)
    
    * add smooth function and delete useless code
    
    * update datasets
    
    * remove duplicate import
    
    * delete useless file
    
    * refactor codes (#4902)
    
    * refactor code
    
    * add license
    
    * add torch-int and smoothquant license
test_smoothquant_linear.py 1.05 KB