    [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama... · 459a88c8
    Cuiqing Li authored
    
    [Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965)
    
    * adding flash-decoding
    
    * clean
    
    * adding kernel
    
    * adding flash-decoding
    
    * add integration
    
    * add
    
    * adding kernel
    
    * adding kernel
    
    * adding triton 2.1.0 features for inference
    
    * update bloom triton kernel
    
    * remove useless vllm kernels
    
    * clean codes
    
    * fix
    
    * adding files
    
    * fix readme
    
    * update llama flash-decoding
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
bench_llama.py 4.26 KB