  • [Kernels] Updated Triton kernels to 2.1.0 and added flash-decoding for llama token attention (#4965) · 459a88c8
    Cuiqing Li authored
    
    * adding flash-decoding
    
    * clean
    
    * adding kernel
    
    * adding flash-decoding
    
    * add integration
    
    * add
    
    * adding kernel
    
    * adding kernel
    
    * adding triton 2.1.0 features for inference
    
    * update bloom triton kernel
    
    * remove useless vllm kernels
    
    * clean codes
    
    * fix
    
    * adding files
    
    * fix readme
    
    * update llama flash-decoding
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
test_llama2_infer.py 2.21 KB