    [Refactor] Integrated some lightllm kernels into token-attention (#4946) · 3a41e830
    Cuiqing Li authored
    
    
    * add some req for inference
    
    * clean codes
    
    * add codes
    
    * add some lightllm deps
    
    * clean codes
    
    * hello
    
    * delete rms files
    
    * add some comments
    
    * add comments
    
    * add doc
    
    * add lightllm deps
    
    * add lightllm chatglm2 kernels
    
    * add lightllm chatglm2 kernels
    
    * replace rotary embedding with lightllm kernel
    
    * add some comments
    
    * add some comments
    
    * add some comments
    
    * add
    
    * replace fwd kernel att1
    
    * fix an arg
    
    * add
    
    * add
    
    * fix token attention
    
    * add some comments
    
    * clean codes
    
    * modify comments
    
    * fix readme
    
    * fix bug
    
    * fix bug
    
    ---------
    Co-authored-by: cuiqing.li <lixx336@gmail.com>
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
chatglm2.py 22.3 KB