[Kernel] Use flash-attn for decoding (#3648)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>