[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>