Add split-kv and M<->H swap to varlen forward decoding attention (#754)
* Add split-k, M<->H to varseq path
* skip M<->H when dropout>0, fix LSE
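For readers unfamiliar with the term, the following is a minimal pure-Python sketch of the general split-KV idea for a single decoding query. It assumes "split-k" here means the usual technique of partitioning the key/value sequence into chunks, attending to each chunk independently, and merging the partial outputs using their log-sum-exp (LSE) values. The function names `attend_chunk` and `split_kv_attention` are hypothetical and are not taken from this commit; the actual kernel code is not shown here and may differ.

```python
import math

def attend_chunk(q, keys, values, scale):
    """Softmax attention of one query vector over one KV chunk.

    Returns (out, lse), where lse is the log-sum-exp of the scaled scores.
    """
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)                                  # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) / total
           for j in range(dim)]
    return out, m + math.log(total)

def split_kv_attention(q, keys, values, num_splits, scale):
    """Attend to each KV chunk separately, then merge partials via their LSEs."""
    n = len(keys)
    chunk = (n + num_splits - 1) // num_splits       # ceil(n / num_splits)
    partials = [attend_chunk(q, keys[i:i + chunk], values[i:i + chunk], scale)
                for i in range(0, n, chunk)]
    # Global LSE is the log-sum-exp of the per-chunk LSEs.
    lse_max = max(lse for _, lse in partials)
    lse = lse_max + math.log(sum(math.exp(l - lse_max) for _, l in partials))
    # Each partial output is reweighted by its share of the softmax mass.
    dim = len(values[0])
    out = [sum(math.exp(l - lse) * o[j] for o, l in partials)
           for j in range(dim)]
    return out, lse
```

The merged output only matches unsplit attention if each chunk's LSE is computed and combined consistently, which is why an LSE error in a split path would show up as a wrong final result.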