clean CP implementation for flash attention and cuDNN 9.6 (#1387)

* make pad_between_seqs check not consider padding at the end

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* change CP THD test to make it consider 0-length sequences

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor change to flash func name

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* only use varlen func of flash attention when qkv_format is THD

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* try to converge code of flash and fused attentions

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix bwd compute with P2P

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove redundant out_per_step view

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* enable cudnn>9.6 and THD+GQA

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* enable CP with FusedAttn+SWA+All_Gather

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* enable CP with FusedAttn+SWA+All_Gather

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code cleaning for cu_seqlens

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix some pylint errors

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor import change for pylint

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* more fixes for pylint

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix lse_seqlen in thd out correction

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>