Xiaowei Ren authored
* try to use cuDNN fused attention for context parallelism
Signed-off-by: xren <xren@nvidia.com>

* assert CP is only supported with NVTE_F16_arbitrary_seqlen
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* port fused attn api to context parallelism
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add one more assert
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* assert CP does not support padded tokens
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add qkv_format into CP implementation
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove qkv_format from CP function
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix qkv_format
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix bwd error with FA v2
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* make cp implementation support non-causal masking
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* bug fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove redundant asserts for CP
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor assert information change
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* assert that core attn bias is not yet supported with CP
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* make CP work with window_sizes of [-1, -1] and [-1, 0]
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add draft code for fa test with cp
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* move fused attn test to a specific folder
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add assert_close to flash attn cp test
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add more tests for CP
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add optional arguments for FA v2.4+
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor change
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add skip condition for CP test
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* class and function naming fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* docstring fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* do not use fused attn if backend does not work with CP
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* create a separate folder for CP test as it needs multiple GPUs
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add attn_mask_type check in attn_forward_func_with_cp
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code format fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: xren <xren@nvidia.com>
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
94f54d71