Add attention bias and qkv format to context parallelism (#726)
* make FusedAttn with CP support bias
* assert ALiBi cannot work with CP
* syntax fix
* fix variable name
* fix tensor shapes
* fix a typo
* fix bias indexing for CP
* bug fix
* add attn bias tests
* change dbias update location
* fix CP test model configs
* change CP test sequence length
* make AttnFuncWithCP support qkv format of sbhd
* make sure qkv are contiguous for CP in cuDNN fused attn
* change assert message
* fix code format

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
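The sbhd qkv-format and contiguity commits above can be illustrated with a small PyTorch sketch. This is not code from the PR; the shapes and variable names are assumptions chosen for illustration. It shows why a layout change to sbhd ([sequence, batch, heads, head_dim]) can produce non-contiguous tensors that must be materialized before handing them to a fused attention kernel:

```python
import torch

# Illustrative only: names and shapes are assumptions, not taken from this PR.
# "bshd" = [batch, sequence, heads, head_dim]
# "sbhd" = [sequence, batch, heads, head_dim]
s, b, h, d = 8, 2, 4, 16

q_bshd = torch.randn(b, s, h, d)

# Converting bshd -> sbhd via transpose yields a non-contiguous view;
# cuDNN fused attention generally expects contiguous q/k/v buffers,
# hence the "make sure qkv are contiguous" change above.
q_sbhd = q_bshd.transpose(0, 1)
assert not q_sbhd.is_contiguous()

q_sbhd = q_sbhd.contiguous()  # materialize the sbhd layout in memory
assert q_sbhd.is_contiguous()
```

The same concern applies to the per-rank sequence chunks used by context parallelism: slicing along the sequence dimension of a transposed tensor also yields views, so each chunk is made contiguous before the fused-attention call.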