add flash implementation with context parallelism (#362)

* add flash implementation with context parallelism

Signed-off-by: xren <xren@nvidia.com>

* next more comments

Signed-off-by: xren <xren@nvidia.com>

* code comment fix

Signed-off-by: xren <xren@nvidia.com>

* comment fix

Signed-off-by: xren <xren@nvidia.com>

* add missing space

Signed-off-by: xren <xren@nvidia.com>

* fix docstrings

Signed-off-by: xren <xren@nvidia.com>

* try to add fa v2 api

Signed-off-by: xren <xren@nvidia.com>

* fix a comment

Signed-off-by: xren <xren@nvidia.com>

* fix padded kv return

Signed-off-by: xren <xren@nvidia.com>

* add docstrings of context parallelism

Signed-off-by: xren <xren@nvidia.com>

* minor fix

Signed-off-by: xren <xren@nvidia.com>

* minor docstring fix

Signed-off-by: xren <xren@nvidia.com>

* fix positional arguments

Signed-off-by: xren <xren@nvidia.com>

* make docstring line shorter

Signed-off-by: xren <xren@nvidia.com>

* add fa v2 backward api for flash_attn_with_cp

Signed-off-by: xren <xren@nvidia.com>

* remove redundant code

Signed-off-by: xren <xren@nvidia.com>

* make sure hidden size per attn head is multiple of 8 for FA2

Signed-off-by: xren <xren@nvidia.com>

* remove an unnecessary assert check for FA2

Signed-off-by: xren <xren@nvidia.com>

* indentation fix

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* Update FA version

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Lint

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: xren <xren@nvidia.com>
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>