Fix context parallelism implementation with THD format (#1012)
* use 2hd layout
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* change qkv_format check
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add a code comment
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* tensor shape bug fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* tensor shape fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add function to compute cu_seqlens of a cp rank
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add cu_seqlens and cu_seqlens_padded to context parallelism
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* typo fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor change
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix FlashAttention output sequence length
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix cu_seqlens_kv_per_step calculation
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* zero dQKV for ending padded tokens
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* zero dQKV tensors of FlashAttention
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix softmax_lse correction
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove padded tokens of KV to save communication
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* do not need to zero dkv for FlashAttention any more
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* zero out tensors
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* remove redundant code
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix CP unit test
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix kv shape of cp test with thd format
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* update cp unit test
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* remove redundant code
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <xren@cs-cw-dfw-login-01.cm.cluster>