FSDP grad fusion support (#2191)

* FSDP grad fusion support
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* Re-factored grad overwriting usage
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>

* Update transformer_engine/pytorch/ops/basic/basic_linear.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Selvaraj Anandaraj <selvaraja@nvidia.com>

* Update transformer_engine/pytorch/ops/fused/backward_linear_add.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Selvaraj Anandaraj <selvaraja@nvidia.com>

* Update transformer_engine/pytorch/ops/fused/backward_linear_scale.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Selvaraj Anandaraj <selvaraja@nvidia.com>

* Update transformer_engine/pytorch/ops/fused/userbuffers_backward_linear.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Selvaraj Anandaraj <selvaraja@nvidia.com>

* Modified API usage, added arg details
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>
Signed-off-by: Selvaraj Anandaraj <selvaraja@nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
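For context on the kind of "grad fusion" this PR wires up, below is a minimal sketch of the existing `fuse_wgrad_accumulation` / `main_grad` convention in `transformer_engine.pytorch`: the backward weight-gradient GEMM accumulates directly into a caller-owned buffer instead of materializing a separate `weight.grad`. The FSDP-specific plumbing added in this PR (in the `ops/basic` and `ops/fused` linear modules listed above) is not shown; the buffer setup here is illustrative only.

```python
# Sketch of fused wgrad accumulation; the FSDP wiring from this PR is omitted.
import torch
import transformer_engine.pytorch as te

layer = te.Linear(
    1024, 1024,
    params_dtype=torch.bfloat16,
    fuse_wgrad_accumulation=True,  # accumulate wgrad into weight.main_grad
)

# The gradient owner (e.g. a distributed wrapper such as FSDP) pre-allocates
# the accumulation buffer; backward accumulates into it each microbatch.
layer.weight.main_grad = torch.zeros_like(layer.weight, dtype=torch.float32)

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16, requires_grad=True)
layer(x).sum().backward()  # wgrad lands in layer.weight.main_grad
```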