[PyTorch] Branching operations (#1027)
* Add op for in-place add

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add op for in-place add

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add op that adds extra output to fuser

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add fused op for GEMM+bias+add

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add fused op for dgrad+add

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add documentation

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Review suggestions from @ptrendx

Output tensor dtype and device take precedence over weight tensor in linear functional API. Move some index calculation to fuser constructor. Avoid some unnecessary dereferences.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug test failures

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update transformer_engine/pytorch/ops/fuser.py

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>