[PyTorch] Debug NCCL communication overlapping in linear backward with FP8 data (#1620)
* Overlap input all-gather with dgrad GEMM in FP8 linear layers

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add missing docstring

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>