[PyTorch] Fix CP implementation with FP8 (#1483)

* commit some debug code
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* add more debug info
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* debug code commit and typo fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* a typo fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* remove debug info
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* do not return lse
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* add amax_per_step for quantizers of CP
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* fix FP8 + CP
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* bug fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* bug fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* dtype fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* bug fix
  Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <xren@login-preos01.a51.clusters.nvidia.com>
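The bullet "add amax_per_step for quantizers of CP" is not expanded on here, and the attention.py changes that implement it are not shown. One plausible reading, offered only as a sketch: with context parallelism (CP), attention is computed over several steps, so a quantizer can record the absolute maximum (amax) of each step's output and then take the maximum over all steps before deriving a single FP8 scale. The helper names `record_amax_per_step` and `combined_scale` are hypothetical and are not Transformer Engine APIs; the only fixed fact used is that the largest finite FP8 E4M3 value is 448.

```python
from typing import List, Sequence

# Largest finite value representable in FP8 E4M3 (the "fn" variant).
E4M3_MAX = 448.0


def record_amax_per_step(step_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: absolute max of each CP step's output."""
    return [max(abs(v) for v in out) for out in step_outputs]


def combined_scale(amax_per_step: Sequence[float], fp8_max: float = E4M3_MAX) -> float:
    """Hypothetical helper: one scale from the max over all per-step amaxes.

    The max of per-step maxima equals the max over the whole output, so a
    single scale derived from it covers every step without overflow.
    """
    amax = max(amax_per_step)
    # Guard against an all-zero output, where any scale would do.
    return fp8_max / amax if amax > 0 else 1.0


if __name__ == "__main__":
    steps = [[0.5, -2.0], [1.0, 4.0], [-3.0, 0.25]]
    per_step = record_amax_per_step(steps)
    print(per_step, combined_scale(per_step))
```

Keeping one amax per step, rather than overwriting a single running value, lets the final reduction see every step's contribution; whether the actual commit reduces them this way is an assumption.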

Unverified commit 257345a5, authored Feb 20, 2025 by Xiaowei Ren and committed by GitHub on Feb 20, 2025.

Showing 2 changed files with 166 additions and 98 deletions.

transformer_engine/pytorch/attention.py +165 -97

transformer_engine/pytorch/fp8.py +1 -1

--- a/transformer_engine/pytorch/attention.py
+++ b/transformer_engine/pytorch/attention.py
--- a/transformer_engine/pytorch/fp8.py
+++ b/transformer_engine/pytorch/fp8.py
@@ -56,7 +56,7 @@ def get_fp8_torch_dtype(fp8_recipe: Recipe, fprop_tensor: bool = True) -> torch.
        fp8_recipe.fp8_format == Format.HYBRID and fprop_tensor
    ):
        return torch.float8_e4m3fn
-    return torch.float8_e5m2fn
+    return torch.float8_e5m2
 def get_fp8_te_dtype(fp8_recipe: Recipe, fprop_tensor: bool = True) -> tex.DType:
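The fp8.py hunk is the "dtype fix": PyTorch (2.1 and later) exposes `torch.float8_e4m3fn` and `torch.float8_e5m2`, but there is no `torch.float8_e5m2fn`, so the old return statement would raise `AttributeError` whenever an E5M2 dtype was requested. The "fn" suffix marks a finite-only variant without infinities, which applies to E4M3 but not to E5M2. The sketch below mirrors the selection logic in plain Python, returning dtype attribute names instead of torch objects. The `Format` enum is a hypothetical stand-in for the recipe's format, and the `Format.E4M3` branch assumes the condition just above the shown context tests for it.

```python
from enum import Enum


class Format(Enum):
    """Hypothetical stand-in for the recipe's FP8 format enum."""
    E4M3 = "E4M3"
    E5M2 = "E5M2"
    HYBRID = "HYBRID"


def fp8_torch_dtype_name(fp8_format: Format, fprop_tensor: bool = True) -> str:
    """Return the torch attribute name of the FP8 dtype to use (sketch)."""
    # E4M3 everywhere, or HYBRID on forward-pass tensors, uses E4M3.
    if fp8_format == Format.E4M3 or (fp8_format == Format.HYBRID and fprop_tensor):
        return "float8_e4m3fn"
    # Otherwise E5M2; PyTorch names this dtype float8_e5m2 (no "fn" suffix).
    return "float8_e5m2"


if __name__ == "__main__":
    for fmt in Format:
        print(fmt.name, fp8_torch_dtype_name(fmt, True), fp8_torch_dtype_name(fmt, False))
```

With PyTorch installed, `getattr(torch, fp8_torch_dtype_name(Format.HYBRID, fprop_tensor=False))` resolves to `torch.float8_e5m2`, whereas `getattr(torch, "float8_e5m2fn")` raises `AttributeError`, which is the failure the one-line change removes.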