- 02 Jul, 2024 1 commit
Frank Lin authored
* NVTE_OVERRIDE_MAX_SEQ_LEN Signed-off-by: Frank Lin <eee4017@gmail.com>
* small fix Signed-off-by: Frank Lin <eee4017@gmail.com>
* preserve old amax_and_scale_update_inplace and new amax_and_scale_update_inplace Signed-off-by: Frank Lin <eee4017@gmail.com>
* remove useless code path; try to simplify logic within the baseline Signed-off-by: Frank Lin <eee4017@gmail.com>
* simplify logic Signed-off-by: Frank Lin <eee4017@gmail.com>
* small fix Signed-off-by: Frank Lin <eee4017@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix comments from Timmoon Signed-off-by: Frank Lin <eee4017@gmail.com>
* fix comments from Timmoon Signed-off-by: Frank Lin <eee4017@gmail.com>
* Update transformer_engine/paddle/distributed.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Frank Lin <eee4017@gmail.com>
* disable bw fp8 update Signed-off-by: Frank Lin <eee4017@gmail.com>
* fix lint Signed-off-by: Frank Lin <eee4017@gmail.com>
* fix ci error Signed-off-by: Frank Lin <eee4017@gmail.com>
---------
Signed-off-by: Frank Lin <eee4017@gmail.com>
Co-authored-by: Frank Lin (Engrg-Hardware 1) <fralin@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 14 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 Jan, 2024 1 commit
Tim Moon authored
* Replace paddle.fluid imports with paddle.base Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove paddle.fluid usage from tests Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Jan, 2024 1 commit
Tian Zheng authored
* Actively free tensor in bwd Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add inplace support for fp8 casting; allow skipping weight update in fp8 meta update Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support weight caching for Linear Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add weight caching for LayerNormLinear Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add weight caching for LayerNormMLP Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add weight caching for Transformer layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add PP unittests Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix CI Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
-
- 11 Jan, 2024 1 commit
Tian Zheng authored
* Add SP for linear Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add SP for LayerNormLinear Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add SP for LayerNormMLP Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add SP API for transformer layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add sequence_parallel attr Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add SP unittests for Transformer and Attention Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix compatibility with PaddleNLP Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Copyright Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 06 Sep, 2023 1 commit
Tian Zheng authored
* Add recompute Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support recompute core attention Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix transformer layer recompute Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add doc Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve recompute test Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve performance of stack backtrace Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve code style Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix code style Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
-
- 26 Aug, 2023 1 commit
Tian Zheng authored
* [Paddle] Add TP, DP, PP, FSDP Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Minor fix Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix CI failure Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Remove set_nccl_overlap_warning_if_tp Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve variable naming Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Refactor FP8 Buffer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Stylistic changes Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix FP32 parallel training Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix numel performance issue Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Squashed commit of the following:
  commit 79e2e5fd774e67dcdda9aae01a9f31a6479c5d70 Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Date: Sun Aug 20 14:39:16 2023 +0000 Add TP test Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  commit 1d40ad60540490f97ed82ba877cc6eda8902cbf6 Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Date: Sun Aug 20 14:22:25 2023 +0000 Fix tp_size when disabled Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  commit 6632f735a0c8251862355fc74622af59fae3a509 Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Date: Sun Aug 20 05:52:18 2023 +0000 Add TP for attention and transformer layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add shape check Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add FSDP check for stage 1,2,3 Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Review changes Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix group_sharding test Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support NVTE_FUSE_ATTN Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix CI errors Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 02 Aug, 2023 1 commit
Tian Zheng authored
Refactor fp8 state Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
-
- 01 Aug, 2023 1 commit
Tian Zheng authored
* Add FP8 support
  - Add FP8 recipe
  - Add FP8 path for nn layers
  - Add MNIST FP8 example
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Update README Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix LayerNormMLP FP8 backward Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix FP8 training in float32 accumulation Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix FP8 checkpointing for non forward execution cases (same as #323) Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Refactors and improvements for better code style, readability, and organization Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Remove unnecessary pylint override Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
-
- 19 Jul, 2023 1 commit
Tian Zheng authored
* Add Linear layer (FP16) Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  - Add BF16 training example
  - Add fp8_autocast (only supports non-fp8 for now)
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Remove FP8 stuff Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Simplify Linear layer forward Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add LayerNorm layer (BF16) Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add LayerNormLinear layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Store weights in BF16 Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add LayerNormMLP layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add BF16 MNIST example Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Remove in-place cast for compatibility with Paddle AMP mechanism Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* README correction Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add Paddle op as a backend option Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix code format Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix dtype change between iterations Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Minor fixes Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Move forward function out of base layer Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Use Paddle nvtx bindings Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
-