- 12 Jun, 2024 1 commit
-
-
Alp Dener authored
restricted fsdp asserts on primary fp8 weights to TE modules Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 07 Jun, 2024 1 commit
-
-
Alp Dener authored
* New TE wrapper for PyTorch FullyShardedDataParallel to make TE modules distribute their activations after the forward pass and gather them before the backward pass Signed-off-by:
Alp Dener <adener@nvidia.com> * simplified TE module setup for FSDP comms Signed-off-by:
Alp Dener <adener@nvidia.com> * FSDP scatter/gather for tensors saved into autograd ctx now working for base TE modules Signed-off-by:
Alp Dener <adener@nvidia.com> * make sure activation recompute disables FSDP scatter/gather Signed-off-by:
Alp Dener <adener@nvidia.com> * make sure Fp8 weight buffers are sharded at the end of the backward pass and gathered before forward Signed-off-by:
Alp Dener <adener@nvidia.com> * Fixed typo in attribute name Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed bug in finding FSDP-wrapped TE modules Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed typo in fp8 weight tensor name Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed incorrect # of gradients Signed-off-by:
Alp Dener <adener@nvidia.com> * Added fp8 amax gradient hook tensor to the parameter reset Signed-off-by:
Alp Dener <adener@nvidia.com> * get rid of erroneous dummy tensor leftover from incorrect rebase Signed-off-by:
Alp Dener <adener@nvidia.com> * Linting fixes Signed-off-by:
Alp Dener <adener@nvidia.com> * fixing git snafu and removing debug statements Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 May, 2024 1 commit
-
-
Alp Dener authored
TE checkpoint now preserves the torch autocast context from the forward pass during the recompute phase Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 18 Apr, 2024 1 commit
-
-
Alp Dener authored
fix type checking in checkpointing to assume that there must be TE modules in custom callables Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 16 Apr, 2024 1 commit
-
-
Alp Dener authored
* changed TE checkpoint passthrough logic to also recursively look for TE submodules Signed-off-by:
Alp Dener <adener@nvidia.com> * simplified search for TE modules in the checkpointed network Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com>
-
- 12 Apr, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* FP8 cuda graphs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com> * Fix numerics Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * exclude torch compile from numerics tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More numerics fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rm fusion from unfused path Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com>
-
- 04 Apr, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Args can be None Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix other arg types Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 29 Mar, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix backward compatibility with checkpoint API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments and fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Mar, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Update checkpoint API doc Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Feb, 2024 1 commit
-
-
Alp Dener authored
* added non-reentrant mode support to TE checkpoint Signed-off-by:
Alp Dener <adener@nvidia.com> * updated get_cuda_rng_tracker kwarg to get_rng_state_tracker to remain consistent with other TE API Signed-off-by:
Alp Dener <adener@nvidia.com> * docstring cleanup Signed-off-by:
Alp Dener <adener@nvidia.com> * added mechanism to disable bias_gelu_nvfusion in LayerNormMLP when checkpointing in non-reentrant mode Signed-off-by:
Alp Dener <adener@nvidia.com> * refactored checkpoint and recompute hook names to match PyTorch implementation Signed-off-by:
Alp Dener <adener@nvidia.com> * Fixed incorrect reference before assignment Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed argument error in calling native PyTorch checkpoint Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed linting errors for missing docstrings Signed-off-by:
Alp Dener <adener@nvidia.com> * Fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * bias GELU fusion consistency between checkpoint test and reference comparison Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 10 Jan, 2024 1 commit
-
-
Zhang Haitao authored
* support non-tensor inputs/outputs for checkpoint Signed-off-by:
skydoorkai <htsantaclara@163.com> * better format Signed-off-by:
skydoorkai <htsantaclara@163.com> * modify to avoid python loops Signed-off-by:
skydoorkai <htsantaclara@163.com> --------- Signed-off-by:
skydoorkai <htsantaclara@163.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 28 Nov, 2023 1 commit
-
-
Marks101 authored
* [PyTorch] Linear: fix computation for wgrad if sequence_parallel=True Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * Remove buggy gather_along_last_dim Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [PyTorch] Linear: fix line length Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * Simplify logic for saving input tensor for Linear backward Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 31 Oct, 2023 1 commit
-
-
Tim Moon authored
* Experimental FP8 tensor Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add fp8 tensor to ci test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments and tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Minor changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Default to FP8 usage Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Naming changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * minor fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix transpose caching Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Debug transpose caching Handle case where transpose cache is updated externally. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename FP8GlobalStateManager.with_fp8_parameters Signed-off-by:
Tim Moon <tmoon@nvidia.com> * remove Float8Tensor from import API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Avoid caching FP8 transposes if not required Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix import error in FP8 tensor tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix tranpose caching and checkpointing bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Improve caching and fix distopt case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update transformer_engine/pytorch/float8_tensor.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Remove recursive logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cache reset bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Store FP8 attributes in dict Easier for multiple tensors to share, e.g. detached tensors. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure scale_inv is 1D tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure scale_inv is 1D tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fixes and detach recipe Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Set default fp8 data type Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
- 12 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add class for RNG state tracker. Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix docs for checkpoint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Aug, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Initial refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Reorder methods by purpose Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Save full global state Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes to test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Mar, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Use updated comm API PyTorch Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 01 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
* Add pylint to Lint action Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Test Ubuntu 20.04 Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Pylint inside the container Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Update transformer_engine/pytorch/distributed.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Nov, 2022 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix bugs for full activation recompute in FP8 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Ensure identical numerics in recomputation for pipeline parallelism Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * expose checkpoint API and add docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * complete checkpointing docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Oct, 2022 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-