Commits · 0edf30b87159e82048b5f248e4b379aebb8f364a · OpenDAS / TransformerEngine

07 Jun, 2024 1 commit

[PyTorch] Distributed intermediate/activation tensors for FSDP (#687) · 0edf30b8

Alp Dener authored Jun 07, 2024



* New TE wrapper for PyTorch FullyShardedDataParallel to make TE modules distribute their activations after the forward pass and gather them before the backward pass
Signed-off-by: Alp Dener <adener@nvidia.com>

* simplified TE module setup for FSDP comms
Signed-off-by: Alp Dener <adener@nvidia.com>

* FSDP scatter/gather for tensors saved into autograd ctx now working for base TE modules
Signed-off-by: Alp Dener <adener@nvidia.com>

* make sure activation recompute disables FSDP scatter/gather
Signed-off-by: Alp Dener <adener@nvidia.com>

* make sure Fp8 weight buffers are sharded at the end of the backward pass and gathered before forward
Signed-off-by: Alp Dener <adener@nvidia.com>

* Fixed typo in attribute name
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed bug in finding FSDP-wrapped TE modules
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed typo in fp8 weight tensor name
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed incorrect # of gradients
Signed-off-by: Alp Dener <adener@nvidia.com>

* Added fp8 amax gradient hook tensor to the parameter reset
Signed-off-by: Alp Dener <adener@nvidia.com>

* get rid of erroneous dummy tensor leftover from incorrect rebase
Signed-off-by: Alp Dener <adener@nvidia.com>

* Linting fixes
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixing git snafu and removing debug statements
Signed-off-by: Alp Dener <adener@nvidia.com>

---------
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
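
The first bullet of this commit describes the core idea: each rank keeps only its shard of a saved activation after the forward pass and reassembles the full tensor before the backward pass. The following is a minimal single-process sketch of that idea in plain Python, with lists standing in for tensors and ranks. `shard_activation` and `gather_activation` are hypothetical names chosen for illustration; they are not Transformer Engine or PyTorch APIs, and the zero-padding scheme is an assumption.

```python
from typing import List


def shard_activation(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened activation into one equal-sized shard per rank.

    The tail is zero-padded so every rank holds the same number of elements.
    (Hypothetical helper, for illustration only.)
    """
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def gather_activation(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate the per-rank shards (an all-gather) and drop the padding."""
    full = [x for shard in shards for x in shard]
    return full[:numel]


# After the forward pass: only the shards are kept, one per rank.
activation = [0.5, -1.0, 2.0, 3.5, 4.0]
shards = shard_activation(activation, world_size=2)

# Before the backward pass: the full activation is restored.
restored = gather_activation(shards, numel=len(activation))
```

In a real distributed run each rank would hold only its own entry of `shards`, and the gather would be a collective operation rather than a list concatenation.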

19 Jan, 2024 1 commit

chore: Fix multiple typos (#617) · e4f506a0

hugo-syn authored Jan 19, 2024

Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>

17 Jan, 2024 1 commit

[PyTorch] Deferred Initialization via `device='meta'` option (#596) · 434d58fa

Alp Dener authored Jan 17, 2024



* Implemented deferred initialization via `device='meta'` option for te.Linear and added new PyTorch example to demonstrate its use with FullyShardedDataParallel execution.
Signed-off-by: Alp Dener <adener@nvidia.com>

* correcting Float8Tensor initialization and fixing linting errors
Signed-off-by: Alp Dener <adener@nvidia.com>

* removed duplicate code from upstream rebase, local tests passing
Signed-off-by: Alp Dener <adener@nvidia.com>

* improved comments/documentation for FSDP example
Signed-off-by: Alp Dener <adener@nvidia.com>

* converted reset_parameters() into a base module function
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed Float8Tensor creation with deferred init, all tests passing locally
Signed-off-by: Alp Dener <adener@nvidia.com>

* extended deferred initialization to all TE modules
Signed-off-by: Alp Dener <adener@nvidia.com>

* fixed linting errors
Signed-off-by: Alp Dener <adener@nvidia.com>

* removed unnecessary reference to the parent module of parameter, added clarifying comments in parameter reset
Signed-off-by: Alp Dener <adener@nvidia.com>

---------
Signed-off-by: Alp Dener <adener@nvidia.com>
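
For context on the `device='meta'` option named above: in stock PyTorch, a module built on the meta device records parameter shapes and dtypes without allocating storage, and real memory is allocated later. The sketch below uses `torch.nn.Linear` as a stand-in, on the assumption that `te.Linear` follows the same pattern. It does not call Transformer Engine, and under FSDP the materialization step would typically be driven by the wrapper rather than by hand.

```python
import torch

# Build the layer on the meta device: shapes and dtypes only, no storage.
layer = torch.nn.Linear(1024, 1024, device="meta")
was_meta = layer.weight.is_meta

# Materialize uninitialized storage on a real device, then initialize values.
layer = layer.to_empty(device="cpu")
layer.reset_parameters()

# The layer is now usable like any other module.
out = layer(torch.ones(2, 1024))
```

Deferring allocation this way lets a large model be described without first holding a full copy of it in memory.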

03 Jan, 2024 1 commit

Change the copyright to include 2024 (#583) · cd798c97

Przemyslaw Tredak authored Jan 02, 2024

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

14 Jul, 2023 1 commit

Fix calibration in PyTorch example (#322) · 04822f40

Kirthi Shankar Sivamani authored Jul 14, 2023



* Fix example
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Review
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

24 Jan, 2023 1 commit

Schetlur/fp8 calibration (#40) · 7fc079a4

schetlur-nv authored Jan 24, 2023



* Initial commit for fp8 calibration.
Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>

* Fixes to make unit tests pass
Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>

* Added test and finished implementation
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Cleaning up handling of save_for_backward in Linear
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Removing commented lines
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Minor fix to mnist test.
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Pylint cleanup
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Moving stats computation to the forward pass instead of pre_forward, and extending to all other layers
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Pylint cleanup
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Pylint cleanup.
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Fixing unit test failures.
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Misc changes
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>

* Fixing bad indentation from master merge and moving some code into the needs_stats conditional
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
Signed-off-by: schetlur-nv <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>

03 Jan, 2023 1 commit

Update copyright year (#48) · 64a8dc90

Przemyslaw Tredak authored Jan 03, 2023


Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

28 Sep, 2022 1 commit

Initial code drop · 996ea169

Przemek Tredak authored Sep 27, 2022


Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
