- 24 Feb, 2023 1 commit
Kirthi Shankar Sivamani authored
* Remove redundant amax AR for SP case
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* update advanced docs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Feb, 2023 1 commit
Kirthi Shankar Sivamani authored
* Fix no reduce_amax option for SP case
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* add warning about overriding reduce_amax
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 Feb, 2023 1 commit
Kirthi Shankar Sivamani authored
* Bug fixes from PR 22
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add FP8 tests to ci
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* bundle unittests for ci
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Feb, 2023 1 commit
vasunvidia authored
* Increase number of FP8 tensors per GEMM
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Enable FP8 output tensor for fp8_gemm
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [BERT FP8] Initial TE review comments
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Temporary fix for cuda graph non convergence
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Address review comments-2
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Review comments-3
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Cleanup
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Change for New API
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Remove unnecessary clone for D_scale, D_amax
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Avoid Roll for AMAX history size = 1
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Update onnx_te_gemm API
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Fix Lint errors
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
---------
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
- 31 Jan, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Jan, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Jan, 2023 1 commit
schetlur-nv authored
* Initial commit for fp8 calibration.
  Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>
* Fixes to make unit tests pass
  Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>
* Added test and finished implementation
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Cleaning up handling of save_for_backward in Linear
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Removing commented lines
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Minor fix to mnist test.
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Pylint cleanup
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Moving stats computation to the forward pass instead of pre_forward, and extending to all other layers
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Pylint cleanup
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Pylint cleanup.
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Fixing unit test failures.
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Misc changes
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
* Fixing bad indentation from master merge and moving some code into the needs_stats conditional
  Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
Signed-off-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>
Signed-off-by: Sharan Chetlur <schetlur@nvidia.com>
Signed-off-by: schetlur-nv <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <schetlur@dlcluster.nvidia.com>
-
- 17 Jan, 2023 1 commit
Kirthi Shankar Sivamani authored
* Move scale inverse calculation to framework
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cleanup
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix RMSNorm
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix gated kernel/geglu
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2023 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 23 Nov, 2022 1 commit
Kirthi Shankar Sivamani authored
fix checkpoint loading bug for FAR
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Nov, 2022 1 commit
Kirthi Shankar Sivamani authored
* Make amax reduction optional
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove setup for global amax redux for optional case
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Improve documentation
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Address documentation review
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Documentation fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* better FP8 checkpointing
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Making checkpointing backwards compatible
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add deprecation warning for old checkpoint loading
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix checkpointing for fp8 recompute case
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* improvements to deprecation warning
  Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
-
- 16 Nov, 2022 1 commit
Kirthi Shankar Sivamani authored
* Fix bugs for full activation recompute in FP8
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Ensure identical numerics in recomputation for pipeline parallelism
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* expose checkpoint API and add docs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* complete checkpointing docs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
Przemek Tredak authored
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-