- 03 May, 2025 1 commit
Xin Yao authored
* Fix autocast deprecation warnings
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* merge main
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* update
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* resolve comments
  Signed-off-by: Xin Yao <xiny@nvidia.com>
---------
Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 28 Apr, 2025 1 commit
Kshitij Lakhani authored
* Move MultiHeadAttention into its own file. Modify tests and files in t_e/pytorch to import from the new MHA module
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Resolving lost MHA changes from PR 1614 as a result of rebase
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Move context parallelism code into its own file. Modify test and local imports of cp code accordingly
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Move softmax.py from pytorch/ to pytorch/d_p_a
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Move Unfused and Fused attention to backends.py and some utils functions to pytorch/utils.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Resolving lost mark_activation_offload changes from PR 1678 as a result of rebase
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Code clean up
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Refactor attention dir
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Refactor dir structure. Make relevant symbols public in __init__ for attention and d_p_a dirs. Move FA package imports to backends.py. Code cleanup
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Modify tests to import attention modules correctly
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Lint fixes
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Code clean up and fix typo
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Allowing InferenceParams and RoPE imports from attention module and pytorch module
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Allow InferenceParams and RoPE imports via transformer_engine.pytorch and transformer_engine.pytorch.attention modules. Remove unnecessary checks for check_set_window_size in MHA and TL. Reorder backends such that smaller classes are at the start and larger ones at the end. Code clean up
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Reinstating changes from PR 1478 for rope.py lost during rebase conflict resolution
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fix lint issues
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* nit: Code clean up
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Make imports leaner
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 15 Apr, 2025 1 commit
Li Tao authored
* support adam bf16 state
  Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>
* use fp32 kernel but keep bf16 optimizer states to save memory
  Signed-off-by: lit <lit@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>
Signed-off-by: lit <lit@nvidia.com>
Co-authored-by: XiaobingSuper <xiaobingzhangupc@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
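
The second bullet describes a mixed-precision scheme: the Adam moment buffers are stored in bf16, which halves their memory relative to fp32, while the update arithmetic runs at higher precision. The pure-Python sketch below illustrates only that idea and is not the TransformerEngine kernel. Python floats (fp64) stand in for the fp32 compute, `to_bf16` emulates bf16 storage by rounding away the low 16 bits of an fp32 value (round-to-nearest-even, finite inputs assumed), and both function names are hypothetical.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of an IEEE-754 float32, so x is packed as
    float32 first and the low 16 bits are then rounded away.
    Assumes x is finite and within float32 range.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                     # lowest bit that is kept
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000  # round half to even, drop low bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def adam_step_bf16_state(params, grads, exp_avg, exp_avg_sq, step,
                         lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam step whose moment buffers are stored in bf16.

    Stored moments are read, the update is computed in full precision,
    and only the new moments are rounded back to bf16 for storage.
    """
    bias_corr1 = 1.0 - beta1 ** step
    bias_corr2 = 1.0 - beta2 ** step
    for i, (p, g) in enumerate(zip(params, grads)):
        m = beta1 * exp_avg[i] + (1.0 - beta1) * g
        v = beta2 * exp_avg_sq[i] + (1.0 - beta2) * g * g
        params[i] = p - lr * (m / bias_corr1) / (math.sqrt(v / bias_corr2) + eps)
        exp_avg[i] = to_bf16(m)      # low-precision storage only
        exp_avg_sq[i] = to_bf16(v)


params, grads = [1.0, -2.0], [0.5, -0.25]
exp_avg, exp_avg_sq = [0.0, 0.0], [0.0, 0.0]
adam_step_bf16_state(params, grads, exp_avg, exp_avg_sq, step=1)
print(params)   # each parameter moves by about lr against its gradient
print(exp_avg)  # moments now hold bf16-representable values
```

Because the moments are rounded only when written back, each step's arithmetic sees full-precision intermediates; the memory saving comes purely from the narrower stored buffers.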
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 31 Jan, 2025 1 commit
Selvaraj Anandaraj authored
* Initial commit
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Fixed compilation errors
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Fixed syntax errors
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fixed NaN issue when initial param value is zero
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Removed 64 bit indexing instantiation
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Made this feature an opt-in
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Removed arg from unscaled state
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Fixed compilation error
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Cleaned up errors
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Added support for checkpointing
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fixed checkpointing logic
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* Added tests
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Added assert failure for capturable mode
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fixed pylint errors
  Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
---------
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 01 Nov, 2024 1 commit
Kunlun Li authored
* Add precision aware fused adam
  Signed-off-by: kunlunl <kunlunl@nvidia.com>
* Minor changes based on review comments.
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
---------
Signed-off-by: kunlunl <kunlunl@nvidia.com>
Signed-off-by: Kunlun Li <94586211+kunlunl@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
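
The commit title names a "precision aware" fused Adam but does not describe the design. One common way to let optimizer state live in a narrower format is to store each state tensor together with a per-tensor scale, chosen so that the tensor's largest magnitude maps onto the format's largest finite value. The sketch below shows that idea with fp16 storage emulated through Python's `struct` `'e'` format. The scaling scheme, the choice of fp16, and the helper names `to_fp16`, `quantize_fp16` and `dequantize` are assumptions for illustration, not a description of the TransformerEngine implementation.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE-754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_fp16(values):
    """Store a state tensor as fp16 values plus one per-tensor scale.

    The scale maps the tensor's absolute maximum (amax) onto FP16_MAX,
    which keeps small entries away from fp16 underflow.
    """
    amax = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor has amax == 0. Dividing by it raises in Python,
    # and in GPU float arithmetic it gives an inf scale and NaN (0 * inf),
    # so fall back to a scale of 1.
    scale = FP16_MAX / amax if amax > 0.0 else 1.0
    return [to_fp16(v * scale) for v in values], scale


def dequantize(stored, scale):
    """Recover approximate full-precision values from fp16 storage."""
    return [q / scale for q in stored]


state = [0.5, -2.0, 1e-3]
stored, scale = quantize_fp16(state)
print(scale)  # 32752.0
print(dequantize(stored, scale))
```

A per-tensor scale costs one extra scalar per state tensor, while letting a 16-bit format cover state values whose magnitudes would otherwise underflow.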
- 18 Sep, 2024 1 commit
Tim Moon authored
Port optimizer tests to pytest
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 16 Aug, 2024 1 commit
Shijie authored
* support dtype casting fusion in FusedAdam
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* minor changes
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* fix lint
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* changes based on review comments
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* remove unused code
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* code refactor
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* fix typo
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* refactor
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* remove unused code
  Signed-off-by: Shijie Wang <jaywan@nvidia.com>
* Fix linter warnings
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Copy CUDA headers for framework sdists
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
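
The first bullet folds a dtype cast into the FusedAdam step. A plausible reading, assumed here rather than confirmed by the commit message, is that the step updates full-precision master weights and, in the same pass over the data, writes the rounded low-precision copy the model uses, instead of running a separate cast afterwards. The pure-Python sketch below shows that structure only; fp16 is emulated with `struct`, and `adam_step_with_cast` is a hypothetical name.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def adam_step_with_cast(master, model_params, grads, exp_avg, exp_avg_sq,
                        step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam update of full-precision master weights with a fused cast.

    Each element is handled in a single pass: the moments and master
    weight are updated, and the low-precision model copy is written in
    the same loop body rather than by a second cast loop.
    """
    bias_corr1 = 1.0 - beta1 ** step
    bias_corr2 = 1.0 - beta2 ** step
    for i, g in enumerate(grads):
        exp_avg[i] = beta1 * exp_avg[i] + (1.0 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1.0 - beta2) * g * g
        m_hat = exp_avg[i] / bias_corr1
        v_hat = exp_avg_sq[i] / bias_corr2
        master[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        model_params[i] = to_fp16(master[i])  # fused cast to the model dtype


master = [1.0, -2.0]
model_params = [to_fp16(w) for w in master]
grads = [0.5, -0.25]
exp_avg, exp_avg_sq = [0.0, 0.0], [0.0, 0.0]
adam_step_with_cast(master, model_params, grads, exp_avg, exp_avg_sq, step=1)
print(master)        # full-precision master weights
print(model_params)  # fp16-rounded copies written by the same step
```

On a GPU the benefit of this fusion is avoiding a second read of the master weights from memory; the Python loop only mirrors that ordering.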
- 14 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 30 May, 2024 1 commit
Xin Yao authored
* add multi-tensor kernels
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* add FusedAdam
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* add test to qa
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* add FusedSGD
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* fix lint
  Signed-off-by: Xin Yao <xiny@nvidia.com>
---------
Signed-off-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
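
This commit adds multi-tensor kernels alongside FusedAdam and FusedSGD. The general idea behind such kernels, as in the chunked approach of NVIDIA Apex's `multi_tensor_apply` utility, is to split many parameter tensors into fixed-size chunks and process batches of chunks from several tensors in one kernel launch, instead of launching once per tensor. The pure-Python sketch below models only the batching. The limits `max_tensors` and `max_chunks`, the chunk size, and both function names are hypothetical, and each "launch" is just a loop iteration. The per-element update is SGD with momentum as PyTorch defines it without dampening or weight decay (`buf = momentum * buf + grad`, `p -= lr * buf`).

```python
from typing import List, Tuple


def plan_launches(sizes: List[int], chunk_size: int, max_tensors: int,
                  max_chunks: int) -> List[List[Tuple[int, int]]]:
    """Group (tensor_index, chunk_index) work items into launches.

    A new launch starts when the current one already holds max_chunks
    chunks, or when the next chunk belongs to a tensor that would push
    the launch past max_tensors distinct tensors.
    """
    launches: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    tensors_in_launch = set()
    for t, n in enumerate(sizes):
        num_chunks = (n + chunk_size - 1) // chunk_size  # ceil(n / chunk_size)
        for c in range(num_chunks):
            is_new_tensor = t not in tensors_in_launch
            if current and (len(current) == max_chunks or
                            (is_new_tensor and len(tensors_in_launch) == max_tensors)):
                launches.append(current)
                current, tensors_in_launch = [], set()
            current.append((t, c))
            tensors_in_launch.add(t)
    if current:
        launches.append(current)
    return launches


def sgd_momentum_multi_tensor(params, grads, momentum_bufs, lr, momentum,
                              chunk_size=4, max_tensors=2, max_chunks=4):
    """SGD with momentum over a list of tensors, one batch of chunks per launch.

    Returns the number of simulated launches.
    """
    sizes = [len(p) for p in params]
    launches = plan_launches(sizes, chunk_size, max_tensors, max_chunks)
    for launch in launches:  # each iteration stands in for one kernel launch
        for t, c in launch:
            start = c * chunk_size
            end = min(start + chunk_size, sizes[t])
            for j in range(start, end):
                momentum_bufs[t][j] = momentum * momentum_bufs[t][j] + grads[t][j]
                params[t][j] -= lr * momentum_bufs[t][j]
    return len(launches)


print(plan_launches([10, 3, 7], chunk_size=4, max_tensors=2, max_chunks=4))
# [[(0, 0), (0, 1), (0, 2), (1, 0)], [(2, 0), (2, 1)]]
```

Here three tensors are covered by two launches rather than three; with many small parameter tensors, as in a typical model, the reduction in launch count is what makes the fused optimizers worthwhile.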