- 26 Nov, 2025 1 commit
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * lines length Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * subtitle --- fix in many files: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a lot of small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * torch_version() change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing module and fix warnings Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * removed trailing whitespace: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/api/pytorch.rst Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Fix import Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix more imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix NumPy docstring parameter spacing and indentation - Standardize parameter documentation to use 'param : type' format (space before and after colon) per NumPy style guide - Fix inconsistent indentation in cpu_offload.py docstring - Modified 51 Python files across transformer_engine/pytorch Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 21 Nov, 2025 1 commit
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 18 Nov, 2025 1 commit
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
- 14 Nov, 2025 1 commit
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * offloading Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * all types Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * refactor Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cpu offload + debug warning Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change empty_like implementation to use make_like Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * main_grad fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * manual synchronization Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * old path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * remove example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * reverted grouped linear Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * make old code path work for modules Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * attention old code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * updated code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/tensor/quantized_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * nvfp4 support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/pytorch/test_cpu_offloading.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * small fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
root <root@ptyche0312.ptyche.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- 22 Oct, 2025 1 commit
Evgeny Tsykunov authored
* rename experimental -> custom_recipes Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Decouple python base classes (api) Signed-off-by:
Evgeny <etsykunov@nvidia.com> * update test_custom_recipe Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Rename experimental -> custom Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Minor Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix import Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Update tests/pytorch/nvfp4/test_nvfp4_rht_quantize_exact.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Evgeny Tsykunov <e.tsykunov@gmail.com> * Update tests/pytorch/test_custom_recipe.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Evgeny Tsykunov <e.tsykunov@gmail.com> * quantization_base -> quantized_tensor rename Signed-off-by:
Evgeny <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny <etsykunov@nvidia.com> Signed-off-by:
Evgeny Tsykunov <e.tsykunov@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 13 Oct, 2025 1 commit
Selvaraj Anandaraj authored
* Added multi-layout support for attention Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> * Comment/cleanup Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> * Bug fix on import time Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
- 01 Oct, 2025 1 commit
Evgeny Tsykunov authored
* Introduce QuantizerBase Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Expose as a first-class API Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Undo QuantizerBase Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Make Quantizer a base class without implementations Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Support CustomRecipe and CustomRecipeState Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolving comments: quantize impl, num_quantizers, defaults Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Quantizer factories Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Add tests Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * QuantizedTensorBase _get_quantizer() + quantize_() Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Experimental note + LayerNormMLP fix Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tensor._internal -> tensor.base Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Expose Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor import fix Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Single quantizer factory with roles Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More context for qfactory, fwd/bwd_roles Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Rename *Base -> *Storage quantized tensors Signed-off-by:
Evgeny <etsykunov@nvidia.com> * make_quantizers() will take roles from the operation Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Improve tests and fix missing imports Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Merge main followup Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Evgeny <etsykunov@nvidia.com> Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 02 Sep, 2025 1 commit
Selvaraj Anandaraj authored
* Create GPU reload buffers on main stream Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed typo Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * Fixed typo Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
- 29 Jul, 2025 1 commit
Jan Bielak authored
Signed-off-by:
Jan Bielak <jbielak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 25 Jul, 2025 1 commit
Selvaraj Anandaraj authored
* Fixed double buffering issue for asymmetric layers Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-ptyche01.ptyche.clusters.nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 10 Jul, 2025 1 commit
Paweł Gadziński authored
* push Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update tests/pytorch/test_sanity.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update tests/pytorch/test_sanity.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update tests/pytorch/test_sanity.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 12 Jun, 2025 1 commit
Selvaraj Anandaraj authored
* Added double buffering support initial commit Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * Fixed bugs Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * Make only one double buffer creation Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * Fixed bug Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * Fixed typo Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * Fixed flag setting Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Merge conflict Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-ptyche02.ptyche.clusters.nvidia.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 09 Jun, 2025 1 commit
Selvaraj Anandaraj authored
* Lora spike Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> * Added FP8 param support Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
- 06 May, 2025 1 commit
Przemyslaw Tredak authored
* Changes to Linear Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Removing unnecessary check Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Relax the absolute tolerance in FP32 distributed test Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add QuantizedTensorBase class Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Change the blockwise tensor. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * A little cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 17 Apr, 2025 1 commit
Paweł Gadziński authored
* drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 13 Mar, 2025 1 commit
Tim Moon authored
* Delete row-wise data in single-GPU linear forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Python->C++ parsing of transpose-only Float8Tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug tensor shape calculation without row-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug correctness issues with only column-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Only cache column-wise input in LayerNormLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 all-gather with only column-wise data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix moe cases, lint, rm unused ctx Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CPU activation offloading and use consistent logic for save/restore Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix typo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * RM stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix distributed and cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix norm cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Rm stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * RM stray file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix MXFP8 AG Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix FP8 with sequence parallelism Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix UB bulk dgrad Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 09 Mar, 2025 1 commit
Selvaraj Anandaraj authored
* Verified TE2.0 with offloading Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Skipping tests for Ampere and removed child class preparing Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * offloading support for MXFP8 dtype Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed quantized tensor detection mechanism Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * Fix mxfp8 offload, lint errors, and var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Supported disabling offloading for quantized tensors Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * bug fix Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bugs Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added support for None in list of Quantized data tensors Signed-off-by:
root <root@prenyx0095.a51.clusters.nvidia.com> * Hopper backward compatibility cleanup Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Coding style nit Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added guards Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 19 Feb, 2025 1 commit
Zhenhuan Liu authored
* Fix issues for MCore DDP. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Remove force data release for CPU offloading. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Add preserved attributes. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add main_grad to preserved attributes. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Change prepare_for_saving to original tensor and add .data to CPU hook. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Fix for LayernormLinear in FP8. Signed-off-by:
Dennis Liu <denliu@nvidia.com> --------- Signed-off-by:
Dennis Liu <denliu@nvidia.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 16 Oct, 2024 1 commit
Kirthi Shankar Sivamani authored
* Upgrade pylint and first round formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * round 2 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * round 3 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Format and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Paddle lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Reviews Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Run formatter Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Paddle lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 04 Oct, 2024 1 commit
Tim Moon authored
* CPU perf optimization in linear autograd function Avoid enable_grad context when possible in cast function. Cache distributed group properties. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * CPU perf optimization in prepare_forward function Avoid torch.nn.Module impl of __setattr__. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid module import in TE module forwards Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use fast getter for params Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse tensor dims in linear autograd func Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply optimizations to grouped linear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid deepcopy in tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move _fast_setattr logic to __setattr__ method Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 30 Jul, 2024 1 commit
-
-
Selvaraj Anandaraj authored
* Load balanced offloading algorithm Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 Jul, 2024 1 commit
-
-
Selvaraj Anandaraj authored
* removed unwanted memcpyDtoD/fixed weight parametrisation Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Apr, 2024 1 commit
-
-
Tim Moon authored
* Remove unnecessary Pylint overrides Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fixes to lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 Jan, 2024 1 commit
-
-
Selvaraj Anandaraj authored
Fixed offloading for PyT version/ Added Attention activation offloading support/ Native FP8 support (#632) * Fixed offloading for PyT version/ Added Attention activation offloading support/ Native FP8 support Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Removed activation offloading for fused attention Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fixed the illegal memory access issue for activation offloading of attention Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Removed the version guard Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Pipeline failures fix Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fixed lint errors Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Lint error fix Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com>
-
- 21 Jan, 2024 1 commit
-
-
Selvaraj Anandaraj authored
Activation offloading to CPUs for the Linear, Layernorm Linear and the Layernorm MLP modules (#571) * Added support for activation offloading to CPUs Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Moving CPU offloading library to TE Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Restructured code, added switch to choose between weight/activation offloading Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Removed arg during constructor Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fix nit-pick errors Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Documentation fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix to the code block in docs Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added offloading unit test Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fixed formatting Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * wgrad fusion fix, minor errors and lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Errors, test, lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * RM test file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixed stray PyT tensors in LayernormMLP getting offloaded Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fixed typo Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> * Fix offloading for rmsnorm, rm test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Float8Tensor compatible offloading Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-eos01.eos.clusters.nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-