- 20 May, 2025 3 commits
-
-
Paweł Gadziński authored
* docs drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix imgs Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
Peter St. John authored
* Use an empty torch tensor to indicate no fp8 information in extra_state Signed-off-by:
Peter St. John <pstjohn@nvidia.com> * Add huggingface from_pretrained / save_pretrained tests. Adds integration tests to ensure models containing TransformerLayer objects can be saved and loaded using the from_pretrained and save_pretrained methods. Signed-off-by:
Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
guyueh1 authored
* Fix split_overlap_rs aggregate=True chunk offset calculation Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit test for aggregate=True Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix unit test Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Guyue Huang <guyueh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2025 3 commits
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix README render on PyPI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use anonymous hyperlink for duplicate. Fix indent. Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* tests drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move dir Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 May, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 16 May, 2025 3 commits
-
-
Selvaraj Anandaraj authored
* Added token ignoring for CE loss Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests Signed-off-by:
root <root@cw-dfw-h100-004-210-013.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
* [JAX] Update flax module param initialization to support logical partitioning axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix ffn1 intermediate result being replicated Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add documentation and assert when logical_axes=None Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix bias in LayerNormMLP flax module Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix layer tests to not use nn_partitioning and instead use nn.with_logical_axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 May, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
* Cleanup runtime library loading Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comments and logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix catching stray builds Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix missing fw case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * minor grammar Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix duplicate SO for editable installs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comment for build ext Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Improve error msg Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Removed unused test deps Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 5 commits
-
-
Peter St. John authored
Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
-
Charlene Yang authored
* reduce FA versions to make CI leaner Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * improve build speed Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add FA env var for all archs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* rm unused swizzle extensions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix swizzle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Consistent namespaces and first refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format and lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * transformer_engine Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert accidental perm change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
* Disable verbose debug logs in CI Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable log_cli option Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 May, 2025 3 commits
-
-
Evgeny Tsykunov authored
* Set sequence_parallel before super().__init__() in norm modules Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> * getattr(self, sequence_parallel, None) -> self.sequence_parallel Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Co-authored-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com>
-
Charlene Yang authored
* disable sm89 and cuDNN < 9.11 for KV caching Signed-off-by:
Charlene Yang <charleney@nvidia.com> * disable some numerics tests Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Disallow kwargs for pybind extensions and release GIL if possible Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Wrap nvte_* calls Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 May, 2025 2 commits
-
-
jberchtold-nvidia authored
This reverts commit 5bee81e2. Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Remove default debug info from distutils Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add assert Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 11 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* First pass refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * first pass Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * core compiles Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Include cuda dirs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Compiles Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move grad outside autocast Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix kv cache Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change src file name in cmake Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * move the kernels too Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move comment Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move comments around Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * more movement Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * move Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 09 May, 2025 1 commit
-
-
Tim Moon authored
* Avoid spurious warning with non-FP8 GroupedLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use `QuantizedTensorBase` Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 08 May, 2025 3 commits
-
-
Paweł Gadziński authored
* features drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update transformer_engine/debug/features/utils/stats_computation.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/disable_fp8_layer.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/log_fp8_tensor_stats.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/utils/stats_buffer.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/disable_fp8_gemm.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * temporarily removed saturations Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/debug/features/_test_dummy_feature.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Li Tao authored
* use lru to cache torch.Tensor() Signed-off-by:
lit <lit@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicated definition Signed-off-by:
lit <lit@nvidia.com> * Update transformer_engine/pytorch/tensor/utils.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
lit <lit@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Xiaowei Ren authored
Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 May, 2025 5 commits
-
-
Tim Moon authored
* Initial work toward restoring UB support in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Forward UB linear runs, but has numerical error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB forward tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor tweaks Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove Python checks for MXFP8 UB linear forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add dim check for MXFP8 full tiles Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move QuantizedTensor logic out of UB comm and into Python helper function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 AGs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Coalesce NCCL all-gathers for MXFP8 all-gather Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial impl of backward UB linear in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB linear backward with no quantization. dgrad GEMM + dx RS is still broken. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix chunk dims for dgrad GEMM + dx RS Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debugging MXFP8 UB cases. Still failing with dy AG + wgrad GEMM. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use NCCL to overlap dy AG with dgrad GEMM Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB GEMM tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial refactoring of linear module forward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor linear module backward Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug linear module UB tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak test tensor dims Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Do not store autograd context within wgrad GEMM closure Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update LayerNormLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update LayerNormMLP Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug UB tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor style tweaks Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix incorrect usage for GEMM input with block-scaled FP8 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix RS out dims Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable dgrad GEMM + UB AG + NCCL AG overlapping Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Disable dgrad GEMM + UB AG + NCCL AG overlap in te.Sequential Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore support for internal quantized tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tests for MXFP8 GEMM with UB Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
Add build isolation to workflow Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
Peter St. John authored
Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Santosh Bhavani authored
* added a direct link to the quickstart notebook right after the code examples section Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * updated link in README for HF Accelerate docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * update DeepSpeed integration link Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update Release Notes link to documentation archive Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * updated latest news and moved older news under a dropdown caret Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * moved previous news to bottom of readme Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixed previous news link Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * added gtc videos Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * added TE GTC 2025 talk to latest news Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Santosh Bhavani <sbhavani@nvidia.com>
-
- 06 May, 2025 2 commits
-
-
Przemyslaw Tredak authored
* Changes to Linear Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Removing unnecessary check Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Relax the absolute tolerance in FP32 distributed test Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add QuantizedTensorBase class Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Change the blockwise tensor. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * A little cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
jberchtold-nvidia authored
* Fix test_custom_call_compute.py L2 tests Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix test_helper.py Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Address comments Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 05 May, 2025 4 commits
-
-
Phuong Nguyen authored
* removes unnecessary reshapes for FP8 GEMM * use jax.nn.scaled_matmul Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Move multi tensors kernels from PyTorch extensions to core Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add int16 type to core (for storing fp32 param remainders) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix core build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * same fix to scale Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix perf, memory, vars Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Re-add device guard for multi-device Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix junk output dtype for non-per tensor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes for test and upgrade mcore version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix core tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
* Enforce that input sharding of norm primitive does not shard hidden dim Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix partitioning issue in dact primitive causing NaN and add better shape checks before calling TE API Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Move dact shape assertion from cpp to python Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 03 May, 2025 1 commit
-
-
Xin Yao authored
* Fix autocast deprecation warnings Signed-off-by:
Xin Yao <xiny@nvidia.com> * merge main Signed-off-by:
Xin Yao <xiny@nvidia.com> * update Signed-off-by:
Xin Yao <xiny@nvidia.com> * resolve comments Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 02 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-