- 30 May, 2025 1 commit
-
-
Tim Moon authored
Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 29 May, 2025 4 commits
-
-
Hua Huang authored
* Support SWA in CP Ring Attn THD striped sharding Signed-off-by:
Hua Huang <huah@nvidia.com> * Add some comments; move check to _FusedAttnCPWithP2PHelper.check_supported() Signed-off-by:
Hua Huang <huah@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove unused check Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com>
-
Przemyslaw Tredak authored
* Changed the Tensor allocation strategy Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Disable debug flag Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the double free error Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed pyTorch recipe extension Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Hide TensorAllocator and fix the usage in LayerNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix permutation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Tim Moon authored
* Linear op avoids saving input tensor if weight grad is not needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Linear op forward avoids producing quantized tensors with unnecessary usages Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid unnecessary usages in fused linear ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
Phuong Nguyen authored
* removes unnecessary reshapes for FP8 GEMM * use nn.jax.scaled_matmul Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 28 May, 2025 3 commits
-
-
Phuong Nguyen authored
* fp8 gemm with direct quant Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
jberchtold-nvidia authored
Update jax_scaled_masked_softmax to match TE kernel implementation Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix single FW build with multi FW available Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Some fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * sug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 May, 2025 1 commit
-
-
Santosh Bhavani authored
* added conda installation Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * fix for pypi Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 May, 2025 2 commits
-
-
Przemyslaw Tredak authored
* Modify the test cases Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Make the tests reproducible on different machines Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed the cache of the gamma_in_weight_dtype setting Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Reinstate the tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More verbose code and comments Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
jberchtold-nvidia authored
* Fix env variable name in test.sh scripts to properly test pure-JAX implementations Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Update test scripts to use pure-JAX impl in encoder test_custom_call_compute.py already uses pure-JAX impl as reference so testing the pure-JAX impl against itself would be redundant. The encoder tests have their own implementation so testing the pure-JAX impl of primitives is still useful. Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Update qa/L0_jax_unittest/test.sh Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 22 May, 2025 7 commits
-
-
Tim Moon authored
Activation ops support fusing backward pass with quantize Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
Document all recipes Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
Make primitive names more granular for better disabling granularity Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Build support for cuda 13 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix build for cudnn 8.9*; cuda 12.1 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd include Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Remove comm_gemm_overlap docs Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
Fix incorrectly skipped test_quantize_dbias tests Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
Paweł Gadziński authored
fix saved_tensors Signed-off-by:Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 21 May, 2025 2 commits
-
-
Sudhakar Singh authored
* fix model parallel encoder to be properly sharded Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Add missing docs for C API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Grammar, typos, copy-paste errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove contiguous word Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better wording Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 May, 2025 4 commits
-
-
Charlene Yang authored
add docstring for CP Signed-off-by:Charlene Yang <charleney@nvidia.com>
-
Paweł Gadziński authored
* docs drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix imgs Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
Peter St. John authored
* Use an empty torch tensor to indicate no fp8 information in extra_state Signed-off-by:
Peter St. John <pstjohn@nvidia.com> * Add huggingface from_pretrained / save_pretrained tests Adds integration tests to ensure models containing TransformerLayer objects can be saved and loaded using the from_pretrained and save_pretrained methods. Signed-off-by:
Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
guyueh1 authored
* Fix split_overlap_rs aggregate=True chunk offset calculation Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit test for aggregate=True Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix unit test Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Guyue Huang <guyueh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2025 3 commits
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix README render on PyPI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use anonymous hyperlink for duplicate. Fix indent. Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* tests drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move dir Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests fox Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 May, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 16 May, 2025 3 commits
-
-
Selvaraj Anandaraj authored
* Added token ignoring for CE loss Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests Signed-off-by:
root <root@cw-dfw-h100-004-210-013.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
* [JAX] Update flax module param initialization to support logical partitioning axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix ffn1 intermediate result being replicated Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add documentation and assert when logical_axes=None Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix bias in LayerNormMLP flax module Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix layer tests to not use nn_partitioning and instead use nn.with_logical_axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 May, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
* Cleanup runtime library loading Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comments and logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix catching stray builds Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix missing fw case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * minor grammar Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix duplicate SO for editable installs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comment for build ext Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Improve error msg Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
removed unused test deps Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 5 commits
-
-
Peter St. John authored
Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
-
Charlene Yang authored
* reduce FA versions to make CI leaner Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * improve build speed Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add FA env var for all archs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* rm unused swizzle extensions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix swizzle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Consistent namespaces and first refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format and lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * transformer_engine Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert accidental perm change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
* Disable verbose debug logs in CI Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable log_cli option Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 May, 2025 2 commits
-
-
Evgeny Tsykunov authored
* Set sequence_parallel before super().__init__() in norm modules Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> * getattr(self, sequence_parallel, None) -> self.sequence_parallel Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Co-authored-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com>
-
Charlene Yang authored
* disable sm89 and cuDNN < 9.11 for KV caching Signed-off-by:
Charlene Yang <charleney@nvidia.com> * disable some numerics tests Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-