- 10 Feb, 2026 1 commit
-
-
Jacket authored
Signed-off-by: Kaining Zhong <kainingz@nvidia.com>
-
- 03 Feb, 2026 1 commit
-
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * year update in license Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 28 Jan, 2026 1 commit
-
-
Paweł Gadziński authored
* jit bug fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix' Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 25 Jan, 2026 1 commit
-
-
Tim Moon authored
* Expose option for custom op fusions Refactor fusion functions to remove index bookkeeping. Refactor fused ops to use consistent operation order. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tests for custom ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix linter warnings and numerical test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak pattern matching logic with fixed window sizes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use TF32 tols in fused op tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Backpropagate fixes from #2622 Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 22 Jan, 2026 2 commits
-
-
Sudhakar Singh authored
* SWA (left, right) with FusedAttention changes cherry-picked from https://github.com/NVIDIA/TransformerEngine/pull/1369 Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix test_kv_cache failures Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove unnecessary comments Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix some more filter issues, address feedback Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix for local test case failures - `bottom_right_diagonal` should be calculated in `fused_attn_fwd` call as well Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * make conditions more accurate Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add cp tests to test swa (left, right) Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dead code and make conditions better Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feedback from Charlene Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * small er Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * plumb `bottom_right_diagonal` through jax Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * plumb `bottom_right_diagonal` through jax Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing fields Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * use proper mask type in CP Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Chen Cui authored
* Update THD sink attention logic for newer cudnn versions THD Sink attention is supported in 9.18.0 Signed-off-by:
Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update thd sink attention logic for cp>1 Signed-off-by:
Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unit test for thd + sink attention Signed-off-by:
Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address comments Signed-off-by:
Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do not skip thd cp sink attention test Signed-off-by:
Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable deterministic mode for sink attention Signed-off-by:
Chen Cui <chcui@nvidia.com> --------- Signed-off-by:
Chen Cui <chcui@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 21 Jan, 2026 1 commit
-
-
Przemyslaw Tredak authored
* PoC of the changes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Early exit from the Free function for the empty tensor Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Use the proper function for nvtx range Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Only do mark_not_offload when the cpu_offloading is enabled Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass on making the setattr issue not come back Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Actually add pytest.ini Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Changes to __init__ Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * A different way Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * WAR the fact that it is not possible to set __setattr__ dynamically Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Simpler solution and fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix for the inference mode DPA Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Start of debugging debug tools Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * More fixes in debug Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Speculative moving the validate_name to the constructor Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Making the debug tools names saner Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Change the setattr usage in the tensor parallel group setting Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Adding try/finally - it does not seem to impact the time in observable way Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixing lint issues and the thunder test Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix 1 of the debug tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Removed the warning and enforcement in the CI Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * try-finally in the context manager Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixing the debug tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 20 Jan, 2026 1 commit
-
-
Charlene Yang authored
* update FE to 1.17 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism flag Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to qa/ Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move bias/dbias/versioning/dropout logic to C API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update qa/L0_pytorch_unittest/test.sh make .xml file specific to deterministic tests in qa/ Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to Jax extension Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to Jax tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/jax/test_fused_attn.py fix typo Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/common/fused_attn/fused_attn.cpp fix indentation Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the AI fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax extension call Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes based on comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix selection logic and fwd arg Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix version check in Jax test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pytorch CI failures Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix Jax CI failures Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix non-/determinism logic and CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix formatting Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/fused_attn/fused_attn.cpp fix and/or logic Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update to 9.18.1 for requirement Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reduce Jax CI tests for determinism Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 17 Jan, 2026 1 commit
-
-
Tim Moon authored
* Add general C API for setting tensor params Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Implement general accessors for NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor tex swizzling to skip if scales are already swizzled Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add checks for non-swizzled scales in MXFP8 and NVFP4 kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support pre-swizzled scales in MXFP8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tex function to swizzle MXFP8 scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in inplace swizzle function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak comments to use "compact/swizzled format" Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MXFP8 quantize kernel with pre-swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Expose pre-swizzled scales in modules Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in multi-swizzle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 gated activations with swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add PyTorch infrastructure for pre-swizzled NVFP4 tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Deprecate DSv3-specific quantization logic in C API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove support for DSv3 compact data from quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove DSv3 compact data format from core lib Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in FP8 all-gather Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update JAX to use new swizzled scale API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update C++ swizzle test with swizzled scales API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Return default tensor params when querying params for invalid NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug DSv3 FP8 test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Userbuffers test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure gated activations populate FP8 transpose if needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable pre-swizzling with debug quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflicts and review suggestions Update copyright years. Tweak comments. Fix various complaints from @greptile-apps. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use explicitly sized types in config accessors Miscellaneous review suggestions from @ptrendx. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make util header for function that compute swizzled scale index Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from @greptile-apps Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Update expected error message in FP8 block-scaling test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @yaox12 Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 13 Jan, 2026 1 commit
-
-
Paweł Gadziński authored
* code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 05 Jan, 2026 1 commit
-
-
Peter St. John authored
* Add tests for 2528 and 2529 Signed-off-by:
Peter St. John <pstjohn@nvidia.com> * Update tests/pytorch/test_deferred_init.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update tests/pytorch/test_deferred_init.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 02 Jan, 2026 1 commit
-
-
Kirthi Shankar Sivamani authored
Update copyright to include 2026 Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Dec, 2025 1 commit
-
-
xiaoxi-wangfj authored
* [PyTorch] Fuse permute+pad and unpermute+unpad ops for FP8 optimization 1. Fuse `moe_permute_with_probs` + `Fp8Padding` and `moe_unpermute` + `Fp8Unpadding`, which removes the explicit padding/unpadding of MoE experts, improves performance, and reduces peak GPU memory usage. 2. Add tests for fused permute/pad and unpermute/unpad. Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> * [PyTorch/Common] Fuse permute+pad and unpermute+unpad support with_merging_probs Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> * [PyTorch]format code Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> * [Common]perf expert_idx loaded once Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> * fix: pad_offsets can be None Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> * add padding + merging probs bwd support. Not tested Signed-off-by:
tdophung <tdophung@nvidia.com> * Fix garbage initialized act grad Signed-off-by:
tdophung <tdophung@nvidia.com> * all test passing for jax permutation + pad Signed-off-by:
tdophung <tdophung@nvidia.com> * change tokens_per_experts APIs to num_out_tokens with conservative allocation of worst case padding for output buffer Signed-off-by:
tdophung <tdophung@nvidia.com> * change test permutation to reduce test time Signed-off-by:
tdophung <tdophung@nvidia.com> * triggering PR refresh Signed-off-by:
tdophung <tdophung@nvidia.com> * format code Signed-off-by:
tdophung <tdophung@nvidia.com> * Remove some test cases from the PyTorch side. Add a separate token_dispatch test for sanity in case combine accidentally undoes an error on dispatch in the roundtrip test. Add distinction between L0 and L2 in test cases in JAX Signed-off-by:
tdophung <tdophung@nvidia.com> * format code Signed-off-by:
tdophung <tdophung@nvidia.com> * remove chance for inefficiency in moving between CPU and GPU, remove redundant primitive using a new static bool for padding, add assert for align size Signed-off-by:
tdophung <tdophung@nvidia.com> * fix lint in jax Signed-off-by:
tdophung <tdophung@nvidia.com> * account for both jax newer and older than version 0.8.2. Adjusted gpu triton binding accordingly Signed-off-by:
tdophung <tdophung@nvidia.com> * format code Signed-off-by:
tdophung <tdophung@nvidia.com> * fix typo Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
xiaoxi-wangfj <690912414@qq.com> Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
tdophung <tdophung@nvidia.com>
-
- 20 Dec, 2025 1 commit
-
-
Zhongbo Zhu authored
* rowwise colwise RHT group quant v1 Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * remove local array RW Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * change wait_barrier Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fast math options Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * use mult to replace div Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * format Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * bulk move random states Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * greptile Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * lint Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * revert to use divides Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * avoid fp32 bf16 round-trip in RHT cast fusion Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * trigger fastmath by toggle NVTE_RHT_CAST_FUSION_USE_FAST_MATH Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * integrate row col rht fusion, functional Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * numerics aligned Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * style Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * remove device sync Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * 128 padding Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * revert colwise rng state creation because of row-col fused kernel Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix CI, linter Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * refactor RS for generating two random values Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Avoid invalid configs with templated kernel Signed-off-by:
Tim Moon <tmoon@nvidia.com> * fix acc pipeline init with 0 arrival count Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * restore rowwise-only mode Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * switch to dynamic atomic scheduler Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Avoid instantiating group RHT+cast kernel without row-wise or col-wise output Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Include fast math option in quantization config Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings and review nits Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use TE license Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug where kernel is always launched on stream Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore BF16 intermediate downcast in fused RHT-cast kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * fix numerical test of grouped kernel Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Make sure row-wise and col-wise quantization use different RNG seeds Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Restore autoformatter Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 19 Dec, 2025 1 commit
-
-
Sudhakar Singh authored
* add early return back (removed in 2427) Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make sure Float8Tensor.contiguous supports autograd Expand quantized tensor tests to check identity ops. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 15 Dec, 2025 3 commits
-
-
Paweł Gadziński authored
* Skip delayed wgrad tests in distributed numerics when debug mode is enabled Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
kwyss-nvidia authored
* Check calling convention for amax switch. Wgrad gemms with colwise x colwise require rowwise data via general_gemm. Since dy has both for dgrad and wgrad, the brittleness has likely not affected results. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Clear rowwise data when applicable. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update test with columnwise cases. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Check enum value rather than implicit cast. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> --------- Signed-off-by:
Keith Wyss <kwyss@nvidia.com>
-
Yashaswi Karnati authored
* fix ce loss with ignore idx Signed-off-by:
ykarnati <ykarnati@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
ykarnati <ykarnati@nvidia.com> * remove fix comments Signed-off-by:
ykarnati <ykarnati@nvidia.com> * fallback divisor to 1 Signed-off-by:
ykarnati <ykarnati@nvidia.com> * have arg for n_rows and n_non_ignore Signed-off-by:
ykarnati <ykarnati@nvidia.com> * fuse n_non_ignore to softmax kernel Signed-off-by:
ykarnati <ykarnati@nvidia.com> * fix incorrect arg Signed-off-by:
ykarnati <ykarnati@nvidia.com> --------- Signed-off-by:
ykarnati <ykarnati@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 08 Dec, 2025 1 commit
-
-
vthumbe1503 authored
* bug fixed, test added Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix contiguous Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * revert unnecessary change Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * revert another change Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * missed adding renamed file Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix minor issue Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix ci issue Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the test for bfloat16 Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 02 Dec, 2025 1 commit
-
-
Kunlun Li authored
* Add primary weights fp8 support for mxfp8 Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix unit test and add better error log to unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move post all-gather processing out of for loop Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add descriptions and ASCII diagrams for partial cast and partial amax functions Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor fix based on greptile bot Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix compilation errors due to arch-specific PTX instructions Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unused noop flag from C API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Expose test_partial_cast Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Skip mxfp8 partial cast test if mxfp8 is not available Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix pytest error Signed-off-by:
kunlunl <kunlunl@nvidia.com> * pylint ignore unused manual_post_all_gather_processing Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix error when using is_mxfp8_available Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 26 Nov, 2025 1 commit
-
-
Tim Moon authored
Do not initialize recipe state in base op class Op attrs may not be set. Move recipe state initialization to linear op constructor. Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 25 Nov, 2025 2 commits
-
-
Paweł Gadziński authored
* main Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * docs Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * test fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Zhongbo Zhu authored
* minor fix of torch view dtype Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * multi-tensor RHT amax, compiles Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * setup multi_tensor_quantize_nvfp4_impl Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * wire things up and run without crash Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * numerical test Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * unit test passing Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * finish unit test of split quantize api Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * bump up padding to 64 for nvfp4 grouped quantize Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix stochastic rounding Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * lint Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * change error message Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * clean up Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * enable multi-amax without RHT Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix col-only quantize mode Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * improve benchmark script Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add NCU example script Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add larger test case Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add contiguous_data_and_scale check to bulk allocator Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * unified naming and differentiate between group_ and multi_ Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * move regular amax into multi_tensor.h Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Disentangle logic for split-quantize and general multi-tensor quantize Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use size_t for split sections Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 21 Nov, 2025 2 commits
-
-
Tim Moon authored
Only disable Flash Attention in Userbuffers test on A100 Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Sudhakar Singh authored
* Add support for THD+CP+SWA through A2A comms Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * unblock the `padding`+`THD`+`CP(A2A)` with SWA case in A2A forward Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add proper support for thd Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * enable thd+cp tests as essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add cp+thd+a2a test to essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix comments from greptile Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add proper skip for flash attention Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the test to create separate tensors for flash and fused attention backend scenarios Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove redundant compare Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * simplify code Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add note for cu_seqlens_kv and cu_seqlens_kv_padded Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update tests/pytorch/attention/test_attention_with_cp.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fixo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix docs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the argument name Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 19 Nov, 2025 2 commits
-
-
Tim Moon authored
Disable Flash attention in Userbuffers tests Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Charlene Yang authored
* fix test_current_device Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 18 Nov, 2025 3 commits
-
-
Jaime authored
[PyTorch] Implement Selective Activation Checkpointing for LayerNormMLP with checkpoint flag (#2311) * custom tests for selective activation checkpointing for layernorm mlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add selective layernorm mlp to te.pytorch Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * update test and fix SLNMLP bug Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * implement slnmlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix tests pointed out by greptile app bot, still pass Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * minor formatting change in tests/pytorch/selective_layernorm_mlp/distributed/run_numerics.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove duplicate import in test/pytorch/selective_layernorm_mlp/test_recipe.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * clean up tests, remove unused imports Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove unused paths in test_deffered_init Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix issue with zero_centered_gamma in test_numerics reference implementation Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * clean up tests Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make comparison.py more extensive, cleaner output Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix small typo in tests/pytorch/selective_layernorm_mlp/compare.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix typo by grepbot in compare.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make selective activation checkpointing optional in slnmlp via checkpoint flag Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add comments to clarify logic Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add checkpoint param to pytests, change compare.py to compare checkpoint=False vs checkpoint=True, skip cuda graph tests for checkpoint=True Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * refactor tests to call modified LayerNormMLP Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * refactor to implement selective activation checkpointing directly into LayerNormMLP, also fix bug to reach cleanup logic in fwd Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix skip explanation for cuda_graphs.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make _recompute deal with lists instead of tuples Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix MOST cuda graph failures by initializing identical quantizers during fwd. Float8CurrentScaling with bf16 and fp16 still fail with checkpointing Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cuda graphs issue, all tests pass now Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix small logic bugs, clean up Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * integrate tests into main testing scripts Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * incorporate rng state tracking in checkpointing Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up tests Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix return type mismatches Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove checkpoint test from test_recipe, add separate test in test_numerics Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor typo fix Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clear up assertions in tests/pytorch/layernorm_mlp/test_selective_activation_checkpoint.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add license and copyright info Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix lint issues in layernorm_mlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix cpu_offload_v1 error Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * possibly fix recomputation in cuda graph bug Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * skip cuda graphs test for SLNMLP with SM>=10.0 and using delayed scaling Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo for setting IS_FIRST_FP8_MODULE Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> --------- Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Cache device tensors properly Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix annotation and add test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * skip nvfp4 test if not supported Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Nov, 2025 3 commits
-
-
Charlene Yang authored
* [Common] Deleted unused header (#2324) Deleted unused header Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] L1_jax_distributed_test suite with individual executions (#2321) * L1 rework Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * comment out test_multi_process_grouped_gemm for now Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * rm e5m2 from test norm + MXFP8 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * for branch Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * clean up and tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * change tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch debug] Fixes to debug tests failures (#2268) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch Debug] Add max_blockwise_dynamic_range stats (#2137) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fix bug with pre scale bias (#2300) * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Try to use pre-downloaded dataset artifacts first (#2345) * Try to use pre-downloaded dataset artifacts first Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Set HF_HUB_OFFLINE to disable any network calls to HF when the pre-downloaded dataset is available Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix out of bounds access in the FP4 dequantize kernel (#2346) Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Make FP8 weights compatible with older MCore version (#2342) * Make cast_master_weights_to_fp8 compatible with older MCore version Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Rename keep_columnwise to manual_post_all_gather_processing & Optimize unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant _test_mini_optimizer() Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Add test to check jaxpr that amax is reused for nvfp4 recipe (#2348) * Add test to check jaxpr that amax is reused for nvfp4 recipe Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Move test to test_helper.py and rename file Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix sharding of segment position to match id in ring attention. (#2349) Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Disable cuDNN attention for known IMA and NaNs (#2344) * Fix cuDNN backend selection for more cases. Add CG as an option as well Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuDNN checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add more checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuDNN version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix error message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add check for window size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Default to fused attention in JAX DPA (#2363) * Default to fused attention in JAX DPA Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * Consolidate documentation for DPA in JAX Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> * Correctly update the documentation for defaults in JAX DPA Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Update cudnn frontend to v1.16.0 (#2362) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [common] Remove kvpacked and qkvpacked attention functions for every kernel type. (#2287) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * deprecated compile time warning + \warning -> \deprecated Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Move Triton to common (#2359) * move triton to common and change paths Signed-off-by:
tdophung <tdophung@nvidia.com> * Formatting Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fused layers argument default values changed (#2347) * Changing default activations in MLP, TransformerLayer, dropout rate after FC1 to 0, and return_layernorm_output to False Signed-off-by:
tdophung <tdophung@nvidia.com> * Fixing the failing tests by hard coding arguments to the previous values instead of relying on newer default values Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * remove comment from gpt Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor changes for num_splits logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace None with 1 as default Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix dtype in pack/unpack when FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add fused_attn_supported constraint for some tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands in DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * separate fused fp8 and f16 flags in tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * initialize fused_attn_supported_f16 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FA installation in L3 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
root <root@gpu-h100-0496.cm.cluster> Co-authored-by:
Peter Dykas <wdykas@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kunlun Li <94586211+kunlunl@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Teddy Do <tdophung@nvidia.com> Co-authored-by:
wdykas <73254672+wdykas@users.noreply.github.com>
-
Evgeny Tsykunov authored
* Enable reference current scaling recipe Signed-off-by:
Evgeny <etsykunov@nvidia.com> * minor Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * linter Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Test ref vs native Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Evgeny <etsykunov@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Initial changes to remove pytorch overheads Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Nov, 2025 2 commits
-
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * offloading Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * all types Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * refactor Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cpu offload + debug warning Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change empty_like implementation to use make_like Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * main_grad fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * manual synchronization Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * old path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * remove example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * reverted grouped linear Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * make old code path work for modules Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * attention old code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * updated code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/tensor/quantized_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * nvfp4 support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/pytorch/test_cpu_offloading.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * small fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
root <root@ptyche0312.ptyche.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
Robin Zhang authored
* reset cudagraph Signed-off-by:
Robin Zhang <robinz@nvidia.com> * use closure instead of mutable default values Signed-off-by:
Robin Zhang <robinz@nvidia.com> * add test Signed-off-by:
Robin Zhang <robinz@nvidia.com> * fix test Signed-off-by:
Robin Zhang <robinz@nvidia.com> --------- Signed-off-by:
Robin Zhang <robinz@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Nov, 2025 1 commit
-
-
Sudhakar Singh authored
* enable applying rope offsets in backward Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add tests for rope offsets for thd/bshd/sbhd formats Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 11 Nov, 2025 1 commit
-
-
vthumbe1503 authored
* fix for float8 tensor fsdp2 training Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * zeros_like should return fp32 for fsdp2 to work Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * minor cleanup Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix unsharded weights not releasing memory Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * implement using fsdp preallgather and postallgather functions Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * FSDP2 works on Hopper/L40 Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor comment Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * some fixes for fp8 + handwavy changes for mxfp8 Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * only transpose saved for backward pass allgather in case of L40/Hopper Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * missed minor change to hopper use-case Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * communicate only required data in mxfp8, fix for updating weight usages when required instead of doing upfront in fwd pass Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * changes for meta Dtensors for weights and better all gather data handling in fsdp hook functions Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * better solution to figure out forward pass in FSDP2 Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * everything functioning except hack for transformerlayer Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix merge conflict Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert change of commit id for cudnn-frontend Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unnecessary change Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor issues with linting, add some comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor stuff Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * revert space removal Add default usage handling for rowwise and columnwise data. Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * fix the fsdp state collection issue, and minor review comments addressing Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert change for dgrad redundant computation Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * bug: get fsdp param group's training state instead of root training state; address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address coderabbit review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments; fix fp8 allgather test to do after fsdp lazy init Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * remove detach Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * do what makes sense Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/float8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * have better dtype for fsdp_post_all_gather arguments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * minor comment Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * improve comment Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix the error in CI Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * minor comment add Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * accidentally removed view function Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * fix minor bug for h100 Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * minor addition Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * implement padding removal/addition for allgather Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * Update transformer_engine/pytorch/tensor/mxfp8_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint error Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * address review comments Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve the reset parameter logic for dtensors Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * other cosmetic changes Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cosmetic changes Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * cosmetic changes Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * Update transformer_engine/pytorch/module/layernorm_linear.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 06 Nov, 2025 1 commit
-
-
Kunlun Li authored
* Make cast_master_weights_to_fp8 compatible with older MCore version Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Rename keep_columnwise to manual_post_all_gather_processing & Optimize unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant _test_mini_optimizer() Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 05 Nov, 2025 1 commit
-
-
Paweł Gadziński authored
* code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 04 Nov, 2025 1 commit
-
-
Paweł Gadziński authored
* code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-