- 09 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!67
-
- 07 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 31 Dec, 2025 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 15 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 10 Dec, 2025 2 commits
-
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
- 04 Dec, 2025 1 commit
-
-
dongcl authored
-
- 03 Dec, 2025 3 commits
-
-
wenjh authored
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
-
- 27 Nov, 2025 2 commits
-
-
oliver könig authored
* ci: Build and attach bdist wheels to release page Signed-off-by:
oliver könig <okoenig@nvidia.com> * free up space Signed-off-by:
oliver könig <okoenig@nvidia.com> * cleanup Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * c28619d8999a147d5e09c1199f84ff6af6ad5794 Signed-off-by:
oliver könig <okoenig@nvidia.com> * c28619d8999a147d5e09c1199f84ff6af6ad5794 Signed-off-by:
oliver könig <okoenig@nvidia.com> * Reduce months to check from 7 to 5 Signed-off-by:
oliver könig <okoenig@nvidia.com> * Update .github/scripts/check_for_ngc_images.sh Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update .github/actions/build-pytorch-wheel/build.sh Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
oliver könig <okoenig@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * lines lenght Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * subtitle --- fix in many files: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a lot of small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * torch_version() change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing module and fix warnings Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * removed training whitespace: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/api/pytorch.rst Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Fix import Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix more imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix NumPy docstring parameter spacing and indentation - Standardize parameter documentation to use 'param : type' format (space before and after colon) per NumPy style guide - Fix inconsistent indentation in cpu_offload.py docstring - Modified 51 Python files across transformer_engine/pytorch Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 26 Nov, 2025 2 commits
-
-
Tim Moon authored
Do not initialize recipe state in base op class Op attrs may not be set. Move recipe state initialization to linear op constructor. Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Phuong Nguyen authored
* allow dp + fsdp and fixed sr_rng_state partitioning Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup for lint test Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 23 Nov, 2025 5 commits
-
-
Sudhakar Singh authored
* Add support for THD+CP+SWA through A2A comms Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * unblock the `padding`+`THD`+`CP(A2A)` with SWA case in A2A forward Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add proper support for thd Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * enable thd+cp tests as essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add cp+thd+a2a test to essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix comments from greptile Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add proper skip for flash attention Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the test to create separate tensors for flash and fused attention backend scenarios Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove redundant compare Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * simplify code Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add note for cu_seqlens_kv and cu_seqlens_kv_padded Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update tests/pytorch/attention/test_attention_with_cp.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fixo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix docs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the argument name Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
Kshitij Lakhani authored
* Remove unnecessary SWA calculation from _segment_ids_pos_to_seqlens_offsets Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kshitij Lakhani authored
* Make BSHD default for Unfused DPA, DPA and MHA in TE JAX Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Remove explicit transpose_batch set for BSHD for DPA in JAX quickstart Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Add warnings in DPA and MHA to warn users of change defaults to BSHD instead of SBHD Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Minimize the scope of when to trigger warnings for changed defaults for transpose_batch_sequence Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 20 Nov, 2025 5 commits
-
-
Kirthi Shankar Sivamani authored
* Minor CPU overhead changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Cache per device Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Initial changes to remove pytorch overheads Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed packed versions Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * jax Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * sofmtax_fusion -> softmax_fusion_type Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Charlene Yang authored
* [Common] Deleted unused header (#2324) Deleted unused header Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] L1_jax_distributed_test suit with individual executions (#2321) * L1 rework Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * comment out test_multi_process_grouped_gemm for now Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * rm e5m2 from test norm + MXFP8 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * for branch Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * clean up and tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * change tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch debug] Fixes to debug tests failures (#2268) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch Debug] Add max_blockwise_dynamic_range stats (#2137) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fix bug with pre scale bias (#2300) * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Try to use pre-downloaded dataset artifacts first (#2345) * Try to use pre-downloaded dataset artifacts first Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Set HF_HUB_OFFLINE to disable any network calls to HF when the pre-downloaded dataset is available Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix out of bounds access in the FP4 dequantize kernel (#2346) Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Make FP8 weights compatible with older MCore version (#2342) * Make cast_master_weights_to_fp8 compatible with older MCore version Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Rename keep_columnwise to manual_post_all_gather_processing & Optimize unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant _test_mini_optimizer() Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Add test to check jaxpr that amax is reused for nvfp4 recipe (#2348) * Add test to check jaxpr that amax is reused for nvfp4 recipe Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Move test to test_helper.py and rename file Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix sharding of segment position to match id in ring attention. (#2349) Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Disable cuDNN attention for known IMA and NaNs (#2344) * Fix cuDNN backend selection for more case. Add CG as a option as well Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuDNN checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add more checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuddn version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix error message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add check for window size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Default to fused attention in JAX DPA (#2363) * Default to fused attention in JAX DPA Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * Consolidate documentation for DPA in JAX Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> * Correctly update the documentation for defaults in JAX DPA Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Update cudnn frontend to v1.16.0 (#2362) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [common] Remove kvpacked and qkvpacked attention functions for every kernel type. (#2287) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * depracted compile time warning + \warning -> \deprecated Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Move Triton to common (#2359) * move triton to common and change paths Signed-off-by:
tdophung <tdophung@nvidia.com> * Formatting Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fused layers argument default values changed (#2347) * Changing default activations in MLP, TransformerLayer, dropout rate after FC1 to 0, and return_layernorm_output to False Signed-off-by:
tdophung <tdophung@nvidia.com> * Fixing the failing tests by hard coding arguments to the previous values instead of relying on newer default values Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * remove comment from gpt Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor changes for num_splits logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace None with 1 as default Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix dtype in pack/unpack when FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add fused_attn_supported constraint for some tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands in DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * separate fused fp8 and f16 flags in tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * initialize fused_attn_supported_f16 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FA installation in L3 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
root <root@gpu-h100-0496.cm.cluster> Co-authored-by:
Peter Dykas <wdykas@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kunlun Li <94586211+kunlunl@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Teddy Do <tdophung@nvidia.com> Co-authored-by:
wdykas <73254672+wdykas@users.noreply.github.com>
-
- 14 Nov, 2025 5 commits
-
-
jberchtold-nvidia authored
* Use TE quant if TE fused act is disabled Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Keep existing precision Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * offloading Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * all types Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * refactor Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cpu offload + debug warning Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change empty_like implementation to use make_like Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * main_grad fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * manual synchornization Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * old path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * remove example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * reverted grouped linear Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * make odl code path work for modules Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * attention old code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * updated code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/tensor/quantized_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * nvfp4 support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/pytorch/test_cpu_offloading.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * small fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
root <root@ptyche0312.ptyche.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
jberchtold-nvidia authored
* Refactor to avoid storing a global quantization config so direct recipe passing works as intended Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix use_split_accumulator for current scaling recipe Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix tests that pass direct recipe and were missing quantize meta set Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Revert "fix use_split_accumulator for current scaling recipe" This reverts commit a74ab7df812ec0a069b1bdd208debb93ec25a900. Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci failures Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix amax_history post_init Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * Update transformer_engine/jax/quantize/quantizer.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ci failures Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci issue Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * address comments Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * make recipe assertion classes in test_recipe_characteristics not inherit from unittest.TestCase Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add notes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Robin Zhang authored
* reset cudagraph Signed-off-by:
Robin Zhang <robinz@nvidia.com> * use closure instead of mutable default values Signed-off-by:
Robin Zhang <robinz@nvidia.com> * add test Signed-off-by:
Robin Zhang <robinz@nvidia.com> * fix test Signed-off-by:
Robin Zhang <robinz@nvidia.com> --------- Signed-off-by:
Robin Zhang <robinz@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Nov, 2025 5 commits
-
-
jberchtold-nvidia authored
* Support for checkpointing quantizations Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add jaxpr test for quant checkpoint name Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Revert "Support for checkpointing quantizations" This reverts commit f7b784940369d0da2a77c57fa6ea744e883c5832. Signed-off-by:
JAX Toolbox <jax@nvidia.com> * Checkpoint quantizations Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * revert other files Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * move checkpointing to VJPs Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci failure Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
JAX Toolbox <jax@nvidia.com> Co-authored-by:
JAX Toolbox <jax@nvidia.com>
-
Phuong Nguyen authored
* shardy + quantize_layout rework Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * add assertion for NVFP4 in fused act and fused norm primitive Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * add assertions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* swizzle via nvte Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Evgeny Tsykunov authored
Fix amax computation using output_t data in normalization Signed-off-by:Evgeny <etsykunov@nvidia.com>
-
Lifu Zhang authored
Signed-off-by:
Lifu Zhang <lifuz@login-lyris01.lyris.clusters.nvidia.com> Co-authored-by:
Lifu Zhang <lifuz@login-lyris01.lyris.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 12 Nov, 2025 4 commits
-
-
Sudhakar Singh authored
* enable applying rope offsets in backwared Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add tests for rope offsets for thd/bshd/sbhd formats Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
vthumbe1503 authored
* fix ci issue Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert back testing changes Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
wenjh authored
-
wenjh authored
-