- 03 Mar, 2026 1 commit
-
-
wenjh authored
-
- 24 Feb, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 04 Feb, 2026 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 30 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 23 Jan, 2026 4 commits
-
-
maxiao3 authored
1,not find nvte_dgelu 2,fsdp_group is not none 3,CPUOffloadEnabled change to cpp_offload_v1 Signed-off-by:maxiao3 <maxiao3@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!74
-
maxiao3 authored
1,Resolve out-of-bounds issues for types struct 2,Fix TestFusedCastFloat8Vectorwise test case failure Signed-off-by:maxiao3 <maxiao3@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!73
-
zc20020701 authored
Signed-off-by:
zhaochao <zhaochao1@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!72 Co-authored-by:
zhaochao <zhaochao1@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 21 Jan, 2026 1 commit
-
-
maxiao3 authored
Signed-off-by:maxiao3 <maxiao3@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!71
-
- 20 Jan, 2026 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 17 Jan, 2026 1 commit
-
-
Tim Moon authored
* Add general C API for setting tensor params Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Implement general accessors for NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor tex swizzling to skip if scales are already swizzled Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add checks for non-swizzled scales in MXFP8 and NVFP4 kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support pre-swizzled scales in MXFP8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tex function to swizzle MXFP8 scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in inplace swizzle function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak comments to use "compact/swizzled format" Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MXFP8 quantize kernel with pre-swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Expose pre-swizzled scales in modules Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in multi-swizzle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support MXFP8 gated activations with swizzled scales Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add PyTorch infrastructure for pre-swizzled NVFP4 tensors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Deprecate DSv3-specific quantization logic in C API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove support for DSv3 compact data from quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove DSv3 compact data format from core lib Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug in FP8 all-gather Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update JAX to use new swizzled scale API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update C++ swizzle test with swizzled scales API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Return default tensor params when querying params for invalid NVTETensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug DSv3 FP8 test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Userbuffers test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure gated activations populate FP8 transpose if needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable pre-swizzling with debug quantizer Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflicts and review suggestions Update copyright years. Tweak comments. Fix various complaints from @greptile-apps. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use explicitly sized types in config accessors Miscellaneous review suggestions from @ptrendx. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make util header for function that compute swizzled scale index Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from @greptile-apps Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Update expected error message in FP8 block-scaling test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @yaox12 Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 16 Jan, 2026 1 commit
-
-
Teddy Do authored
* initial impl, not tested Signed-off-by:
tdophung <tdophung@nvidia.com> * consolidate different unpermute primitives with with_pad and with_merging_probs booleans. Implement partitioning for all permutation primitives Signed-off-by:
tdophung <tdophung@nvidia.com> * Add distributed test for non-padding permutation Signed-off-by:
tdophung <tdophung@nvidia.com> * fix issues in distributed test for padding permutation. Make common kernel zero intiialize output permuted scales, permuted probs and output tokens Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert zeroing in triton common kernel as it is a race condition. Instead, add extra input (aliased wiuth output) buffer to inner primitive of permutation on jax side to pass in zero intitiated buffers done with jnp zeros Signed-off-by:
tdophung <tdophung@nvidia.com> * fix utils to handle input output aliasing in autotuned kernels Signed-off-by:
tdophung <tdophung@nvidia.com> * Clean up comments, and add more comments explaining input output alias in utils Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint and greptile comment Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix issues that lint fixing introduced Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 15 Jan, 2026 5 commits
-
-
Jacket authored
Signed-off-by:Kaining Zhong <kainingz@nvidia.com>
-
jberchtold-nvidia authored
* install cmake in jax build github action Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Update build.yml Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
-
jberchtold-nvidia authored
disable fused attention in encoder tests for determinism Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
Santosh Bhavani authored
* Move older news to Previous Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Add Nov 2025 news entries Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> --------- Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com>
-
Hongbin Liu authored
* bug fix Signed-off-by:
hongbinl <hongbinl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/swizzle/swizzle_block_scaling.cu Mask to 8 bits to prevent potential bit overlap Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Hongbin Liu <lhb8125@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/swizzle/swizzle_block_scaling.cu Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Hongbin Liu <lhb8125@users.noreply.github.com> * fix bug in 2d too Signed-off-by:
Hongbin Liu <hongbinl@nvidia.com> --------- Signed-off-by:
hongbinl <hongbinl@nvidia.com> Signed-off-by:
Hongbin Liu <lhb8125@users.noreply.github.com> Signed-off-by:
Hongbin Liu <hongbinl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 14 Jan, 2026 1 commit
-
-
Teddy Do authored
* Remove pyhtorch-triton as a requirement and remove auto-fetching pytorch-triton as it is a placeeholder in pyPI Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix docstring Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 13 Jan, 2026 2 commits
-
-
Victor Oliveira authored
ONNX: Fix FP8 quantization for the second MLP in LayernormMLP Signed-off-by:Victor Oliveira <victor.oliveira@getcruise.com>
-
Paweł Gadziński authored
* code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 12 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 10 Jan, 2026 1 commit
-
-
Tim Moon authored
Debug Doxygen and LaTeX warnings Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 09 Jan, 2026 4 commits
-
-
Tim Moon authored
Update list of CI users Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
dongchl authored
rollback activation offloading implementation See merge request dcutoolkit/deeplearing/TransformerEngine!70 Co-authored-by:dongcl <791582849@qq.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
Kshitij Lakhani authored
* Pick a leaner set of combinations for TE JAX CP attn tests such that only those cp,dp,tp combinations are picked where cp*dp*tp is equal to num gpus Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * Consolidate the test cases run for different B,S,H,D and QKV layout Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Code and comments clean up Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make FP16 + GQA test cross attn instead of self attn to generalize the test Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 08 Jan, 2026 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!67
-
Teddy Do authored
* Add triton version detection logic, and NVTE_USE_PYTORCH_TRITON knob for jax Signed-off-by:
tdophung <tdophung@nvidia.com> * change build requirements and installation to reflect new option Signed-off-by:
tdophung <tdophung@nvidia.com> * reduce boilerplate comments Signed-off-by:
tdophung <tdophung@nvidia.com> * format code Signed-off-by:
tdophung <tdophung@nvidia.com> * fix typo Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make env var more precise Signed-off-by:
tdophung <tdophung@nvidia.com> * make env variables checking consitent Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Jan, 2026 5 commits
-
-
Teddy Do authored
* force initialization to int32 Signed-off-by:
tdophung <tdophung@nvidia.com> * address greptile comment Signed-off-by:
tdophung <tdophung@nvidia.com> * del useless comments, add more restriction to int32 Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com>
-
Zhongbo Zhu authored
* fix Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * resolve review comments Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Comment tweaks Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
Teddy Do authored
* force initialization to int32 Signed-off-by:
tdophung <tdophung@nvidia.com> * address greptile comment Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com>
-
- 06 Jan, 2026 3 commits
-
-
jberchtold-nvidia authored
[JAX] Fix test_layer to support fused attention and adjust test encoder tolerance to account for minor diff (#2563) Fix failing unit tests Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
jberchtold-nvidia authored
* Fix long compile time in padding.cu Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* docs: Add comprehensive Getting Started guide with benchmarks - Add new Getting Started documentation with PyTorch and JAX tutorials - Include benchmark scripts demonstrating TE performance benefits - Add CSS styling for code output and tabs - Replace old quickstart notebooks with improved documentation - Add transformer layer diagram (SVG) - Update docs configuration and workflow Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * 2026 in copyright Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 05 Jan, 2026 3 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-