- 06 Jan, 2026 1 commit
-
-
Paweł Gadziński authored
docs: Add comprehensive Getting Started guide with benchmarks
* Add new Getting Started documentation with PyTorch and JAX tutorials
* Include benchmark scripts demonstrating TE performance benefits
* Add CSS styling for code output and tabs
* Replace old quickstart notebooks with improved documentation
* Add transformer layer diagram (SVG)
* Update docs configuration and workflow
* Update copyright to include 2026
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
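The benchmark scripts this commit adds are not reproduced in this log. As a rough, dependency-free illustration of the warmup-then-average timing pattern such scripts typically follow (function and variable names here are hypothetical, not TE's actual benchmark code):

```python
import time

def benchmark(fn, *args, warmup=3, iters=10):
    """Time a callable: run a few warmup iterations first (to exclude
    one-time setup costs), then return the mean seconds per call."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Example: time a trivial CPU workload standing in for a layer forward pass.
mean_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s * 1e3:.3f} ms per call")
```

Real GPU benchmarks additionally need device synchronization around the timed region, which this stdlib sketch omits.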
-
- 02 Jan, 2026 1 commit
-
-
Kirthi Shankar Sivamani authored
Update copyright to include 2026
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Dec, 2025 1 commit
-
-
jberchtold-nvidia authored
Tutorial for integrating TE/JAX quantization into an existing framework
* Add TODOs
* Support NVFP4 SR RNG key, move wrapper module into TE itself, fix bfloat16 cast
* Update docstrings
* Fix QKV proj and out proj in Flax example transformer layer
* Use fused attention in quickstart_jax example
* Apply remat policy
* Add tutorial to docs
* Update title
* Remove unused dtype from TE DPA module
* Fix notebook title
* Fix lint
* Add explanation of Flax module wrapper
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
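The quantization tutorial this commit adds covers TE's actual recipes (NVFP4 with stochastic rounding via the SR RNG key, block scaling, etc.), which are not shown here. As a conceptual sketch of the underlying scale/round/rescale idea only, and not TE's API, per-tensor symmetric quantization looks like:

```python
def quantize(values, num_bits=8):
    """Per-tensor symmetric quantization: scale by the max magnitude,
    round to signed integers, and keep the scale for dequantization."""
    qmax = 2 ** (num_bits - 1) - 1             # e.g. 127 for 8 bits
    amax = max(abs(v) for v in values) or 1.0  # avoid division by zero
    scale = amax / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from integers and the saved scale."""
    return [v * scale for v in q]

x = [0.1, -2.5, 3.75, 0.0]
q, s = quantize(x)
x_hat = dequantize(q, s)   # close to x, up to one rounding step per element
```

Each recovered value differs from the original by at most half a quantization step (`scale / 2`), which is the round-trip error the tutorial's recipes work to keep small.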
-
- 15 Nov, 2025 1 commit
-
-
Teddy Do authored
JAX quickstart guide
* Initial commit of the JAX quickstart guide
* Fix syntax errors, remove unnecessary comments in utils, and add footnotes to the quickstart notebook
* Address Greptile comments on spelling, deepcopy, and vjp function signature compatibility with speedometer
* Add copyright to utils and fix remaining Greptile complaints
* Add comments to alternative layer implementations
* Remove weight sharing between different iterations of the TransformerLayer
* Add enum for attention implementations; fix inconsistency between fused and unfused TE implementations to achieve the same performance (removing extra dropout layer in fused layers); minor wording changes
* Fix bug in TransformerLayer expected input shape being [sequence, batch, ...] instead of [batch, sequence, ...]
* Restructure notebook to introduce FP8 ahead of fusion, so that fusion takes effect once quantization exists; bring TransformerLayer performance closer to the fused layer by setting hidden_dropout=0
* Add option to choose the attention implementation in the call to BasicTETransformerLayer, and demonstrate the runtime difference between Flax's and TE's attention implementations
* Add missing attention_implementation option in FuseTETransformerLayer
* Remove AttentionWrapper and custom-built DPA, use only Flax's and TE's implementations, and remove the last mention of PyTorch
* Further markdown changes to remove PyTorch
* Cosmetic fixes
* Rename all implementations
* Change fp8_autocast to autocast, add a causal mask, and some wording changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
Signed-off-by: tdophung <tdophung@nvidia.com>
Co-authored-by: tdophung <tdophung@dc2-container-xterm-034.prd.it.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
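The shape-convention bug called out above ([sequence, batch, ...] versus [batch, sequence, ...]) comes down to swapping the two leading axes. In real code this is a single array transpose; a dependency-free sketch of the conversion, purely to illustrate the convention:

```python
def batch_first_to_seq_first(x):
    """Swap the leading axes of a nested list:
    [batch, sequence, hidden] -> [sequence, batch, hidden]."""
    batch, seq = len(x), len(x[0])
    return [[x[b][s] for b in range(batch)] for s in range(seq)]

# 2 sequences of length 3, hidden size 1: shape [batch=2, seq=3, hidden=1]
x = [[[1], [2], [3]],
     [[4], [5], [6]]]
y = batch_first_to_seq_first(x)   # shape [seq=3, batch=2, hidden=1]
```

Feeding a batch-first array to a layer that expects sequence-first silently mixes tokens across examples rather than raising an error, which is why the convention mismatch surfaced as a bug rather than a crash.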
-