Commits · 85a919973d062063b86000856b5c0f258a27380d · OpenDAS / TransformerEngine

14 Oct, 2025 1 commit

Generalize quantization APIs for FP8/FP4/.. recipes (#2256) · 85a91997

Kirthi Shankar Sivamani authored Oct 14, 2025



* Initial API change
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change all imports and api
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* format
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix typo
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix recipe tests
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix more tests
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix docs, tests, and make Jax change as well
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change internal uses of fp8_autocast
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Address nits
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* rename file
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* CG function, and small test fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change instances of make_graphed_callables internally
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix distributed tests
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Review
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Review
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Fix test and add more docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Cleanup test imports and minimize internal file imports
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Make is_bf16_available public
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fix tests
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Better docs and better api
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* format
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Apply suggestions from code review
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* fix nvfp4 test
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>


05 Oct, 2025 1 commit

Added the NVFP4 section to the low precision training tutorial (#2237) · 7e45be73

Przemyslaw Tredak authored Oct 05, 2025



* Added the NVFP4 part to the low precision tutorial
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Added the runtime results
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update docs/examples/fp8_primer.ipynb
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


18 Sep, 2025 1 commit

[PyTorch] Support FA3 for MLA and with CP (#1907) · c334fc46

zhujian authored Sep 18, 2025



feature(FA3,MLA,CP):
1. Update FA3 to commit-id 3ba6f82 (tag 2.8.0.post2 with compile error fixed); PR-1604 supports hdimQK != hdimV in the backward pass
2. Update get_attention_backend method because FA3 supports MLA now
3. Add CP MLA support for FA3
4. Add unit tests for FA3 MLA CP
5. Update attention doc
Signed-off-by: zhujian <zhujian.whu.cs@gmail.com>
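
The `hdimQK != hdimV` case mentioned in item 1 can be illustrated with a small, framework-free sketch. This is not FlashAttention-3 or Transformer Engine code; the `attention` helper and the toy shapes are assumptions made purely for illustration. It only shows that the query/key head dimension governs the score computation, while the value head dimension sets the width of the output.

```python
import math


def attention(q, k, v):
    """Single-head scaled dot-product attention on nested lists (hypothetical helper).

    q: [seq_q][d_qk], k: [seq_kv][d_qk], v: [seq_kv][d_v]; d_qk may differ from d_v.
    """
    d_qk = len(q[0])
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d_qk)
    out = []
    for q_row in q:
        # Scores depend only on the shared query/key head dimension.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax over the key positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The output width is the value head dimension.
        out.append([sum(p * v_row[c] for p, v_row in zip(probs, v)) for c in range(d_v)])
    return out


# Toy shapes (assumed): seq_q=2, seq_kv=4, d_qk=3, d_v=2.
q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
k = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
v = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
out = attention(q, k, v)
```

Here `out` has two rows of width two, even though the queries and keys are three-dimensional.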


17 Sep, 2025 1 commit

TE Gemma tutorial attempt#2 (#1839) · 7042d7ae

Sudhakar Singh authored Sep 16, 2025



* add tutorial files and other local changes
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* remove extraneous code for easy debu
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* make cuda graphs work with non-paged and paged attention
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* perf imp for kv cache ops
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add code for calibration
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* optimize kv_cache reindex and copy kernels
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* changes to make quantizers work with fp8_calibration
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* avoid reindexing from python side
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* rename variable from previous commit
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fix
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fix
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* use quantizer only if needed
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* functionality of the tutorial tested and perf checked
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* remove files and update headers/licenses
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* update header/license
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* update tutorial for review
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* make weights downloadable on the fly; remove extra print statements
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix lint and update comments
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* add comma back, typo
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* sequence_start_positions should be None for training
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* add paged attention numbers and update requirements.txt file
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* more fixes
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* make tutorial work on blackwell
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* remove gemma FT tutorial for now
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fixing the headings placement and rewording attention -> kv caching
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fixes from comments
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix the images
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* misc fixes
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add more comments to te_gemma.py and cleanup utils.py
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* add more information about the hierarchy of the classes used in the tutorial
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* add better cuda graphs picture
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add updated cuda graphs pictures
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add illustrated cuda graphs
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* small fixes in documentation
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add torch.no_grad() to force reduced memory usage
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* some fixes from recent comments
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* more fixes from remaining comments
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add te_rope_emb to class desc
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix tutorial wording; add calibration fix to grouped_linear.py
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>


28 Aug, 2025 1 commit

[PyTorch] ONNX export of FP8 Current Scaling (#2068) · 06a38cc0

Paweł Gadziński authored Aug 28, 2025



* Compute amax in normalization forward in current scaling in untuned kernels
Signed-off-by: Jan Bielak <jbielak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* code drop
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* apply Tim's suggestions
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Jan Bielak <jbielak@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


31 Jul, 2025 1 commit

[PyTorch] Tutorial for the ONNX export (#1586) · 8dfdb911

Paweł Gadziński authored Jul 31, 2025



* code drop
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fixes
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fixes
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


21 Jul, 2025 1 commit

[Common] Skip cuDNN 9.10.0/9.10.1 due to bugs (#1937) · 0d802283

Charlene Yang authored Jul 21, 2025



* exclude 9.10.0/.1 for certain configs
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix kv_channels
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* add get_backend to tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add init files
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix numerics and cuda graph tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix jax tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* remove prints
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor changes after renaming
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix import structure and rename get_attention_backends
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix docs and benchmarks
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix get backend calls
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Revert "fix get backend calls"

This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc.
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Revert "fix docs and benchmarks"

This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9.
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix docs, benchmarks and pre-commit ci
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix dpa/mha flash attn selection
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix rng states
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix ModelConfig
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix backend selection on Ampere
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix issues from last merge
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Update tests/pytorch/utils.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* remove initialization of rng_states to None
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* redefine ModelConfig
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix typo
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* fix ModelConfig
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix seed for CP tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Update tests/pytorch/test_sanity.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* move fixture from utils to individual tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix CI
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>


09 Jun, 2025 1 commit

Use public API instead of removed private function in `te_llama.py` (#1856) · fc185200

Jan Bielak authored Jun 09, 2025

Use public API instead of removed private function
* replaced use of _load_state_dict_into_model with model.load_state_dict because the private function _load_state_dict_into_model was removed in https://github.com/huggingface/transformers/pull/36335

Signed-off-by: Jan Bielak <jbielak@nvidia.com>


07 May, 2025 1 commit

Update README: Added GTC 2025 videos, latest news, and improved doc links (#1752) · 0c5e3a52

Santosh Bhavani authored May 07, 2025



* added a direct link to the quickstart notebook right after the code examples section
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* updated link in README for HF Accelerate docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* update DeepSpeed integration link
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update Release Notes link to documentation archive
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* updated latest news and moved older news under a dropdown caret
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* moved previous news to bottom of readme
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* fixed previous news link
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* added gtc videos
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* added TE GTC 2025 talk to latest news
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Santosh Bhavani <sbhavani@nvidia.com>


28 Apr, 2025 1 commit

Refactor attention.py part 2 (#1704) · 8ace813c

Kshitij Lakhani authored Apr 28, 2025



* Move MultiHeadAttention into its own file. Modify tests and files in t_e/pytorch to import from the new MHA module
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Resolving lost MHA changes from PR 1614 as a result of rebase
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Move context parallelism code into its own file. Modify test and local imports of cp code accordingly
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Move softmax.py from pytorch/ to pytorch/d_p_a
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Move Unfused and Fused attention to backends.py and some utils functions to pytorch/utils.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Resolving lost mark_activation_offload changes from PR 1678 as a result of rebase
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Code clean up
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Refactor attention dir
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Refactor dir structure. Make relevant symbols public in __init__ for attention and d_p_a dirs
Move FA package imports to backends.py
Code cleanup
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Modify tests to import attention modules correctly
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Lint fixes
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Code clean up and fix typo
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Allowing InferenceParams and RoPE imports from attention module and pytorch module
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Allow InferenceParams and RoPE imports via transformer_engine.pytorch and transformer_engine.pytorch.attention modules
Remove unnecessary checks for check_set_window_size in MHA and TL
Reorder backends such that smaller classes are at the start and larger ones at the end
Code clean up
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Reinstating changes from PR 1478 for rope.py lost during rebase conflict resolution
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Fix lint issues
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* nit: Code clean up
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Make imports leaner
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


14 Mar, 2025 1 commit

Refactoring attention.py part 1 (#1542) · 37339478

Kshitij Lakhani authored Mar 14, 2025



* Create pytorch/dot_product_attention module and pytorch/d_p_a/utils.py
Move attention logging into a separate class in pytorch/d_p_a/utils.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Create FlashAttentionUtils class in pytorch/d_p_a/utils.py for versioning info
Move versioning info out of pytorch/attention.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Move AttentionParams and get_attention_backend from attention.py to d_p_a/utils.py
Fix tests and imports for the above refactor change
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Move get_qkv_layout(), get_full_mask(), get_alibi(), get_attention_quantizers() to d_p_a/utils.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Move tensor packing and unpacking helper functions from pyt/attention.py to d_p_a/utils.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Move cumulative seqlens and indices methods from pyt/attention.py to d_p_a/utils.py
Rename cumulative functions from using _cu_ to using _cumul_ to differentiate from CUDA cu calls protocol
Rename tensor packaging methods with leading underscore to mark them as internal to the file
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Remove unnecessary imports in pytorch/attention.py and d_p_a/utils.py
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* Create d_p_a/inference.py and move InferenceParams from pyt/attention.py to it
Modify tests and other files to import InferenceParams correctly
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

Modify docs api for InferenceParams
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Create d_p_a/rope.py and move RoPE methods from pytorch/attention.py to it
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Code cleanup
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Fix qa testing induced bug
Code clean up
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Fix incorrect pack_tensor arg type
Code clean up
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* nit: Resolve lint errors
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Remove typedef FAUtils for FlashAttentionUtils
Use attn_log instead of att_log
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

Fix lint error
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* nit: Fix the function name from get_cumul to the earlier get_cu
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

* nit: Fix typos, explicit imports and remove extra comments
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

---------
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
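
The "cumulative seqlens" helpers moved in this refactor deal with packed, variable-length batches. The sketch below shows the general idea only; it is not the Transformer Engine implementation, and the function names `cumulative_seqlens` and `split_packed` are hypothetical. It assumes the common convention of a prefix-sum array with a leading zero, so that sequence `i` occupies the half-open range `[cu[i], cu[i + 1])` of the packed token buffer.

```python
from itertools import accumulate


def cumulative_seqlens(seqlens):
    """Return prefix sums with a leading 0 (hypothetical helper, assumed convention)."""
    return [0] + list(accumulate(seqlens))


def split_packed(packed_tokens, cu_seqlens):
    """Recover the individual sequences from a packed token list."""
    return [
        packed_tokens[start:end]
        for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])
    ]


# Three sequences of lengths 3, 5 and 2 packed back to back.
seqlens = [3, 5, 2]
cu = cumulative_seqlens(seqlens)
packed = list(range(sum(seqlens)))
sequences = split_packed(packed, cu)
```

The last entry of `cu` equals the total number of packed tokens, which is why the array has one more element than there are sequences.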


12 Feb, 2025 1 commit

Update documentation for 2.0 release (#1479) · ee4a17de

Przemyslaw Tredak authored Feb 12, 2025



* Updated docs for TE 2.0
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Do not expose comm_gemm_overlap and cast_transpose_noop
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Made the figures larger
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Apply suggestions from code review
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>

* Update quickstart_utils.py
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Change from review
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>

---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>


07 Feb, 2025 1 commit

Update main branch with TE 2.0 code, update version to 2.1.0.dev0 · 544dd14b

Przemek Tredak authored Feb 07, 2025

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

02 Jan, 2025 1 commit

Update copyright to include 2025 (#1388) · c9ea6be9

Kirthi Shankar Sivamani authored Jan 02, 2025

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

20 Sep, 2024 1 commit

Allow downloading of model weights automatically (#1172) · 195d7032

Sudhakar Singh authored Sep 20, 2024



* allow tutorial to download the model weights automatically
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* allow users to provide weight cache directory
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


13 Aug, 2024 1 commit

[PyTorch] Update docs/example and benchmarks/ scripts (#1075) · 88c0c914

Charlene Yang authored Aug 13, 2024



* update example/benchmark scripts
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix head_dim after MLA
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* update notebook
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>


02 Aug, 2024 1 commit

Link attention docs to the main docs and fix errors reported by Sphinx (#1062) · 098e3006

Przemyslaw Tredak authored Aug 01, 2024



* Link attention docs to the main docs and fix errors reported by Sphinx
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Lower the version of nbsphinx
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* More fixes
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Change the URL of example_attention.py to GitHub
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* More fixes in the attention tutorial
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>


10 Jul, 2024 1 commit

Add cuDNN sliding window and set_deterministic_algorithm (#992) · 8e039fdc

Charlene Yang authored Jul 10, 2024



* add cuDNN swa
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix SWA
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add set_deterministic and minor fixes for swa
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add AttentionParams
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* change window_size to int64_t; fix swa/determinism tests; cache _attention_backends
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add window_size to get_backend; fix jax and paddle
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fixes; add set_deter to bwd_impl
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix unit tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix FP8 tests due to determinism
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add support matrix for SWA and bias
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* minor fixes and lint
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* minor fixes
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add wording on window_size special cases
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor tweak on wording
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix jax assertion error
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix wording
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* call bwd with deterministic=true for jax/paddle
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add determinism words in documentation
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
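
The `window_size` special cases referred to in this commit can be pictured with a small pure-Python sketch. It is not the cuDNN or Transformer Engine implementation; the helper name `sliding_window_allowed` is hypothetical, and the convention used here is an assumption: `window_size` is a `(left, right)` pair, a negative entry means "unbounded on that side", and `True` marks a key position that the query may attend to. Only the equal-length case (same number of queries and keys) is covered.

```python
def sliding_window_allowed(seqlen, window_size):
    """Build a seqlen x seqlen table of allowed (query i, key j) pairs.

    Assumed convention: query i may attend to key j when
    i - left <= j <= i + right, and a negative bound disables that side.
    """
    left, right = window_size
    allowed = []
    for i in range(seqlen):
        row = []
        for j in range(seqlen):
            left_ok = left < 0 or j >= i - left
            right_ok = right < 0 or j <= i + right
            row.append(left_ok and right_ok)
        allowed.append(row)
    return allowed


full = sliding_window_allowed(4, (-1, -1))   # no sliding window
causal = sliding_window_allowed(4, (-1, 0))  # plain causal mask
local = sliding_window_allowed(4, (1, 0))    # causal, one token of look-back
```

Under this convention `(-1, -1)` allows every pair, `(-1, 0)` reduces to a lower-triangular causal pattern, and `(1, 0)` lets each query see itself and the single preceding key.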


03 Jul, 2024 1 commit

[C/PyTorch] Add support for bottom-right-diagonal causal mask (#960) · 56e0b351

Charlene Yang authored Jul 03, 2024



* update to FE 1.5.1 and add bottom right causal
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* adjust logic for backend selection
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* update FE to 1.5.2
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add get_attention_backend function
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* update get_attention_backend
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix get_attention_backend
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* tweak get_attention_backend and fix unit tests
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* minor fixes for unfused, get_backend, etc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix cpu offload
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fixes for get_attention_backend
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* explicitly skip FP32 and padding tests because there is no support
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fix for window size check
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* update check_set_window_size and add enc_dec_attn_mask_type/enc_dec_window_size
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



* minor fixes
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci



---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

56e0b351

14 Jun, 2024 2 commits

Apply formatting (#929) · 9416519d

Kirthi Shankar Sivamani authored Jun 13, 2024



* Apply formatting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Apply formatting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

9416519d

Add documentation for dot product attention (#889) · 43569381

Charlene Yang authored Jun 13, 2024



* add attention docs
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attention doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attention doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attention doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attn doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attn doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attn doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* WIP: update attention doc
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* first draft
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor tweak to first draft
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* clean up pictures
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* first draft for review
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fixes
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add logging info/debug
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor fix of an SWA message
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* use subprocess instead of os.sys
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* clean up benchmark script
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* add example script and update notebook
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor tweak
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* minor tweaks
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix lint
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix Jax/Paddle related comments
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* rerun H100 benchmark
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* restrict fp8 tests to sm90+
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* move get_cudnn_version from common to pytorch utils
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

43569381

06 Jun, 2024 1 commit

Build system refactor for wheels (#877) · c1b915ae

Kirthi Shankar Sivamani authored Jun 06, 2024

Cleanup
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

c1b915ae

01 Jun, 2024 1 commit

Added comments about Llama3 weights to Llama tutorial (#830) · 8b210490

Paweł Gadziński authored May 31, 2024



* Llama 3 update
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Times update
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Times update
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* utils.py fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* utils.py fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* utils.py fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* update te llama tutorial to allow running with llama 3 weights
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* small fixes
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* small fix
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* small fix
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* add llama 3 vs llama 2 distinctions
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* paraphrasing and corrected facts
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>

8b210490

28 May, 2024 1 commit

Use correct FP8 group in multi-GPU docs (#852) · 9ff2c076

Tim Moon authored May 28, 2024



* Use correct FP8 group in multi-GPU docs

FP8 process group should be tensor-parallel group
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Synchronize FP8 scales over world group in multi-GPU docs
Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>

9ff2c076

25 May, 2024 1 commit

Different dimension for attention (#833) · 66736890

Paweł Gadziński authored May 24, 2024



* Fixed Llama tutorial. Changed batch size and added fused=True.
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Tutorial updated but not complete yet.
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Tutorial notebook reset - removed fuse=true
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Removed fused=true
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Batch size back to 8
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Typo and commented out line
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* fixed whitespace
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* fixed whitespace
Signed-off-by: root <root@ipp2-0037.nvidia.com>

* Added comment to attention line. Fixed potential bug with loading weights - now loading works correctly, confirmed by the generation code.
Signed-off-by: root <root@ipp2-1661.nvidia.com>

* Comments
Signed-off-by: root <root@ipp2-1661.nvidia.com>

* Models cast added again
Signed-off-by: root <root@ipp2-1661.nvidia.com>

* Weight download info
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Moved parameter gate_proj_size to config
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* gate_proj_size removed and put immediate_size instead
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Llama 3 added to tutorial
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Typos fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Typos fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Fixed model loading
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Loading fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Different dim for attention
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Reversed other commit
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Changed name to kv_channels
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Fixed typo
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Back to kv_channels in transformer layer
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Back to kv_channels in transformer layer
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Small bug fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Small bug fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Test fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* changed file modes
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix and resolved conflict
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* lint fix and resolved conflict
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

* Lint fix, hopefully last
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>
Signed-off-by: root <root@ipp2-1661.nvidia.com>
Co-authored-by: root <root@ipp2-2373.nvidia.com>
Co-authored-by: root <root@ipp2-1588.nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: root <root@ipp2-0037.nvidia.com>
Co-authored-by: root <root@ipp2-1661.nvidia.com>
Co-authored-by: root <root@ipp2-2371.nvidia.com>
Co-authored-by: root <root@ipp2-1589.nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

66736890

31 Mar, 2024 1 commit

Llama tutorial fixes (#730) · 16a469df

Paweł Gadziński authored Mar 31, 2024



Llama tutorial fixes - all
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>

16a469df

20 Mar, 2024 1 commit

Llama accelerate tutorial (#720) · c38779be

Sudhakar Singh authored Mar 20, 2024



* tutorial and doc fixes
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* remove extra code
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix typos
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

c38779be

01 Mar, 2024 1 commit

Create a small tutorial on how to accelerate HF Llama models with Transformer-Engine (#615) · 0bd84ed9

Sudhakar Singh authored Feb 29, 2024

0bd84ed9

08 Feb, 2024 1 commit

Update example to use new TE_DType path (#660) · 379c1ee3

Quentin Anthony authored Feb 08, 2024

Signed-off-by: Quentin Anthony <qganthony@yahoo.com>

379c1ee3

19 Jan, 2024 1 commit

chore: Fix multiple typos (#613) · b4b8ae7b

hugo-syn authored Jan 19, 2024

Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>

b4b8ae7b

03 Jan, 2024 1 commit

Change the copyright to include 2024 (#583) · cd798c97

Przemyslaw Tredak authored Jan 02, 2024

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

cd798c97

06 Dec, 2023 1 commit

Update README.md - Latest News section (#554) · 14c51e62

Santosh Bhavani authored Dec 06, 2023



* Add H200 perf non-alpha image
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

* Update README.rst - non-transparent H200 plot
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

14c51e62

24 Feb, 2023 1 commit

Remove redundant AR for SP case (#79) · d8a2f352

Kirthi Shankar Sivamani authored Feb 23, 2023



* Remove redundant amax AR for SP case
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* update advanced docs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

d8a2f352

04 Jan, 2023 1 commit

Docs: remove build warnings and add FP8 caching note (#44) · d6ff6f4d

Kirthi Shankar Sivamani authored Jan 04, 2023



* docs: remove build warnings and add FP8 caching note
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* add comment about amax history
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

d6ff6f4d

03 Jan, 2023 1 commit

Update copyright year (#48) · 64a8dc90

Przemyslaw Tredak authored Jan 03, 2023


Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

64a8dc90

02 Dec, 2022 1 commit

Link performance optimization tutorial to docs (#36) · 0291a608

Przemyslaw Tredak authored Dec 02, 2022


Signed-off-by: Przemyslaw Tredak <ptredak@nvidia.com>

0291a608

18 Nov, 2022 1 commit

Documentation for advanced performance optimizations (#20) · 8e7f4c8c

Tim Moon authored Nov 18, 2022



* Documentation for advanced perf optimizations

Fix bug where we were doing backward passes inside fp8_autocast in example notebooks.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Minor tweaks to advanced perf optimization docs

Review suggestions from @ptrendx
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Rewording sequence parallelism in advanced perf optimization docs

Review suggestion from @ksivaman
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>

8e7f4c8c

28 Sep, 2022 1 commit

Initial code drop · 996ea169

Przemek Tredak authored Sep 27, 2022


Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

996ea169