- 18 Oct, 2025 1 commit
Kirthi Shankar Sivamani authored
* Support wheel build for cuda 13
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes for cu13 runtime, format
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add documentation
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Better error handling
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix jax sdist
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Modify function names
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 15 Oct, 2025 1 commit
Santosh Bhavani authored
* Enhance Latest News section with recent TE and FP8 developments
  - Adds NVFP4 pretraining research paper with PR #2177 reference
  Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
* update nvfp4 reference
  Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 14 Oct, 2025 1 commit
Kirthi Shankar Sivamani authored
* Initial API change
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change all imports and api
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* format
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix typo
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix recipe tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix more tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix docs, tests, and make Jax change as well
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change internal uses of fp8_autocast
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Address nits
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* rename file
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* CG function, and small test fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change instances of make_graphed_callables internally
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix distributed tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Review
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Review
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix test and add more docs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Cleanup test imports and minimize internal file imports
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Make is_bf16_available public
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix tests
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Better docs and better api
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* format
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply suggestions from code review
  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* fix nvfp4 test
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

- 22 Aug, 2025 1 commit
Phuong Nguyen authored
update NGC version
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

- 10 Jun, 2025 1 commit
Kirthi Shankar Sivamani authored
* Initial basic setup
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* rm setup reqs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* build-isolation support
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* rm not needed funcs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix workflows
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix wheel
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix invalid wheel
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix JAX build in baremetal env
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update install inst in readme
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update build.yml
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* docstring fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 27 May, 2025 1 commit
Santosh Bhavani authored
* added conda installation
  Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
* fix for pypi
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 19 May, 2025 1 commit
Kirthi Shankar Sivamani authored
* Fix README render on PyPI
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Use anonymous hyperlink for duplicate. Fix indent.
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 07 May, 2025 2 commits
jberchtold-nvidia authored
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

Santosh Bhavani authored
* added a direct link to the quickstart notebook right after the code examples section
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* updated link in README for HF Accelerate docs
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* update DeepSpeed integration link
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update Release Notes link to documentation archive
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* updated latest news and moved older news under a dropdown caret
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* moved previous news to bottom of readme
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixed previous news link
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* added gtc videos
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* added TE GTC 2025 talk to latest news
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Santosh Bhavani <sbhavani@nvidia.com>

- 02 May, 2025 1 commit
Kirthi Shankar Sivamani authored
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 16 Apr, 2025 1 commit
Santosh Bhavani authored
* Update README.rst - Installation
  Update installation section with comprehensive guidelines
  - Add detailed system requirements
  - Include Conda installation method (experimental)
  - Document environment variables for customizing build process
  - Update FlashAttention support to cover both version 2 and 3
  - Add troubleshooting section with solutions for common installation issues
  Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
* Update README.rst - Installation
  removed conda section
  Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
* Update README.rst - Installation
  added all gpu archs that support FP8
  Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update installation.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix docs and adding troubleshooting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <sbhavani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 13 Mar, 2025 1 commit
Tim Moon authored
* Explicitly use python3 and pip3
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Run pre-commit as Python module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Replace some missed references to "python" or "pip"
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

- 12 Feb, 2025 1 commit
Przemyslaw Tredak authored
* Updated docs for TE 2.0
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Do not expose comm_gemm_overlap and cast_transpose_noop
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Made the figures larger
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Apply suggestions from code review
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
* Update quickstart_utils.py
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Change from review
  Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

- 30 Jan, 2025 1 commit
Quentin Anthony authored
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>

- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 17 Sep, 2024 1 commit
Kirthi Shankar Sivamani authored
* Add PyPI install instructions
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Review from @timmoon10
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 14 Aug, 2024 1 commit
Tim Moon authored
* Bump minimum CUDA version to 12.0
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug CUDA version check
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug CMake build
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Review suggestions from @ksivaman and @ptrendx
  Remove logic for CUDA <12.0 in PyTorch and Paddle builds. Update version in docs and README.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

- 07 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
* Remove interval arg from recipe
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Remove usage of interval and use explicit kwarg for testing recipes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 17 May, 2024 1 commit
Charlene Yang authored
* fix inconsistency for attn mask; now True means participating in attn
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix sliding window window_size for decoder+padding combination
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert paddle changes regarding mask
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert softmax to 1-mask;0-keep
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* enforce 1-mask out; 0-keep rule for jax masks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert pytorch mask changes; some kept in tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert to jax fused attn on main
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* inverse mask logic for get_cu_seqlens/_and_indices in PyTorch implementation and mask generation in unit tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* temporarily disable update_weight_scale_inv
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* enforce window_size for decoder
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add docstring for mask definition 1-mask out;0-keep
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add aux_ctx_tensors to save_for_backward
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* tweak make_decoder_mask and make_mask in jax tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* skip dBias for shapes other than 1HSS; otherwise dq/dk/dv NaNs
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* expand attn_biases from list to variables in save_for_backward
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix use of variable before assignment in jax dact_lu
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove window size definition for decoder
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add change notes in README for padding mask in PyTorch
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* tweak padding mask notes in README
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* expand list to tensors for save_for_backwards
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

- 24 Apr, 2024 1 commit
Santosh Bhavani authored
Added HF Nanotron to integrations and updated GTC 24 video to on-demand link
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

- 03 Apr, 2024 1 commit
Santosh Bhavani authored
* Update README.rst
  1. Updated latest news with databricks blog
  2. Fixed formatting issues
  3. Added GTC 2024 video
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  added back overview marker for docs generation
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Added MPT-13B convergence result
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Added Levanter/JAX to integrations section of README
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 13 Mar, 2024 1 commit
Santosh Bhavani authored
Update README.rst - Latest News
Added an entry to Latest News section
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

- 05 Mar, 2024 1 commit
Zhenhuan Liu authored
Signed-off-by: Zhenhuan Liu <nkulzh16@gmail.com>

- 30 Jan, 2024 1 commit
Rahul Huilgol authored
Update README.rst
Signed-off-by: Rahul Huilgol <rahulhuilgol@gmail.com>

- 23 Jan, 2024 1 commit
Ming-Xu Huang authored
Fix JAX/Examples in README.md
Signed-off-by: Ming Huang <mingh@nvidia.com>

- 04 Jan, 2024 1 commit
Quentin Anthony authored
Add GPT-NeoX coming soon
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>

- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

- 06 Dec, 2023 2 commits
Santosh Bhavani authored
* Add H200 perf non-alpha image
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst - non-transparent H200 plot
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

Santosh Bhavani authored
* Update README.rst - FP8 convergence
  - added FP8 convergence section
  - removed model support (to be replaced with a feature support table)
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Updated Latest News
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  Add plot for H200
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 14 Nov, 2023 1 commit
Ming-Xu Huang authored
* Refactor sharding.py for the further custom_partitioning migration
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating both FWD and BWD of LayerNorm/RMSNorm from xmap to custom_partitioning.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating both FWD and BWD of all kinds of softmax from xmap to custom_partitioning.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Fix the wrong order of parameters to LN/RMSN bwd in ln_mlp_fp8.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* WAR to LN/RMSN_fp8 before migrating to CP.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Fix the wrong order of parameters of bwd of LN/RMSN_fp8.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Following review feedback to modify
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Force the hidden dim in Norm ops to no sharding and add warning msg.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Reuse fwd_rule in VJP functions
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating both FWD and BWD of self-fused-attn from xmap to custom_partitioning.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating both FWD and BWD of cross-fused-attn from xmap to custom_partitioning.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* add gelu and dgelu.
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Reuse fwd_rule in VJP functions for attentions
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Apply native FP8 Dtypes to fp8.py
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating cast_and_transpose from xmap to custom_partitioning
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Migrating transpose from xmap to custom_partitioning
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Apply XLA pattern match to perform FP8 GEMM.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* migrate layernorm_fp8 to custom_partitioning.
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Unify code style
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Extend support of Transpose with FP8
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Implementing layernorm_fp8_dot based on migrated custom calls.
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Renaming variables and publish NVTE_FP8_COLLECTION_NAME
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Replace Q/DQ custom calls with native XLA implementations
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* migrate gelu_fp to custom_partitioning.
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Minor fix
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Support custom calls with multi-dims
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Support general dot indices in _fp8_dot_impl
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Implementing layernorm_geglu_fp8_mlp
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Remove GEMM custom calls
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Remove xmap related code
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Fix typo and add query-function to FP8MetaPackage
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Fix some bugs of custom calls
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Fix CT's bugs
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Update UTs/examples to adapt to the API changes.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Unify kernel initialization in MLP.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Modifying with code review's feedback
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Update README and add deprecation warning to *ShardingType
  Signed-off-by: Ming Huang <mingh@nvidia.com>
  Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
* Canonicalize the dtype
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Adding assertion for non-supported batch dims.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Adding doc/examples to _multidim_transpose
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Set FP8 meta as WeightHParamsCollection.OVERWRITE_WITH_GRADIENT in Praxis modules.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Set FP8 meta as WeightHParamsCollection.OVERWRITE_WITH_GRADIENT in Praxis modules.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Apply dtype-based rtol/atol to UTs
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Deprecate QKV_INTERLEAVED enum
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Skip test_distributed_custom_ops.py
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Fix the wrong sharding of bias in SelfAttn
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* WAR to fix the wrong cu_seqlen of MHA when DP/FSDP enabled
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Adding distributed ops unit-tests
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Adding license to test_distributed_*
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Follow review feedback to modify
  Signed-off-by: Ming Huang <mingh@nvidia.com>
* Use total bytes involved in collective ops as criteria.
  Signed-off-by: Ming Huang <mingh@nvidia.com>
---------
Signed-off-by: Ming Huang <mingh@nvidia.com>
Signed-off-by: Ming-Xu Huang <mingh@nvidia.com>
Co-authored-by: Donglin Yang <dongliny@nvidia.com>

- 13 Nov, 2023 1 commit
Santosh Bhavani authored
* Update README.rst - Installation section
  Added pip install instructions and cleaned up pre-reqs and FlashAttention-2 section
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 13 Oct, 2023 1 commit
Tim Moon authored
Signed-off-by: Tim Moon <tmoon@nvidia.com>

- 11 Oct, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 02 Oct, 2023 1 commit
Santosh Bhavani authored
minor grammatical changes and added "JAX Toolbox" to integrations
Signed-off-by: Santosh Bhavani <santosh@semantic.md>

- 16 Aug, 2023 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemyslaw Tredak <ptredak@nvidia.com>

- 31 Jul, 2023 1 commit
Kirthi Shankar Sivamani authored
Add compilation warning for FA 2.0
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 14 Jul, 2023 1 commit
cyanguwa authored
* Fix bprop for cuDNN 8.9.3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Update cuDNN version requirement to 8.9.3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* debug paddle CI
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* debug paddle CI; force LD_LIBRARY
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* debug paddle CI; force LD_LIBRARY to /opt
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove debug info for paddle
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change cudnn requirement to 8.9.1 for v1 and 8.9.0 for v2; add batch size 32 for unit test; add LD library path for paddle tests temporarily
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove printf line in fused_attn.cpp
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add batch size 32 for unit test
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update cudnn-frontend to 0.9.2
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove temporary LD library path used for testing pre-released cudnn 8.9.3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

- 27 Jun, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 22 Jun, 2023 1 commit
Santosh Bhavani authored
Update README.rst
Added NVIDIA NeMo to Integrations section
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>