- 25 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add guide to build from source Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Apr, 2023 1 commit
-
-
cyanguwa authored
* Add FP8 fused attention to TE for PyTorch Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add license for cudnn-frontend, modify installation requirements, and refactor some headers for aesthetics Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add c api docs for fused attention Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add exception for unsupported precision/sequence length combinations Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix installation requirement for non fused attn use cases Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix docs for fused-attn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * prefix enums with NVTE_ and replace old MHA_Matrix with NVTE_QKV_Matrix Signed-off-by:
Charlene Yang <charleney@nvidia.com> * minor fixes based on PR comments Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix description for kvpacked fwd Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix description of Bias in C api Signed-off-by:
Charlene Yang <charleney@nvidia.com> * minor fixes for cudnn requirement and description for QKV tensors Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix QKV layout description and support matrix for C api Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add asserts to cpp_extensions for qkv layout/bias type/attn mask type Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix typo precision Signed-off-by:
Charlene Yang <charleney@nvidia.com> --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Apr, 2023 1 commit
-
-
Frédéric Bastien authored
* Clean up the installation instructions. We were telling users to install the dev version in the README. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Typos Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 05 Apr, 2023 1 commit
-
-
Frédéric Bastien authored
* Update installation instructions for JAX and add some dependencies. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Bring back support for non-pip-installed pybind11. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Apply suggestions from code review Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> * Changes following review. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Change order to make it more clear. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add other reviewers' suggestion. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * pybind11 is needed for all FW. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add flax as a dep Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 29 Mar, 2023 1 commit
-
-
tcherckez-nvidia authored
Signed-off-by:
Tal Cherckez <tcherckez@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Mar, 2023 1 commit
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 14 Mar, 2023 1 commit
-
-
Ming-Xu Huang authored
* Updated TE/JAX docs Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding TE/JAX docs' rst files Signed-off-by:
Ming Huang <mingh@nvidia.com> * Set DType as pybind11::module_local() to avoid generic_type errors. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Updating license and exporting more modules Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adopting autoapi and removing enum_tools. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix typo Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Make jax.rst be style consistent. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fixing doc statements as the suggestion from review. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fixing doc statements as the suggestion from code review. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Update the description of Softmax Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Removed categories in catalog as PyTorch Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 11 Mar, 2023 2 commits
-
-
Przemyslaw Tredak authored
* Change from AutoDoc to AutoAPI Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> * WAR for the wrong autosummary generation Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> * Change common to be in line with pytorch API docs Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add GitHub Action to build docs Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Trying to fix the versions Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com>
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 24 Feb, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Remove redundant amax AR for SP case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * update advanced docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Feb, 2023 1 commit
-
-
cyanguwa authored
* add flash attention to TransformerLayer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Add docs for FP8 calibration (#61) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Fix the integer overflow in fused softmax (#60) Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * prefix flash attn env var with NVTE_ Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Address steady memory increase and bloated checkpoints (#63) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix env var logic Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix flash attn env var logic again Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * remove d2d copies (#64) * remove d2d copies Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Increase number of FP8 tensors per GEMM (#22) * Increase number of FP8 tensors per GEMM Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Enable FP8 output tensor for fp8_gemm Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * [BERT FP8] Initial TE review comments Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Temporary fix for cuda graph non convergence Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Address review comments-2 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Review comments-3 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Change for New API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Remove unnecessary clone for D_scale, D_amax Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Avoid Roll for AMAX history size = 1 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Update onnx_te_gemm API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint errors Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Bug fixes from PR 22 (#65) * Bug fixes from PR 22 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add FP8 tests to ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * bundle unittests for ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * replace rearrange with transpose Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * QKV parameters unfused path fixes and optimization (#66) * Bug fixes from PR 22 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add FP8 tests to ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better QKV parameter fusion Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * small fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * keep original param for unfused case to retain externally set attrs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix ONNX exports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * improve arg naming Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * No need to set data pointers Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Assert memory loc in NoopCat Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Handle case of different memory in param and buffer Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix assert always true Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Reassign params memory to avoid more concats Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * Fix gradients when using AMP (#70) retain grad related attrs while casting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix pylint violations: fixed pylint violations such as trailing whitespace and overly long lines Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> * fix pylint violation on line 264 with R1719 Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> * fix two more pylint violations Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> * DotProductAttention API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add docs for attention Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix assert always true Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * check for correct flash-attn version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint+build fixes, correct settings for default flash-attn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * correct version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix onnx and disable flash-attn export test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove einops dependency Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cleanup internal API; rm duplication Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * only install TE wheel (exclude flash-attn to rm conflicts) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * forgot to change install wheel path Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * next round review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix flash_attn output Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix QK layer scaling Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * update docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments and fixes to selective checkpointing Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
cyanguwa <cyang.uwa@gmail.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Jan, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* docs: remove build warnings and add FP8 caching note Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add comment about amax history Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 06 Dec, 2022 1 commit
-
-
Kirthi Shankar Sivamani authored
* Softmax docs and type fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint whitespace Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * change API, better naming, const fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 02 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com>
-
- 18 Nov, 2022 1 commit
-
-
Tim Moon authored
* Documentation for advanced perf optimizations Fix bug where we were doing backward passes inside fp8_autocast in example notebooks. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor tweaks to advanced perf optimization docs Review suggestions from @ptrendx Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rewording sequence parallelism in advanced perf optimization docs Review suggestion from @ksivaman Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Nov, 2022 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix bugs for full activation recompute in FP8 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Ensure identical numerics in recomputation for pipeline parallelism Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * expose checkpoint API and add docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * complete checkpointing docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Oct, 2022 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-