- 27 Jul, 2023 1 commit
Przemyslaw Tredak authored
* Exposing RMSNorm in PyTorch extensions Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* First pass at the Python API Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Small fixes Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Added numerics tests and fixed issues Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Lint fixes Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Added RMSNorm to LayerNormMLP Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Added ONNX export and tests for RMSNorm Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix python lint Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix BERT case Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Added normalization option to the TransformerLayer. Added tests. Fixed test failures. Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix documentation Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix kwarg bug Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix IMA and invalid type error Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Increase RMSNorm threshold for bf16 case Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix ONNX tests Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 19 Jul, 2023 1 commit
Kirthi Shankar Sivamani authored
* FA v2.0 support Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix typo Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 13 Jun, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jun, 2023 1 commit
Tian Zheng authored
* First step of PaddlePaddle integration - Add build option for paddle - Add basic test framework - Add 3 basic operators: cast_from_fp8, cast_to_fp8, gemm Signed-off-by: Tian Zheng <tizheng@nvidia.com> Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix review comments Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support paddle build Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add paddle build support for new building framework Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix review comments Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Clean up build process for Paddle stub file Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Minor fixes Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix pylint "wrong-import-order" warning Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix review comments Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Skip BF16 GEMM tests for unsupported arch Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng <tizheng@nvidia.com>
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 01 Jun, 2023 1 commit
Kirthi Shankar Sivamani authored
pin FA version Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 31 May, 2023 1 commit
Tim Moon authored
* Refactor Setuptools build system. Successfully launches CMake install, but installs CMake extensions in temp dir. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug JAX build. Fix pybind11 import. Distinguish between build-time and run-time dependencies. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add helper function to determine dependencies Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add missing license Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug case where system CMake is too old Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add missing license Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Simplify sanity import tests. Just importing modules provides richer error messages. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Properly install submodules Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Install helper library for TensorFlow Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update documentation Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Do not install Ninja by default Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Include Git commit hash in version string Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Override build_ext.build_extensions instead of build_ext.run Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix incorrect include path. Restore Ninja dependency. Restore overriding build_ext.run func. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Review suggestions from @nouiz Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Disable parallel Ninja jobs in GitHub actions Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Properly install userbuffers lib Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Tweak install docs. Review suggestion from @ksivaman Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add examples for specifying framework in docs Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 02 May, 2023 1 commit
Kirthi Shankar Sivamani authored
Use only built-ins for setup Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 30 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
Remove distutils Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 21 Apr, 2023 2 commits
cyanguwa authored
* Add FP8 fused attention to TE for PyTorch Signed-off-by: Charlene Yang <charleney@nvidia.com>
* add license for cudnn-frontend, modify installation requirements, and refactor some headers for aesthetics Signed-off-by: Charlene Yang <charleney@nvidia.com>
* add c api docs for fused attention Signed-off-by: Charlene Yang <charleney@nvidia.com>
* add exception for unsupported precision/sequence length combinations Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix installation requirement for non fused attn use cases Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix docs for fused-attn Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* prefix enums with NVTE_ and replace old MHA_Matrix with NVTE_QKV_Matrix Signed-off-by: Charlene Yang <charleney@nvidia.com>
* minor fixes based on PR comments Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix description for kvpacked fwd Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix description of Bias in C api Signed-off-by: Charlene Yang <charleney@nvidia.com>
* minor fixes for cudnn requirement and description for QKV tensors Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix QKV layout description and support matrix for C api Signed-off-by: Charlene Yang <charleney@nvidia.com>
* add asserts to cpp_extensions for qkv layout/bias type/attn mask type Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix typo precision Signed-off-by: Charlene Yang <charleney@nvidia.com>
---------
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Charlene Yang <charleney@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Kirthi Shankar Sivamani authored
* Initial refactor; linker error Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix linking issue and make mpi conditional Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix TF/JAX build Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Use max SMs at the last RS chunk in pipelined overlap Co-authored-by: Sangkug Lym <slym@nvidia.com> Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Make userbuffers support opt-in. Decouple userbuffers from MPI. Refactor MPI handling in build system. Standardize names to "userbuffers". Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Lint Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
- 19 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
* Port initial changes Co-authored-by: Sangkug Lym <slym@nvidia.com> Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* readd FA include for PyTorch Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Re-enable sm_70 + cleanup Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* LICENSE, cleanup header Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* 5k -> 173 errors Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* license and fixes in userbuffers-host Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* next round fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* final cpp cleanup Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* pylinting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix from linting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Turn off default async amax reduction (#148) Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove unused code path Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup Macros Signed-off-by: Sangkug Lym <slym@nvidia.com>
* fix conflict resolution bug Signed-off-by: Sangkug Lym <slym@nvidia.com>
* Fix gencode flags in setup (#145)
* Fix gencode flags based on cuda version Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* review suggestions Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* revert append_nvcc_threads change Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change overlap config dict error message Signed-off-by: Sangkug Lym <slym@nvidia.com>
* simplify ub initialization Signed-off-by: Sangkug Lym <slym@nvidia.com>
* lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix sanity imports Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cpplint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix TensorFlow build Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix TE macros in public header Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* More fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* compiles with and w/o MPI Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes for python side annotations for conditional compile Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* link gdrAPI only when MPI found Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix comments for dummy var Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix linking Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* load MPI before TE Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add Py side argument checks Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove unused code and catch silent failures Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix cpp tests Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix find_lib path for tests Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
- 17 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
* use upstream flash-attn Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* get correct FA for linting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 15 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
* Fix gencode flags based on cuda version Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* review suggestions Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* revert append_nvcc_threads change Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 11 Apr, 2023 1 commit
Trevor Morris authored
* Fix pybind11 install doc Signed-off-by: Trevor Morris <tmorris@nvidia.com>
* Set CMAKE_PREFIX_PATH for TF to find pybind11 Signed-off-by: Trevor Morris <tmorris@nvidia.com>
* Update test builds to use pip install of apt. Signed-off-by: Trevor Morris <tmorris@nvidia.com>
---------
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
- 07 Apr, 2023 1 commit
ngoyal2707 authored
* made bias configurable Signed-off-by: Naman Goyal <naman@fb.com>
* removed commented lines Signed-off-by: Naman Goyal <naman@fb.com>
* Update transformer_engine/pytorch/jit.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>
* Update transformer_engine/pytorch/jit.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>
* fixed incorrect call to fused bias dropout add kernel Signed-off-by: Naman Goyal <naman@fb.com>
* Update transformer_engine/pytorch/jit.py Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* Separate FC1 and FC2 use_bias args; solves all ci errors Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* jit fusion improvement Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Naman Goyal <naman@fb.com>
Signed-off-by: ngoyal2707 <ngoyal2707@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Naman Goyal <naman@fb.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 05 Apr, 2023 1 commit
Frédéric Bastien authored
* Update installation instructions for JAX and add some dependencies. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Bring back support for non-pip-installed pybind11. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Apply suggestions from code review Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Frédéric Bastien <frederic.bastien@gmail.com>
* Changes following review. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Change order to make it more clear. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Add other reviewers' suggestions. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* pybind11 is needed for all FW. Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Add flax as a dep Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
* Update README.rst Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Frédéric Bastien <frederic.bastien@gmail.com>
---------
Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
Signed-off-by: Frédéric Bastien <frederic.bastien@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 28 Mar, 2023 1 commit
Trevor Morris authored
* Add tensorflow build. Improve build instructions. Fix pybind enum usage. Fix Python_EXECUTABLE cmake var. Move scale_inv calculations to FW. Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: kaixih <kaixih@nvidia.com>
* Apply clang-format Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: kaixih <kaixih@nvidia.com>
* Format python files Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: kaixih <kaixih@nvidia.com>
* Add TF build CI Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: kaixih <kaixih@nvidia.com>
* Lint checks Signed-off-by: kaixih <kaixih@nvidia.com>
* Another round of lint checks Signed-off-by: kaixih <kaixih@nvidia.com>
* Fix TF image tag Signed-off-by: Trevor Morris <tmorris@nvidia.com> Signed-off-by: kaixih <kaixih@nvidia.com>
* Use the existing recipe file Signed-off-by: kaixih <kaixih@nvidia.com>
* Add license claim blocks Signed-off-by: kaixih <kaixih@nvidia.com>
* Fix a bug about bias dtype conversion Signed-off-by: kaixih <kaixih@nvidia.com>
* Add mnist example and cleanup old examples Signed-off-by: kaixih <kaixih@nvidia.com>
* Autopep8 the tests Signed-off-by: kaixih <kaixih@nvidia.com>
* Autopep8 the examples Signed-off-by: kaixih <kaixih@nvidia.com>
* Add example in Readme Signed-off-by: kaixih <kaixih@nvidia.com>
* Add unit tests and linting for TensorFlow Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add causal mask for non-fused case Signed-off-by: kaixih <kaixih@nvidia.com>
* Fix the mismatched TF vs TE masks Signed-off-by: kaixih <kaixih@nvidia.com>
* Addressing CI tests Signed-off-by: kaixih <kaixih@nvidia.com>
* Run lint test Signed-off-by: kaixih <kaixih@nvidia.com>
* Add missing import Signed-off-by: kaixih <kaixih@nvidia.com>
* Skip fp8 tests for pre-Hopper GPUs Signed-off-by: kaixih <kaixih@nvidia.com>
* Remove non-pytest tests Signed-off-by: kaixih <kaixih@nvidia.com>
* Fix license Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: kaixih <kaixih@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: kaixih <kaixih@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
- 24 Feb, 2023 1 commit
Jeng Bai-Cheng authored
* add building workflow for jax modules Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* replace bit_cast with reinterpret_cast Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add nvtx to cmake check list Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor layernorm fwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor rmsnorm fwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor layernorm_bwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* set pytorch as default in setup.py Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* rename extension from *.cc to *.cpp: cpplint cannot recognize *.cc file, so rename the extension Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor style, to align TE/PyTorch Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add pybinding, unittest and qa Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* fix license Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* disable c-extension-no-member and no-name-in-module Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add dataclass avoid pylint error Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* Update transformer_engine/__init__.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* Update tests/jax/test_custom_call_shape.py: fix typo Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* Update tests/jax/test_custom_call_shape.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* add building workflow for jax modules Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* replace bit_cast with reinterpret_cast Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add nvtx to cmake check list Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor layernorm fwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor rmsnorm fwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor layernorm_bwd Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* set pytorch as default in setup.py Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* rename extension from *.cc to *.cpp: cpplint cannot recognize *.cc file, so rename the extension Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* refactor style, to align TE/PyTorch Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add pybinding, unittest and qa Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* fix license Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* disable c-extension-no-member and no-name-in-module Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* add dataclass avoid pylint error Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* Update transformer_engine/__init__.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* Update tests/jax/test_custom_call_shape.py: fix typo Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* Update tests/jax/test_custom_call_shape.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* fix conflict due to PR62 Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* fix c-extension-no-member and no-name-in-module: 1. add transformer_engine_jax into extension-pkg-whitelist 2. convert pylintrc from CRLF to LF format Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
* Update setup.py Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
* remove pylint:disable and refactor import order Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
---------
Signed-off-by: Ryan Jeng <rjeng@nvidia.com>
Signed-off-by: Jeng Bai-Cheng <jeng1220@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 22 Feb, 2023 1 commit
cyanguwa authored
* add flash attention to TransformerLayer Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Add docs for FP8 calibration (#61) Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Fix the integer overflow in fused softmax (#60) Signed-off-by: Przemek Tredak <ptredak@nvidia.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* prefix flash attn env var with NVTE_ Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Address steady memory increase and bloated checkpoints (#63) Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix env var logic Signed-off-by: cyanguwa <cyang.uwa@gmail.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix flash attn env var logic again Signed-off-by: cyanguwa <cyang.uwa@gmail.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* remove d2d copies (#64)
* remove d2d copies Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cleanup Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Increase number of FP8 tensors per GEMM (#22)
* Increase number of FP8 tensors per GEMM Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Enable FP8 output tensor for fp8_gemm Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [BERT FP8] Initial TE review comments Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Temporary fix for cuda graph non convergence Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Address review comments-2 Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Review comments-3 Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Cleanup Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Change for New API Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Remove unnecessary clone for D_scale, D_amax Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Avoid Roll for AMAX history size = 1 Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Update onnx_te_gemm API Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Fix Lint errors Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
---------
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Bug fixes from PR 22 (#65)
* Bug fixes from PR 22 Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add FP8 tests to ci Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* bundle unittests for ci Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
* replace rearrange with transpose Signed-off-by: cyanguwa <cyang.uwa@gmail.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* QKV parameters unfused path fixes and optimization (#66)
* Bug fixes from PR 22 Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add FP8 tests to ci Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Better QKV parameter fusion Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* small fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* keep original param for unfused case to retain externally set attrs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix ONNX exports Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* improve arg naming Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* No need to set data pointers Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Assert memory loc in NoopCat Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Handle case of different memory in param and buffer Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix assert always true Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Reassign params memory to avoid more concats Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
* Fix gradients when using AMP (#70): retain grad related attrs while casting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix pylint violations: fixed pylint violations such as trailing white spaces and too long lines Signed-off-by: cyanguwa <cyang.uwa@gmail.com>
* fix pylint violation on line 264 with R1719 Signed-off-by: cyanguwa <cyang.uwa@gmail.com>
* fix two more pylint violations Signed-off-by: cyanguwa <cyang.uwa@gmail.com>
* DotProductAttention API Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add docs for attention Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix assert always true Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* check for correct flash-attn version Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* address review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint+build fixes, correct settings for default flash-attn Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* correct version Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* review comments and fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix onnx and disable flash-attn export test Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove einops dependency Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cleanup internal API; rm duplication Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* only install TE wheel (exclude flash-attn to rm conflicts) Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* forgot to change install wheel path Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* next round review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix flash_attn output Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix QK layer scaling Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* update docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* review comments and fixes to selective checkpointing Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: cyanguwa <cyang.uwa@gmail.com>
Co-authored-by: Charlene Yang <charleney@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 18 Jan, 2023 1 commit
asfiyab-nvidia authored
* Add ONNX export support for TE modules (#1)
* Add TorchScript Operators
* Add symbolic methods to ONNX exporter
* Add tests for the ONNX export Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fixes for pylint tests Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fix pylint warning in softmax.py Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* move FP8 ORT lib inside tests/ Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* enable cross attention tests Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* refactor code by @nzmora
* Increase layernorm FP16 threshold
* Normalize onnx file names: _ separates configs; - separates words in a single config
* Add get_attn_mask_str and fix mask string
* Add missing ONNX files
* Moved generated ONNX files to tests/gen_onnx_models/ Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fix merge conflict changes Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fix Q/DQ scale input Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* enable FP16 config when bias is disabled Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fix pylint check errors Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* updates: 1. remove List import for pylint failure 2. address comments: remove state tensors from GPU 3. address comments: Update reverse_map_dtype function and add to namespace Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* minor fix: coding guidelines Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* changes: 1. skip FP8 tests on non-hopper devices 2. minor fix for C++ lint check Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* fix onnxruntime version Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* minor fix: add space between code and comment Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* changes: 1. update copyrights 2. update path to ORT .so Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* Apply suggestions from code review Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: asfiyab-nvidia <117682710+asfiyab-nvidia@users.noreply.github.com>
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Signed-off-by: asfiyab-nvidia <117682710+asfiyab-nvidia@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 03 Jan, 2023 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 01 Dec, 2022 1 commit
Kirthi Shankar Sivamani authored
* Make fused softmax kernels PyTorch independent Co-authored-by: Sean Lee <selee@nvidia.com> Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Address review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* move get_batch_per_block to python Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix license in softmax.h Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Sean Lee <selee@nvidia.com>
- 05 Oct, 2022 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 28 Sep, 2022 1 commit
Przemek Tredak authored
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>