- 17 Apr, 2023 3 commits
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Add tests for cuda graph capture Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add sanity test and address reviews Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* use upstream flash-attn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * get correct FA for linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix gencode flags based on cuda version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review suggestions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert append_nvcc_threads change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Apr, 2023 3 commits
-
-
Neta Zmora authored
* Fix model load exception when state resides on GPU - Whenever converting a torch.tensor to numpy, we need to first migrate the tensor storage to the host CPU. - Add a warning not to do constant-folding when exporting to ONNX. This is due to a torch.onnx export bug. - Refactor compare_outputs Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Onnx export: Improve remark text Signed-off-by:
Neta Zmora <nzmora@nvidia.com> --------- Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
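The first point of the commit above (migrate tensor storage to the host CPU before converting to NumPy) can be sketched as follows. This is a minimal illustration, not the code from the commit: `to_numpy` is a hypothetical helper name, and it relies only on the standard `torch.Tensor` methods `detach()`, `cpu()`, and `numpy()`.

```python
def to_numpy(tensor):
    """Convert a torch tensor to a NumPy array regardless of its device.

    Hypothetical helper for illustration; not taken from the commit.
    """
    # detach() drops the autograd graph, because numpy() refuses tensors
    # that require grad. cpu() copies CUDA storage to host memory and
    # returns the tensor unchanged if it is already on the CPU.
    # numpy() then exposes the host buffer as an ndarray.
    return tensor.detach().cpu().numpy()
```

In PyTorch, calling `.numpy()` directly on a CUDA tensor raises a `TypeError`, so routing every conversion through one helper like this keeps the device handling in a single place.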
-
zlsh80826 authored
* Add zero_center_gamma/functional pass Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma for fp8_ln_mlp Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma to modules Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma to TransformerLayer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refactored code style for improved readability and consistency Signed-off-by:
Reese Wang <rewang@nvidia.com> * Docs enhancement for zero_centered_gamma Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add escape for line break and remove some bad if conditions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Revise scale_init docs Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Kaixi Hou authored
Remove the autocast_variable Signed-off-by: kaixih <kaixih@nvidia.com>
-
- 11 Apr, 2023 1 commit
-
-
Trevor Morris authored
* Fix pybind11 install doc Signed-off-by:
Trevor Morris <tmorris@nvidia.com> * Set CMAKE_PREFIX_PATH for TF to find pybind11 Signed-off-by:
Trevor Morris <tmorris@nvidia.com> * Update test builds to use pip install instead of apt. Signed-off-by:
Trevor Morris <tmorris@nvidia.com> --------- Signed-off-by:
Trevor Morris <tmorris@nvidia.com>
-
- 08 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Fix cyclic import error in TF Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 Apr, 2023 5 commits
-
-
Kirthi Shankar Sivamani authored
fix nightly docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
ngoyal2707 authored
* made bias configurable Signed-off-by:
Naman Goyal <naman@fb.com> * removed commented lines Signed-off-by:
Naman Goyal <naman@fb.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> * fixed incorrect call to fused bias dropout add kernel Signed-off-by:
Naman Goyal <naman@fb.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Separate FC1 and FC2 use_bias args; solves all ci errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * jit fusion improvement Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Naman Goyal <naman@fb.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Naman Goyal <naman@fb.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Ming-Xu Huang authored
* Rename enable_fp8 to is_fp8_enabled. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding an API to get an instance of DelayedScaling which is set via fp8_autocast. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
Tim Moon authored
* GitHub actions for linting JAX code. Pylint is disabled since the JAX container is not publicly available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * GitHub action for building JAX code. Disabled since the JAX container is not publicly available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add GitHub actions for TensorFlow and license checking Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug GitHub action for license checking Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use TensorFlow container for GitHub JAX tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
* small cleanup before starting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * conditional dgrad for Linear Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add tests and small improvements to LNLinear and LNMLP Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Apr, 2023 3 commits
-
-
Frédéric Bastien authored
* Add the link to the examples and the development user guide. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Frédéric Bastien authored
* Update installation instructions for JAX and add some dependencies. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Bring back support for non-pip-installed pybind11. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Apply suggestions from code review Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> * Changes following review. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Change order to make it more clear. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add other reviewers' suggestions. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * pybind11 is needed for all FW. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add flax as a dep Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* async amax reduction: add env knob to enable async amax reduction Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * Style fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * remove is_last_model Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * fix naming Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * revert var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
slym <slym@login-preos01.a51.clusters.nvidia.com>
-
- 04 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add FP8 support for Ada Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * better message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * better message for no fp8 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * same thing for onnx test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CI and review Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Apr, 2023 1 commit
-
-
galagam authored
* Bugfix - compute scale_inv when loading checkpoint Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> * Save inverse scale in extra state tensor + minor CR fixes Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> * Fix lint Co-authored-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 Mar, 2023 2 commits
-
-
Kirthi Shankar Sivamani authored
* Fix segfault during GeLU export Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Change FP8 recipe defaults Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Increase default amax history length Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Always check history size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * no amax history for onnx export Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert onnx export test changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix indices in onnx test Co-authored-by:
Neta Zmora <nzmora@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Neta Zmora <nzmora@nvidia.com>
-
- 29 Mar, 2023 2 commits
-
-
tcherckez-nvidia authored
Signed-off-by:
Tal Cherckez <tcherckez@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Ming-Xu Huang authored
* Support transpose_bs when decoded=True Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix bugs: 1. Fix missing dropout_dims in LayerNormMLP. 2. Fix broadcast issues in decoded. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix wrong masks in decoded. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed wrong assert condition in TransformerLayer Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix amax not being reset to 0 in each step. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Enhance rules conflict checking and docs. Signed-off-by:
Ming Huang <mingh@nvidia.com> * fix code formatting. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 28 Mar, 2023 5 commits
-
-
vasunvidia authored
* Add support for fp8 GEMM BIAS AUX GELU fusion Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
Jeng Bai-Cheng authored
* refactor JAX examples Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * fix doc-string Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add dp example Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * refactor Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * fix params_axes_pspec Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * Add model parallel example and refactor. Update readme. Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * align code and readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update verification Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add mask Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * num_gpu is configurable Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * solve pylint issue Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * ignore markdown and txt file from license check Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * Update README.md Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add flax into requirements.txt Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> --------- Signed-off-by:
Ryan Jeng <rjeng@nvidia.com>
-
Trevor Morris authored
* Add tensorflow build. Improve build instructions. Fix pybind enum usage. Fix Python_EXECUTABLE cmake var. Move scale_inv calculations to FW. Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Apply clang-format Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Format python files Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Add TF build CI Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Another round of lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix TF image tag Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Use the existing recipe file Signed-off-by:
kaixih <kaixih@nvidia.com> * Add license claim blocks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix a bug about bias dtype conversion Signed-off-by:
kaixih <kaixih@nvidia.com> * Add mnist example and cleanup old examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Add example in Readme Signed-off-by:
kaixih <kaixih@nvidia.com> * Add unit tests and linting for TensorFlow Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add causal mask for non-fused case Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix the mismatched TF vs TE masks Signed-off-by:
kaixih <kaixih@nvidia.com> * Addressing CI tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Run lint test Signed-off-by:
kaixih <kaixih@nvidia.com> * Add missing import Signed-off-by:
kaixih <kaixih@nvidia.com> * Skip fp8 tests for pre-Hopper GPUs Signed-off-by:
kaixih <kaixih@nvidia.com> * Remove non-pytest tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix license Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
kaixih <kaixih@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Tim Moon authored
* Remove zombie process from querying TE install path Co-authored-by:
Naman Goyal <naman@fb.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix FA version checking Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix unused import error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint warning Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Naman Goyal <naman@fb.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* fix usage of return_bias argument Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 Mar, 2023 1 commit
-
-
Neta Zmora authored
* Fix GELU ONNX export * Wrap GELU export with cast to/from FP32 to achieve same compute precision as TE. * Increase GELU export test thresholds. * Change export to ONNX opset 17 for smaller representation of LN (single node instead of subgraph). * Remove the need for LN work-around for ORT Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Add docstring to te_onnx_extensions.py::compute_in_fp32 Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Tune threshold for GELU ONNX export. Ran 8K test instances to verify the threshold. Allow 2 coefficients to escape threshold. Two wrong coefficients are not a failure. Signed-off-by:
Neta Zmora <nzmora@nvidia.com> --------- Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
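The tolerance rule described in the commit above (a comparison still passes when a small number of coefficients exceed the threshold) can be sketched in plain Python. The function name `within_tolerance` and its parameters `atol` and `max_outliers` are hypothetical; the actual test helper and its threshold values are not shown in this log.

```python
import math


def within_tolerance(expected, actual, atol, max_outliers=0):
    """Return True if at most `max_outliers` elements differ by more than `atol`.

    Hypothetical helper; names and semantics are assumptions for illustration.
    """
    if len(expected) != len(actual):
        raise ValueError("inputs must have the same length")
    # Count elements whose absolute error exceeds the threshold.
    # NaN differences count as outliers so they never pass silently.
    outliers = 0
    for e, a in zip(expected, actual):
        diff = abs(e - a)
        if math.isnan(diff) or diff > atol:
            outliers += 1
    return outliers <= max_outliers


reference = [0.10, 0.20, 0.30, 0.40]
exported = [0.10, 0.25, 0.30, 0.46]  # two elements are off by more than 0.01

print(within_tolerance(reference, exported, atol=0.01))                  # False
print(within_tolerance(reference, exported, atol=0.01, max_outliers=2))  # True
```

Counting outliers instead of requiring every element to pass trades a little strictness for robustness against rare rounding differences, which is the trade-off the commit message describes.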
-
- 22 Mar, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
FA doesn't support compute 8.6 with head_dim>64 Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Mar, 2023 2 commits
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
vasunvidia authored
* Initial commit for fp8_transpose_dbias kernel Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * lint fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Suggestions and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 18 Mar, 2023 1 commit
-
-
Neta Zmora authored
Add an option to serialize test i/o to file. Small refactoring of the inferencing code. Change the default directory where generated ONNX files are stored. Use the temp directory to avoid clogging the file system. Add an option to serialize test input/output tensors to a Polygraphy RunResults object. Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Mar, 2023 4 commits
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
Tim Moon authored
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
Kirthi Shankar Sivamani authored
* add layernorm1p fp8 test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * combine tests for easy maintenance Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * using torch.autocast for AMP and check grad types Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add test for wgrad accumulation fusion Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rename file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Setup numerical tests + SAR Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add test for full activation recompute Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add tests for checkpoint load/store Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * TE vs framework numerical tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * relax thresholds Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Mar, 2023 2 commits
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Neta Zmora authored
* Add a temporary workaround to layernorm export. It seems that ORT performs template matching for LN and incorrectly concludes that it doesn't have a kernel for FP32 LN. The workaround adds fake_zero, which is meant to prevent the template matching while keeping the graph virtually unchanged. This also requires `do_constant_folding=False` in `torch.onnx.export`. Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Adjust test threshold Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Opened an ORT bug and added the link for tracking Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Fix Python linter errors Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Simplify the LN workaround solution (ONNX export). After discussing https://github.com/microsoft/onnxruntime/issues/15021 with Microsoft engineers, replaced the LN workaround with a simpler implementation. In addition: * To make the test more robust, add `allow_cnt_errors` to `validate_result` * Add more documentation to clarify the purpose and methodology of the ONNX export tests Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Fix unused import Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Fix unused import Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Fix unused import Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-