- 21 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Initial refactor; linker error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix linking issue and make mpi conditional Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix TF/JAX build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use max SMs at the last RS chunk in pipelined overlap Co-authored-by:
Sangkug Lym <slym@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Make userbuffers support opt-in Decouple userbuffers from MPI. Refactor MPI handling in build system. Standardize names to "userbuffers". Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Lint Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 20 Apr, 2023 2 commits
-
-
Ming-Xu Huang authored
* Allow update_collections and update_fp8_metas to return both Dict and FrozenDict. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix the wrong shape issue of bias when fused QKV or KV. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Reuse tuplized features for bias creating. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Replace get_args to be more readable. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
Frédéric Bastien authored
* Clean up the installation instructions. We were telling users to install the dev version in the README. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Typos Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 19 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Port initial changes Co-authored-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd FA include for PyTorch Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Re-enable sm_70 + cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * LICENSE, cleanup header Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * 5k -> 173 errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * license and fixes in userbuffers-host Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * next round fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * final cpp cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * pylinting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix from linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Turn off default async amax reduction (#148) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove unused code path Signed-off-by:
Sangkug Lym <slym@nvidia.com> * cleanup Macros Signed-off-by:
Sangkug Lym <slym@nvidia.com> * fix conflict resolution bug Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Fix gencode flags in setup (#145) * Fix gencode flags based on cuda version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review suggestions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert append_nvcc_threads change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change overlap config dict error message Signed-off-by:
Sangkug Lym <slym@nvidia.com> * simplify ub initialization Signed-off-by:
Sangkug Lym <slym@nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix sanity imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cpplint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix TensorFlow build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix TE macros in public header Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compiles with and w/o MPI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes for python side annotations for conditional compile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * link gdrAPI only when MPI found Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix comments for dummy var Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix linking Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * load MPI before TE Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add Py side argument checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove unused code and catch silent failures Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix find_lib path for tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
- 18 Apr, 2023 3 commits
-
-
Frédéric Bastien authored
Signed-off-by: Frederic Bastien <fbastien@nvidia.com>
-
Sangkug Lym authored
* amax reduction interval Signed-off-by:
Sangkug Lym <slym@nvidia.com> Skip TP-domain only AMAX reduction when TP-group is not initialized Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Update transformer_engine/pytorch/fp8.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Sangkug Lym <slym@nvidia.com> * check TP group initialized Signed-off-by:
Sangkug Lym <slym@nvidia.com> fix Signed-off-by:
Sangkug Lym <slym@nvidia.com> --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Apr, 2023 3 commits
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Add tests for cuda graph capture Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add sanity test and address reviews Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* use upstream flash-attn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * get correct FA for linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix gencode flags based on cuda version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review suggestions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert append_nvcc_threads change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Apr, 2023 3 commits
-
-
Neta Zmora authored
* Fix model load exception when state resides on GPU - Whenever converting a torch.tensor to numpy, we need to first migrate the tensor storage to the host CPU. - Add a warning not to do constant-folding when exporting to ONNX. This is due to a torch.onnx export bug. - Refactor compare_outputs Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Onnx export: Improve remark text Signed-off-by:
Neta Zmora <nzmora@nvidia.com> --------- Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
zlsh80826 authored
* Add zero_center_gamma/functional pass Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma for fp8_ln_mlp Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma to modules Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add zero_centered_gamma to TransformerLayer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refactored code style for improved readability and consistency Signed-off-by:
Reese Wang <rewang@nvidia.com> * Docs enhancement for zero_centered_gamma Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add escape for line break and remove some bad if conditions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Revise scale_init docs Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Kaixi Hou authored
Remove the autocast_variable Signed-off-by: kaixih <kaixih@nvidia.com>
-
- 11 Apr, 2023 1 commit
-
-
Trevor Morris authored
* Fix pybind11 install doc Signed-off-by:
Trevor Morris <tmorris@nvidia.com> * Set CMAKE_PREFIX_PATH for TF to find pybind11 Signed-off-by:
Trevor Morris <tmorris@nvidia.com> * Update test builds to use pip install of apt. Signed-off-by:
Trevor Morris <tmorris@nvidia.com> --------- Signed-off-by:
Trevor Morris <tmorris@nvidia.com>
-
- 08 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Fix cyclic import error in TF Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 Apr, 2023 5 commits
-
-
Kirthi Shankar Sivamani authored
fix nightly docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
ngoyal2707 authored
* made bias configurable Signed-off-by:
Naman Goyal <naman@fb.com> * removed commented lines Signed-off-by:
Naman Goyal <naman@fb.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> * fixed incorrect call to fused bias dropout add kernel Signed-off-by:
Naman Goyal <naman@fb.com> * Update transformer_engine/pytorch/jit.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Separate FC1 and FC2 use_bias args; solves all ci errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * jit fusion improvement Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Naman Goyal <naman@fb.com> Signed-off-by:
ngoyal2707 <ngoyal2707@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Naman Goyal <naman@fb.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Ming-Xu Huang authored
* Rename enable_fp8 to is_fp8_enabled. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding an API to get an instance of DelayedScaling which is set via fp8_autocast. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
Tim Moon authored
* GitHub actions for linting JAX code Pylint is disabled since JAX container is not publicly available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * GitHub action for building JAX code Disabled since JAX container is not publicly available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add GitHub actions for TensorFlow and license checking Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug GitHub action for license checking Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use TensorFlow container for GitHub JAX tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
* small cleanup before starting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * conditional dgrad for Linear Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add tests and small improvements to LNLinear and LNMLP Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Apr, 2023 3 commits
-
-
Frédéric Bastien authored
* Add the link to the examples and the development user guide. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Frédéric Bastien authored
* Update installation instructions for JAX and add some dependencies. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Bring back support for non-pip-installed pybind11. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Apply suggestions from code review Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> * Changes following review. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Change order to make it more clear. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add other reviewers' suggestions. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * pybind11 is needed for all FW. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add flax as a dep Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* async amax reduction add env knob to enable async amax reduction Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * Style fixes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * remove is_last_model Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * fix naming Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> * revert var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
slym <slym@login-preos01.a51.clusters.nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
slym <slym@login-preos01.a51.clusters.nvidia.com>
-
- 04 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add FP8 support for Ada Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * better message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * better message for no fp8 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * same thing for onnx test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CI and review Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Apr, 2023 1 commit
-
-
galagam authored
* Bugfix - compute scale_inv when loading checkpoint Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> * Save inverse scale in extra state tensor + minor CR fixes Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> * Fix lint Co-authored-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Gal Hubara Agam <ghubaraagam@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 Mar, 2023 2 commits
-
-
Kirthi Shankar Sivamani authored
* Fix segfault during GeLU export Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Change FP8 recipe defaults Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Increase default amax history length Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Always check history size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * no amax history for onnx export Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert onnx export test changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix indices in onnx test Co-authored-by:
Neta Zmora <nzmora@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Neta Zmora <nzmora@nvidia.com>
-
- 29 Mar, 2023 2 commits
-
-
tcherckez-nvidia authored
Signed-off-by:
Tal Cherckez <tcherckez@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Ming-Xu Huang authored
* Support transpose_bs when decoded=True Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix Bugs, 1. Fix missing dropout_dims in LayerNormMLP. 2. Fix broadcast issues in decoded. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix wrong masks in decoded. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed wrong assert condition in TransformerLayer Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix amax not being reset to 0 at each step. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Enhance rules conflict checking and docs. Signed-off-by:
Ming Huang <mingh@nvidia.com> * fix code formatting. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 28 Mar, 2023 5 commits
-
-
vasunvidia authored
* Add support for fp8 GEMM BIAS AUX GELU fusion Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
Jeng Bai-Cheng authored
* refactor JAX examples Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * fix doc-string Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add dp example Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * refactor Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * fix params_axes_pspec Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * Add model parallel example and refactor Update readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * align code and readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update verification Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add mask Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * num_gpu is configurable Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * update readme Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * solve pylint issue Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * ignore markdown and txt file from license check Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * Update README.md Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> * add flax into requirements.txt Signed-off-by:
Ryan Jeng <rjeng@nvidia.com> --------- Signed-off-by:
Ryan Jeng <rjeng@nvidia.com>
-
Trevor Morris authored
* Add tensorflow build Improve build instructions Fix pybind enum usage Fix Python_EXECUTABLE cmake var Move scale_inv calculations to FW Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Apply clang-format Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Format python files Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Add TF build CI Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Another round of lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix TF image tag Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Use the existing recipe file Signed-off-by:
kaixih <kaixih@nvidia.com> * Add license claim blocks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix a bug about bias dtype conversion Signed-off-by:
kaixih <kaixih@nvidia.com> * Add mnist example and cleanup old examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Add example in Readme Signed-off-by:
kaixih <kaixih@nvidia.com> * Add unit tests and linting for TensorFlow Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add causal mask for non-fused case Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix the mismatched TF vs TE masks Signed-off-by:
kaixih <kaixih@nvidia.com> * Addressing CI tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Run lint test Signed-off-by:
kaixih <kaixih@nvidia.com> * Add missing import Signed-off-by:
kaixih <kaixih@nvidia.com> * Skip fp8 tests for pre-Hopper GPUs Signed-off-by:
kaixih <kaixih@nvidia.com> * Remove non-pytest tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix license Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
kaixih <kaixih@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Tim Moon authored
* Remove zombie process from querying TE install path Co-authored-by:
Naman Goyal <naman@fb.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix FA version checking Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix unused import error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint warning Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Naman Goyal <naman@fb.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* fix usage of return_bias argument Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 Mar, 2023 1 commit
-
-
Neta Zmora authored
* Fix GELU ONNX export * Wrap GELU export with cast to/from FP32 to achieve same compute precision as TE. * Increase GELU export test thresholds. * Change export to ONNX opset 17 for smaller representation of LN (single node instead of subgraph). * Remove the need for LN work-around for ORT Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Add docstring to te_onnx_extensions.py::compute_in_fp32 Signed-off-by:
Neta Zmora <nzmora@nvidia.com> * Tune threshold for GELU ONNX export Ran 8K test instances to verify the threshold. Allow 2 coefficients to escape threshold. Two wrong coefficients are not a failure. Signed-off-by:
Neta Zmora <nzmora@nvidia.com> --------- Signed-off-by:
Neta Zmora <nzmora@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Mar, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
FA doesn't support compute 8.6 with head_dim>64 Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Mar, 2023 2 commits
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
vasunvidia authored
* Initial commit for fp8_transpose_dbias kernel Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * lint fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Suggestions and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-