- 06 Dec, 2023 1 commit
-
-
Santosh Bhavani authored
* Update README.rst - FP8 convergence - added FP8 convergence section - removed model support (to be replaced with a feature support table) Signed-off-by:
Santosh Bhavani <santosh@semantic.md> * Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update README.rst Updated Latest News Signed-off-by:
Santosh Bhavani <santosh@semantic.md> * Update README.rst Add plot for H200 Signed-off-by:
Santosh Bhavani <santosh@semantic.md> --------- Signed-off-by:
Santosh Bhavani <santosh@semantic.md> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Nov, 2023 1 commit
-
-
Ming-Xu Huang authored
* Refactor sharding.py for the further custom_partitioning migration Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating both FWD and BWD of LayerNorm/RMSNorm from xmap to custom_partitioning. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating both FWD and BWD of all kinds of softmax from xmap to custom_partitioning. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix the wrong order of parameters to LN/RMSN bwd in ln_mlp_fp8. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * WAR to LN/RMSN_fp8 before migrating to CP. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix the wrong order of parameters of bwd of LN/RMSN_fp8. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Following review feedback to modify Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Force the hidden dim in Norm ops to no sharding and add warning msg. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Reuse fwd_rule in VJP functions Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating both FWD and BWD of self-fused-attn from xmap to custom_partitioning. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating both FWD and BWD of cross-fused-attn from xmap to custom_partitioning. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * add gelu and dgelu. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Reuse fwd_rule in VJP functions for attentions Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Apply native FP8 Dtypes to fp8.py Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating cast_and_transpose from xmap to custom_partitioning Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Migrating transpose from xmap to custom_partitioning Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Apply XLA pattern match to perform FP8 GEMM. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * migrate layernorm_fp8 to custom_partitioning. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Unify code style Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Extend support of Transpose with FP8 Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Implementing layernorm_fp8_dot based on migrated custom calls. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Renaming variables and publish NVTE_FP8_COLLECTION_NAME Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Replace Q/DQ custom calls with native XLA implementations Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * migrate gelu_fp to custom_partitioning. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Minor fix Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Support custom calls with multi-dims Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Support general dot indices in _fp8_dot_impl Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Implementing layernrom_geglu_fp8_mlp Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Remove GEMM custom calls Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Remove xmap related code Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix typo and add query-function to FP8MetaPackage Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix some bugs of custom calls Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix CT's bugs Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Update UTs/examples to adapt to the API changes. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Unify kernel initialization in MLP. Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Modifying per code review feedback Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Update README and Add deprecating warning to *ShardingType Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Canonicalize the dtype Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding assertion for non-supported batch dims. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding doc/examples to _multidim_transpose Signed-off-by:
Ming Huang <mingh@nvidia.com> * Set FP8 meta as WeightHParamsCollection.OVERWRITE_WITH_GRADIENT in Praxis modules. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Set FP8 meta as WeightHParamsCollection.OVERWRITE_WITH_GRADIENT in Praxis modules. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Apply dtype-based rtol/atol to UTs Signed-off-by:
Ming Huang <mingh@nvidia.com> * Deprecate QKV_INTERLEAVED enum Signed-off-by:
Ming Huang <mingh@nvidia.com> * Skip test_distributed_custom_ops.py Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix the wrong sharding of bias in SelfAttn Signed-off-by:
Ming Huang <mingh@nvidia.com> * WAR to fix the wrong cu_seqlen of MHA when DP/FSDP enabled Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding distributed ops unit-tests Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding license to test_distributed_* Signed-off-by:
Ming Huang <mingh@nvidia.com> * Follow review feedback to modify Signed-off-by:
Ming Huang <mingh@nvidia.com> * Use total bytes involved in collective ops as criteria. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Co-authored-by:
Donglin Yang <dongliny@nvidia.com>
-
- 13 Nov, 2023 1 commit
-
-
Santosh Bhavani authored
* Update README.rst - Installation section Added pip install instructions and cleaned up pre-reqs and FlashAttention-2 section Signed-off-by:
Santosh Bhavani <santosh@semantic.md> * Update README.rst Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Santosh Bhavani <santosh@semantic.md> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Oct, 2023 1 commit
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 11 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 02 Oct, 2023 1 commit
-
-
Santosh Bhavani authored
minor grammatical changes and added "JAX Toolbox" to integrations Signed-off-by:Santosh Bhavani <santosh@semantic.md>
-
- 16 Aug, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemyslaw Tredak <ptredak@nvidia.com>
-
- 31 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Add compilation warning for FA 2.0 Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Jul, 2023 1 commit
-
-
cyanguwa authored
* Fix bprop for cuDNN 8.9.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update cuDNN version requirement to 8.9.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * debug paddle CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * debug paddle CI; force LD_LIBRARY Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * debug paddle CI; force LD_LIBRARY to /opt Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove debug info for paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change cudnn requirement to 8.9.1 for v1 and 8.9.0 for v2; add batch size 32 for unit test; add LD library path for paddle tests temporarily Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove printf line in fused_attn.cpp Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add batch size 32 for unit test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update cudnn-frontend to 0.9.2 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove temporary LD library path used for testing pre-released cudnn 8.9.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 27 Jun, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Jun, 2023 1 commit
-
-
Santosh Bhavani authored
Update README.rst Added NVIDIA NeMo to Integrations section Signed-off-by:Santosh Bhavani <santosh.bhavani@live.com>
-
- 14 Jun, 2023 1 commit
-
-
Santosh Bhavani authored
* Update README.rst Added additional integrations Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Update README.rst Added DeepSpeed integration Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> --------- Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com>
-
- 07 Jun, 2023 1 commit
-
-
Santosh Bhavani authored
* Update README.rst 1/ added a nav header with links 2/ added integrations section 3/ minor grammatical changes 4/ added link to release notes Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Update README.rst Update NGC PyT container usage instructions Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Update README.rst - added pre-reqs under installation - reorganized useful links as papers and videos - updated integrations to include upcoming work - updated copy in contributing section - updated highlights section - updated nav header - added latest news section Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Update README.rst Co-authored-by:
Santosh Bhavani <santosh.bhavani@live.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update README.rst - updated integrations section - add DL FW support info Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> --------- Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 May, 2023 1 commit
-
-
Ming-Xu Huang authored
* Adding JAX/Praxis modules and dependencies. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding UTs to JAX/Praxis modules. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Remove praxis as a dependency since it is not strictly needed Signed-off-by:
Ming Huang <mingh@nvidia.com> * Replace is_fp8_supported with is_fp8_available Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make Praxis an optional dependency. 1. Removed 'from . import praxis' in __init__.py. 1.1 Note: keep 'from . import flax' for the deprecation warning. 2. Changed te.flax to te_flax in examples and README.rst. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding a workaround to FP8 training on Praxis. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 28 Apr, 2023 1 commit
-
-
Ming-Xu Huang authored
* Adjust Module Structure. 1. Collect Flax related modules to a sub-folder, flax. 2. Add a function to unify scale_init for zero-centered-gamma LN. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make changes compatible with previous versions. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adapt jax/examples to the new module structure. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update jax/docs and Add deprecated warning. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update README Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding deprecated_wrapper Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding deprecated warning to flax modules which imported via transformer_engine.jax Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix CI errors and update docs. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Removing unnecessary deprecated warning in docs. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Implementing __iter__ to DeprecatedEnum. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Apr, 2023 1 commit
-
-
Frédéric Bastien authored
* Clean up the installation instructions. We were telling users to install the dev version in the README. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Typos Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 18 Apr, 2023 1 commit
-
-
Frédéric Bastien authored
Signed-off-by:Frederic Bastien <fbastien@nvidia.com>
-
- 05 Apr, 2023 2 commits
-
-
Frédéric Bastien authored
* Add the link to the examples and the development user guide. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Frédéric Bastien authored
* Update installation instructions for JAX and add some dependencies. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Bring back support for non-pip-installed pybind11. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Apply suggestions from code review Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> * Changes following review. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Change order to make it more clear. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add other reviewers' suggestions. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * pybind11 is needed for all FW. Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Add flax as a dep Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> * Update README.rst Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> --------- Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Signed-off-by:
Frédéric Bastien <frederic.bastien@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Mar, 2023 1 commit
-
-
Trevor Morris authored
* Add tensorflow build Improve build instructions Fix pybind enum usage Fix Python_EXECUTABLE cmake var Move scale_inv calculations to FW Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Apply clang-format Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Format python files Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Add TF build CI Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Another round of lint checks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix TF image tag Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> * Use the existing recipe file Signed-off-by:
kaixih <kaixih@nvidia.com> * Add license claim blocks Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix a bug about bias dtype conversion Signed-off-by:
kaixih <kaixih@nvidia.com> * Add mnist example and cleanup old examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Autopep8 the examples Signed-off-by:
kaixih <kaixih@nvidia.com> * Add example in Readme Signed-off-by:
kaixih <kaixih@nvidia.com> * Add unit tests and linting for TensorFlow Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add causal mask for non-fused case Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix the mismatched TF vs TE masks Signed-off-by:
kaixih <kaixih@nvidia.com> * Addressing CI tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Run lint test Signed-off-by:
kaixih <kaixih@nvidia.com> * Add missing import Signed-off-by:
kaixih <kaixih@nvidia.com> * Skip fp8 tests for pre-Hopper GPUs Signed-off-by:
kaixih <kaixih@nvidia.com> * Remove non-pytest tests Signed-off-by:
kaixih <kaixih@nvidia.com> * Fix license Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Trevor Morris <tmorris@nvidia.com> Signed-off-by:
kaixih <kaixih@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
kaixih <kaixih@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 17 Mar, 2023 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 16 Mar, 2023 1 commit
-
-
Ming-Xu Huang authored
* Adding JAX to README.rst Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Refine README.rst as the suggestion from review. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Refine the API doc of extend_logical_axis_rules. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 03 Nov, 2022 1 commit
-
-
nzmora-nvidia authored
Fix the sample code so it compiles after the signature of `te.Linear` has changed. Signed-off-by:
nzmora-nvidia <96238833+nzmora-nvidia@users.noreply.github.com> Signed-off-by:
nzmora-nvidia <96238833+nzmora-nvidia@users.noreply.github.com>
-
- 04 Oct, 2022 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-