- 24 Aug, 2024 1 commit
-
-
hXl3s authored
* Limit number of architectures build Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 23 Aug, 2024 1 commit
-
-
Charlene Yang authored
* WIP: add fa3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: add benchmarks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * differentiate func/varlen_func Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: add FP8 fwd support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add FA3 FP8 fwd code and test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix assert for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix FA3 FP8 logic and add tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FA2 to <=2.6.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * tweak unit tests for base/mask Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * set constraints for FA3 for sm90 and causal_bottom_right Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert debug changes in benchmark script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 22 Aug, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Re-add framework specific required dependencies for source build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Aug, 2024 1 commit
-
-
Phuong Nguyen authored
Remove total time measurement Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 13 Aug, 2024 1 commit
-
-
Phuong Nguyen authored
* add timing for build * using perf_counter --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 25 Jul, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Specify python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add classifiers for python Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add utils to build wheels Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * make wheel scripts Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add aarch Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle wheel Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * PaddlePaddle only builds for x86 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add optional fwk deps Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python3.8; catch install error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] cudnn9 compile with paddle support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] dont link cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dlopen cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dynamically load nvrtc Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove residual packages; exclude stub from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Exclude builtins from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * properly include files for sdist Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * paddle wheel tie to python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle build from src [wip] Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix workflow paddle build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix lint from pr986 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity import to wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove upper limit on paddlepaddle version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove unused imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove pybind11 dependency Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Search .sos in cuda home Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CLeanup, remove residual code Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 24 Jul, 2024 1 commit
-
-
Tim Moon authored
* Set minimum CMake version to 3.21 Stop linking to nvtx. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update .github/workflows/build.yml Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Revert Python version to 3.9 Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2024 1 commit
-
-
Alp Dener authored
* added DL framework callbacks for bootstrapping userbuffers without MPI Signed-off-by:
Alp Dener <adener@nvidia.com> * removed userbuffers availability check in TE modules since userbuffers is now always compiled Signed-off-by:
Alp Dener <adener@nvidia.com> * added comm+GEMM overlap example with LayerNormMLP Signed-off-by:
Alp Dener <adener@nvidia.com> * lintin and review fixes Signed-off-by:
Alp Dener <adener@nvidia.com> * linting and review fixes Signed-off-by:
Alp Dener <adener@nvidia.com> * added header guards Signed-off-by:
Alp Dener <adener@nvidia.com> * removed defunct userbuffers checks in build_utils and setup.py Signed-off-by:
Alp Dener <adener@nvidia.com> * added exposed API in modules/base.py to __all__ Signed-off-by:
Alp Dener <adener@nvidia.com> * removed transformer_engine/CMakeLists.txt and shifted all TE/common compile into transformer_engine/common/CmakeLists.txt Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 07 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Cleanup Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 May, 2024 1 commit
-
-
Xin Yao authored
* add multi-tensor kernels Signed-off-by:
Xin Yao <xiny@nvidia.com> * add FusedAdam Signed-off-by:
Xin Yao <xiny@nvidia.com> * add test to qa Signed-off-by:
Xin Yao <xiny@nvidia.com> * add FusedSGD Signed-off-by:
Xin Yao <xiny@nvidia.com> * fix lint Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 21 May, 2024 1 commit
-
-
Alp Dener authored
replaced deprecated pkg_resources with packaging Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 09 May, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Bump FA version to 2.5.8 Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 10 Apr, 2024 1 commit
-
-
Jinze Xue authored
Signed-off-by:
Jinze Xue <jinzex@nvidia.com> Co-authored-by:
Jinze Xue <jinzex@nvidia.com>
-
- 03 Apr, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
This reverts commit 965803c9.
-
- 20 Mar, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Mar, 2024 1 commit
-
-
Jinze Xue authored
* Enable incremental CMake build Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * Update setup.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * Update setup.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * remove tempfile import Signed-off-by:
Jinze Xue <jinzex@nvidia.com> --------- Signed-off-by:
Jinze Xue <jinzex@nvidia.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> Co-authored-by:
Jinze Xue <jinzex@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Feb, 2024 1 commit
-
-
Tim Moon authored
Add __version__ attribute to Python module Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 23 Jan, 2024 1 commit
-
-
Luke Petre authored
* Support building using the manylinux docker image. libpython is only required for embedded python. Signed-off-by:
Luke Petre <lpetre@midjourney.com> * Be explicit about which python to use in cmake Signed-off-by:
Luke Petre <lpetre@midjourney.com> * Remove cmake version check Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Luke Petre <lpetre@gmail.com> --------- Signed-off-by:
Luke Petre <lpetre@midjourney.com> Signed-off-by:
Luke Petre <lpetre@gmail.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 06 Jan, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Deterministic FA, bump minimum supported version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix MQA/GQA Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 15 Nov, 2023 1 commit
-
-
cyanguwa authored
* fix condition checks related to FA head_dim Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force q,k,v contiguous when RoPE is in use Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Expand FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Oct, 2023 1 commit
-
-
Tim Moon authored
* Do not include logging macros in installed C headers Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug logging macros Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug C++ tests Use Google style for header includes. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update CUDA driver macros Incorporating changes from #389. Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use core error checking macros in PyTorch extensions Hack to get around macro redefinition warning. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix missing arg when getting CUDA driver error string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse logging header in frameworks Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com>
-
- 13 Oct, 2023 1 commit
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 11 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Sep, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Sep, 2023 1 commit
-
-
Xiaowei Ren authored
* add flash implementation with context parallelism Signed-off-by:
xren <xren@nvidia.com> * next more comments Signed-off-by:
xren <xren@nvidia.com> * code comment fix Signed-off-by:
xren <xren@nvidia.com> * comment fix Signed-off-by:
xren <xren@nvidia.com> * add missing space Signed-off-by:
xren <xren@nvidia.com> * fix docstrings Signed-off-by:
xren <xren@nvidia.com> * try to add fa v2 api Signed-off-by:
xren <xren@nvidia.com> * fix a comment Signed-off-by:
xren <xren@nvidia.com> * fix padded kv return Signed-off-by:
xren <xren@nvidia.com> * add docstrings of context parallelism Signed-off-by:
xren <xren@nvidia.com> * minor fix Signed-off-by:
xren <xren@nvidia.com> * minor docstring fix Signed-off-by:
xren <xren@nvidia.com> * fix positional arguments Signed-off-by:
xren <xren@nvidia.com> * make docstring line shorter Signed-off-by:
xren <xren@nvidia.com> * add fa v2 backward api for flash_attn_with_cp Signed-off-by:
xren <xren@nvidia.com> * remove redundant code Signed-off-by:
xren <xren@nvidia.com> * make sure hidden size per attn head is multiple of 8 for FA2 Signed-off-by:
xren <xren@nvidia.com> * remove an unnecessary assert check for FA2 Signed-off-by:
xren <xren@nvidia.com> * indention fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * Update FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
xren <xren@nvidia.com> Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Sep, 2023 1 commit
-
-
cyanguwa authored
* add workspace optimization for arbitrary_seqlen fused attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix whitespace for lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add use_workspace_opt to cudnn plan cache and fix workspace estimate Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * modify workspace opt logic; move zero fill to FP8 API only; other minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix try/catch Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix std string error when input is nullptr Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Add = for required vs allowed workspace comparison Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
- 11 Aug, 2023 1 commit
-
-
cyanguwa authored
* miscellenous fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back pytorch csrc extensions.h Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add unit tests for dpa checkpointing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove seqlen%32/64 checks for now Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix tests for core attn bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add tests for changes regarding rng_state in aux_ctx_tensor Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * reuse rng tracker from numerics in fused attn; skip checkpointing if FAv2 in numerics Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * uncomment comments used for testing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pre/post scale bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/attention.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> * remove skipifs for FAv2 check after PR366 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove checkpointing tests for transformer layer; dpa tests still provide coverage Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * adjust random number range for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Add upper bound to FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Check backend only when using FusedAttention Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove imports/variables related to FAv2 checks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further fix random number ranges for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix variable referenced before assignment error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Aug, 2023 1 commit
-
-
Ming-Xu Huang authored
* Cast Flax collections to FrozenDict as WAR to adapt Flax 0.7.1. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding min version of flax to requirements.txt in examples Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix praxis tests and rename compare_frozen_dict to compare_dict Signed-off-by:
Ming Huang <mingh@nvidia.com> * [Paddle] Refactor FP8 state (#350) Refactor fp8 state Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Store FP8 checkpointing data in CPU (#351) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make test_layer be able to run on both Flax >=0.7.1 and <=0.7.0 Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update Flax version Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tian Zheng <tizheng@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 29 Jul, 2023 1 commit
-
-
cyanguwa authored
* add support for multi-query/grouped-query attention Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert to flash-attn 1.0.6 and build 2.0.0.post1 manually in CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add keyword name for DPA input Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fused attn tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix skipif for pytest Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/attention.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update tests/pytorch/test_fused_attn.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix TP and SP case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add skipifs for pytest Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove higher limit for flash-attn version Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Jul, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Exposing RMSNorm in pyTorch extensions Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass at the Python API Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Small fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added numerics tests and fixed issues Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Lint fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added RMSNorm to LayerNormMLP Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added ONNX export and tests for RMSNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix python lint Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix BERT case Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added normalization option to the TransformerLayer Added tests Fixed test failures Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix documentation Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix kwarg bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix IMA and invalid type error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Increase RMSNorm threshold for bf16 case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix ONNX tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* FA v2.0 support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix typo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2023 1 commit
-
-
Tian Zheng authored
* First step of PaddlePaddle integration - Add build option for paddle - Add basic test framework - Add 3 basic operators: cast_from_fp8, cast_to_fp8, gemm Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Support paddle build Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add paddle build support for new building framework Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Clean up build process for Paddle stub file Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor fixes Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix pylint "wrong-import-order" warning Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Skip BF16 GEMM tests for unsupported arch Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 01 Jun, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
pin FA version Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 31 May, 2023 1 commit
-
-
Tim Moon authored
* Refactor Setuptools build system Successfully launches CMake install, but installs CMake extensions in temp dir. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug JAX build Fix pybind11 import. Distinguish between build-time and run-time dependencies. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add helper function to determine dependencies Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add missing license Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug case where system CMake is too old Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add missing license Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Simplify sanity import tests Just importing modules provides richer error messages. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Properly install submodules Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Install helper library for TensorFlow Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update documentation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Do not install Ninja by default Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Include Git commit hash in version string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Override build_ext.build_extensions instead of build_ext.run Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix incorrect include path Restore Ninja dependency. Restore overriding build_ext.run func. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @nouiz Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable parallel Ninja jobs in GitHub actions Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Properly install userbuffers lib Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak install docs Review suggestion from @ksivaman Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add examples for specifying framework in docs Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 02 May, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Use only built-ins for setup Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Remove distutils Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-