- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 29 Oct, 2024 1 commit
-
-
Alp Dener authored
* moved userbuffers code to TE/common Signed-off-by:
Alp Dener <adener@nvidia.com> * moved comm+GEMM overlap code to TE/common Signed-off-by:
Alp Dener <adener@nvidia.com> * removed PyTorch depdency from comm+GEMM overlap in TE/common Signed-off-by:
Alp Dener <adener@nvidia.com> * added TE/PyTorch wrappers for refactored comm+GEMM overlap code in TE/common Signed-off-by:
Alp Dener <adener@nvidia.com> * updated TE/PyTorch Python API to match the refactored comm+GEMM overlap code Signed-off-by:
Alp Dener <adener@nvidia.com> * updated unit tests to work with refactored comm+GEMM overlap code Signed-off-by:
Alp Dener <adener@nvidia.com> * added a pylint exception to comm+GEMM overlap test runner Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing linting errors Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added documentation for te.initialize_ub Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed compile errors when building with NVTE_UB_WITH_MPI=1 Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed default bootstrap backend Signed-off-by:
Alp Dener <adener@nvidia.com> * switched default bootstrap backend priority to MPI > Gloo > NCCL Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated bootstrap backend documentation Signed-off-by:
Alp Dener <adener@nvidia.com> * close UB bootstrap socket to avoid interfering with CUDA Multicast shareable file handle send/recv Signed-off-by:
Alp Dener <adener@nvidia.com> * added torch::Tensor wrappers for communication buffer and atomic counters so PyTorch can factor externally allocated memory into its garbage collection threshold Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * automated handling of world, local and node ranks/sizes within C++ CommOverlapHelper to simplify Python function signatures Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed incorrect read of environment variables Signed-off-by:
Alp Dener <adener@nvidia.com> * corrected priority for _SOCKET_IFNAME environment variables in UB bootstrapping Signed-off-by:
Alp Dener <adener@nvidia.com> * moved multicast support check to cuda_runtime.h and replaced cudaDeviceGetProp call with cached sm_count() Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed commented out old code and replaced external collective function type defines with aliases Signed-off-by:
Alp Dener <adener@nvidia.com> * compile-time CUDA version guard for CUDA Driver Multicast attribute Signed-off-by:
Alp Dener <adener@nvidia.com> * added compile-time CUDA version guards to Multicast code in Userbuffers Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * condensed UB docs, corrected const violations Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed autodoc rst for UB calls, added CUDA version guard on Multicast UB kernels Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed incorrect UB type reporting for P2P overlaps, comment reformatting Signed-off-by:
Alp Dener <adener@nvidia.com> * add docstring to tex.ubuf_built_with_mpi() Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 16 Oct, 2024 1 commit
-
-
Charlene Yang authored
* WIP: make FA2 optional Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: fix logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor tweak Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add L1 test to test all supported FA versions Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update version to 2.1.1 and trim L1 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update onnxruntime version Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove onnxruntime from L1 FA versions tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 03 Sep, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Improvements for wheels Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes for wheel build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move package finder to common Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * FIx Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix CI and distributed test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix paddle ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Aug, 2024 1 commit
-
-
hXl3s authored
* Limit number of architectures build Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 23 Aug, 2024 1 commit
-
-
Charlene Yang authored
* WIP: add fa3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: add benchmarks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * differentiate func/varlen_func Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: add FP8 fwd support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add FA3 FP8 fwd code and test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix assert for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix FA3 FP8 logic and add tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FA2 to <=2.6.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * tweak unit tests for base/mask Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * set constraints for FA3 for sm90 and causal_bottom_right Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert debug changes in benchmark script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 22 Aug, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Re-add framework specific required dependencies for source build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Aug, 2024 1 commit
-
-
Phuong Nguyen authored
Remove total time measurement Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 13 Aug, 2024 1 commit
-
-
Phuong Nguyen authored
* add timing for build * using perf_counter --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 25 Jul, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Specify python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add classifiers for python Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add utils to build wheels Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * make wheel scripts Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add aarch Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle wheel Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * PaddlePaddle only builds for x86 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add optional fwk deps Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python3.8; catch install error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] cudnn9 compile with paddle support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] dont link cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dlopen cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dynamically load nvrtc Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove residual packages; exclude stub from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Exclude builtins from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * properly include files for sdist Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * paddle wheel tie to python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle build from src [wip] Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix workflow paddle build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix lint from pr986 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity import to wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove upper limit on paddlepaddle version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove unused imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove pybind11 dependency Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Search .sos in cuda home Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CLeanup, remove residual code Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 24 Jul, 2024 1 commit
-
-
Tim Moon authored
* Set minimum CMake version to 3.21 Stop linking to nvtx. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update .github/workflows/build.yml Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Revert Python version to 3.9 Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2024 1 commit
-
-
Alp Dener authored
* added DL framework callbacks for bootstrapping userbuffers without MPI Signed-off-by:
Alp Dener <adener@nvidia.com> * removed userbuffers availability check in TE modules since userbuffers is now always compiled Signed-off-by:
Alp Dener <adener@nvidia.com> * added comm+GEMM overlap example with LayerNormMLP Signed-off-by:
Alp Dener <adener@nvidia.com> * lintin and review fixes Signed-off-by:
Alp Dener <adener@nvidia.com> * linting and review fixes Signed-off-by:
Alp Dener <adener@nvidia.com> * added header guards Signed-off-by:
Alp Dener <adener@nvidia.com> * removed defunct userbuffers checks in build_utils and setup.py Signed-off-by:
Alp Dener <adener@nvidia.com> * added exposed API in modules/base.py to __all__ Signed-off-by:
Alp Dener <adener@nvidia.com> * removed transformer_engine/CMakeLists.txt and shifted all TE/common compile into transformer_engine/common/CmakeLists.txt Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 07 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Cleanup Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 May, 2024 1 commit
-
-
Xin Yao authored
* add multi-tensor kernels Signed-off-by:
Xin Yao <xiny@nvidia.com> * add FusedAdam Signed-off-by:
Xin Yao <xiny@nvidia.com> * add test to qa Signed-off-by:
Xin Yao <xiny@nvidia.com> * add FusedSGD Signed-off-by:
Xin Yao <xiny@nvidia.com> * fix lint Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 21 May, 2024 1 commit
-
-
Alp Dener authored
replaced deprecated pkg_resources with packaging Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 09 May, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Bump FA version to 2.5.8 Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 10 Apr, 2024 1 commit
-
-
Jinze Xue authored
Signed-off-by:
Jinze Xue <jinzex@nvidia.com> Co-authored-by:
Jinze Xue <jinzex@nvidia.com>
-
- 03 Apr, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
This reverts commit 965803c9.
-
- 20 Mar, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Mar, 2024 1 commit
-
-
Jinze Xue authored
* Enable incremental CMake build Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * Update setup.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * Update setup.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * remove tempfile import Signed-off-by:
Jinze Xue <jinzex@nvidia.com> --------- Signed-off-by:
Jinze Xue <jinzex@nvidia.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> Co-authored-by:
Jinze Xue <jinzex@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Feb, 2024 1 commit
-
-
Tim Moon authored
Add __version__ attribute to Python module Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 23 Jan, 2024 1 commit
-
-
Luke Petre authored
* Support building using the manylinux docker image. libpython is only required for embedded python. Signed-off-by:
Luke Petre <lpetre@midjourney.com> * Be explicit about which python to use in cmake Signed-off-by:
Luke Petre <lpetre@midjourney.com> * Remove cmake version check Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Luke Petre <lpetre@gmail.com> --------- Signed-off-by:
Luke Petre <lpetre@midjourney.com> Signed-off-by:
Luke Petre <lpetre@gmail.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 06 Jan, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Deterministic FA, bump minimum supported version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix MQA/GQA Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 15 Nov, 2023 1 commit
-
-
cyanguwa authored
* fix condition checks related to FA head_dim Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force q,k,v contiguous when RoPE is in use Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Expand FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Oct, 2023 1 commit
-
-
Tim Moon authored
* Do not include logging macros in installed C headers Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug logging macros Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug C++ tests Use Google style for header includes. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update CUDA driver macros Incorporating changes from #389. Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use core error checking macros in PyTorch extensions Hack to get around macro redefinition warning. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix missing arg when getting CUDA driver error string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse logging header in frameworks Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com>
-
- 13 Oct, 2023 1 commit
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 11 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Sep, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Sep, 2023 1 commit
-
-
Xiaowei Ren authored
* add flash implementation with context parallelism Signed-off-by:
xren <xren@nvidia.com> * next more comments Signed-off-by:
xren <xren@nvidia.com> * code comment fix Signed-off-by:
xren <xren@nvidia.com> * comment fix Signed-off-by:
xren <xren@nvidia.com> * add missing space Signed-off-by:
xren <xren@nvidia.com> * fix docstrings Signed-off-by:
xren <xren@nvidia.com> * try to add fa v2 api Signed-off-by:
xren <xren@nvidia.com> * fix a comment Signed-off-by:
xren <xren@nvidia.com> * fix padded kv return Signed-off-by:
xren <xren@nvidia.com> * add docstrings of context parallelism Signed-off-by:
xren <xren@nvidia.com> * minor fix Signed-off-by:
xren <xren@nvidia.com> * minor docstring fix Signed-off-by:
xren <xren@nvidia.com> * fix positional arguments Signed-off-by:
xren <xren@nvidia.com> * make docstring line shorter Signed-off-by:
xren <xren@nvidia.com> * add fa v2 backward api for flash_attn_with_cp Signed-off-by:
xren <xren@nvidia.com> * remove redundant code Signed-off-by:
xren <xren@nvidia.com> * make sure hidden size per attn head is multiple of 8 for FA2 Signed-off-by:
xren <xren@nvidia.com> * remove an unnecessary assert check for FA2 Signed-off-by:
xren <xren@nvidia.com> * indention fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * Update FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
xren <xren@nvidia.com> Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Sep, 2023 1 commit
-
-
cyanguwa authored
* add workspace optimization for arbitrary_seqlen fused attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix whitespace for lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add use_workspace_opt to cudnn plan cache and fix workspace estimate Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * modify workspace opt logic; move zero fill to FP8 API only; other minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix try/catch Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix std string error when input is nullptr Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Add = for required vs allowed workspace comparison Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
- 11 Aug, 2023 1 commit
-
-
cyanguwa authored
* miscellenous fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back pytorch csrc extensions.h Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add unit tests for dpa checkpointing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove seqlen%32/64 checks for now Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix tests for core attn bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add tests for changes regarding rng_state in aux_ctx_tensor Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * reuse rng tracker from numerics in fused attn; skip checkpointing if FAv2 in numerics Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * uncomment comments used for testing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pre/post scale bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/attention.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> * remove skipifs for FAv2 check after PR366 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove checkpointing tests for transformer layer; dpa tests still provide coverage Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * adjust random number range for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Add upper bound to FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Check backend only when using FusedAttention Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove imports/variables related to FAv2 checks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further fix random number ranges for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix variable referenced before assignment error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Aug, 2023 1 commit
-
-
Ming-Xu Huang authored
* Cast Flax collections to FrozenDict as WAR to adapt Flax 0.7.1. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding min version of flax to requirements.txt in examples Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix praxis tests and rename compare_frozen_dict to compare_dict Signed-off-by:
Ming Huang <mingh@nvidia.com> * [Paddle] Refactor FP8 state (#350) Refactor fp8 state Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Store FP8 checkpointing data in CPU (#351) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make test_layer be able to run on both Flax >=0.7.1 and <=0.7.0 Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update Flax version Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tian Zheng <tizheng@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 29 Jul, 2023 1 commit
-
-
cyanguwa authored
* add support for multi-query/grouped-query attention Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert to flash-attn 1.0.6 and build 2.0.0.post1 manually in CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add keyword name for DPA input Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fused attn tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix skipif for pytest Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/attention.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update tests/pytorch/test_fused_attn.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix TP and SP case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add skipifs for pytest Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove higher limit for flash-attn version Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Jul, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Exposing RMSNorm in pyTorch extensions Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass at the Python API Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Small fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added numerics tests and fixed issues Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Lint fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added RMSNorm to LayerNormMLP Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added ONNX export and tests for RMSNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix python lint Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix BERT case Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added normalization option to the TransformerLayer Added tests Fixed test failures Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix documentation Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix kwarg bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix IMA and invalid type error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Increase RMSNorm threshold for bf16 case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix ONNX tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* FA v2.0 support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix typo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2023 1 commit
-
-
Tian Zheng authored
* First step of PaddlePaddle integration - Add build option for paddle - Add basic test framework - Add 3 basic operators: cast_from_fp8, cast_to_fp8, gemm Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Support paddle build Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add paddle build support for new building framework Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Clean up build process for Paddle stub file Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor fixes Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix pylint "wrong-import-order" warning Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Skip BF16 GEMM tests for unsupported arch Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-