- 11 May, 2025 1 commit
Kirthi Shankar Sivamani authored
* First pass refactor Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* first pass Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* core compiles Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Include cuda dirs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Compiles Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix test Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Move grad outside autocast Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix kv cache Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Address review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Change src file name in cmake Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* move the kernels too Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Move comment Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Move comments around Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* more movement Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* move Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 18 Apr, 2025 1 commit
Kirthi Shankar Sivamani authored
* Move jax cuda kernels to core Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 18 Mar, 2025 1 commit
Charlene Yang authored
* add paged attention; test_kv_cache_accuray and test_paged_attn pass Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove unnecessary change from last commit Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test_fused_attn pass Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* remove unnecessary import in test_numerics Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add license for test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add to L0 test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update license for test_paged_attn Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update kv_cache_manager license Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix build issue from previous merge Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: minor fix/preparation for inference/cuda graph Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: non-paged Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: non-paged, bshd/sbhd Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: non-paged, thd, no CG Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: non-paged, thd, CG Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: non-paged, CG Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: non-paged, using paged kernel Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: restructure kernels Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: paged, CG Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: padding + BRCM Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: restructure IP, clean up Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix non-CG, fused Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix last commit Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: unfused, non-CG Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: flash-attn, non-CG Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: flash_attn_with_kvcache Signed-off-by: Charlene Yang <charleney@nvidia.com>
* commit two files missed by bcef6b34 Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: thd_bshd_bshd Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix last commit Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix 1c31b68d Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: add bshd_2sbhd, sbhd_2bshd Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: some cleanup Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: all qkv_format combinations and merge CM files Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: some lint fixes Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: add docstring for IP Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix sequences_pre Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: minor fixes for multi-layer Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: initial multi-layer test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: minor clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: switch to flash_attn_varlen_func Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix unfused for separate q/kv format Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix fused for separate q/kv formats Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: flash attn + TELayer + 2 layers Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: unfused + TL + 2layers Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: all modules/backend Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: minor cleanup Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: FlashAttention on Hopper with 2.7.3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: FlashAttention + v3 from 39e7179 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: FlashAttention + v3 + FP8 + WIP Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: add backend support table Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: separate use_flash_attention_2 and _3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: tweaks to paged attn script Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: enable/disable certain cases for fused attn Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: small fixes for lint and cg Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: minor fixes for attn/infer Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: fix CP Signed-off-by: Charlene Yang <charleney@nvidia.com>
* WIP: readd page info to FADescriptor_v1 Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor tweak to test_numerics.py Signed-off-by: Charlene Yang <charleney@nvidia.com>
* fix 9.5/9.7 sq/skv + mask logic Signed-off-by: Charlene Yang <charleney@nvidia.com>
* clean up Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor fix for FA3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* more minor fixes for FA3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test page_size=1 for FA3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix t3hd/th3d strides Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix ckpt recompute and fa3 k_scale Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* raise dynamo recompile limit for test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove thunder test from L0 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix FA selection logic Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix FA3 q_descale shape Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* remove page_table from IP.step() returns Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix FP8 FlashAttn DPA fp8_dpa tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix CP Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor tweaks Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update FA3 note and L3 test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove redundant import in test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* adopt new FA3 APIs from FA2.7.3+/hopper for CP and non-CP Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* relax tols for TransformerLayers Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix merge Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix merge 2 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix FA import comments Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* relax tols for Ampere Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix fa3 version and reduce messaging Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update FA3 to its latest commit on main Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add default values to IP and assertion to graph.py Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add more comments in attention Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use custom_cache_manager instead of cache_manager Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
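The commit above adds paged attention (and tests `page_size=1`) but its message does not describe the cache layout. Purely as a generic illustration, and not Transformer Engine's `kv_cache_manager`, here is a minimal C++ sketch of how a page table commonly maps a token's logical position to a slot in a shared K/V pool. The function name `physical_slot` and the flat `num_pages * page_size` pool layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical paged KV-cache addressing (illustrative only).
// page_table[seq] lists the physical page ids owned by sequence `seq`, in logical order.
// Token `pos` of that sequence lives in logical page pos / page_size, at offset
// pos % page_size within the page. The pool is assumed to be a flat array of
// num_pages * page_size token slots.
int64_t physical_slot(const std::vector<std::vector<int64_t>>& page_table,
                      int64_t seq, int64_t pos, int64_t page_size) {
  assert(page_size > 0 && pos >= 0);
  const int64_t logical_page = pos / page_size;
  assert(logical_page < static_cast<int64_t>(page_table[seq].size()));
  const int64_t physical_page = page_table[seq][logical_page];
  return physical_page * page_size + pos % page_size;  // flat index into the K/V pool
}
```

Because pages are looked up per sequence, two sequences can interleave their pages anywhere in the pool, and growing a sequence only requires appending a page id rather than reallocating a contiguous buffer.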
- 17 Mar, 2025 1 commit
Przemyslaw Tredak authored
* Do not create multiple cublas handles Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fix for multiple GPUs per thread Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Fix multithreaded execution Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix from conflict Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 25 Feb, 2025 1 commit
Charlene Yang authored
* minor fixes for attention Signed-off-by: Charlene Yang <charleney@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 22 Feb, 2025 1 commit
Kshitij Lakhani authored
* Remove dependency on transformer_engine::Tensor in attention.cu Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Templatize thd_partition_indices_kernel and thd_read_half_tensor_kernel kernels ONLY for invoking recompilation and not directly using the pre-compiled symbols in libtransformer.so Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Modify attention.cu for thd templatized kernels. Remove dependency on common.h Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Move thd structs from libtransformer.so to framework extensions include header. Code cleanup Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Consolidate and move thd_utils from common to framework extensions Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Remove template decorators around thd_partition_indices_kernel and thd_read_half_tensor_kernel Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
  Code clean up Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 07 Feb, 2025 1 commit
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 10 Jan, 2025 1 commit
Xiaowei Ren authored
Take token count quantization of fused attention into consideration for CP results correction (#1396)
* fix second half lse shape Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* bug fixes Signed-off-by: Xiaowei Ren <xren@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 20 Dec, 2024 1 commit
Charlene Yang authored
* add swa (left,0) + padding + brcm support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* final fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* upgrade to FE 1.9-rc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* skip thd + CP + fused attn tests for cuDNN 9.6+ due to different stats shapes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 17 Dec, 2024 1 commit
Charlene Yang authored
add max_t for KV
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
- 25 Nov, 2024 1 commit
Michael Goldfarb authored
Moved framework agnostic THD kernels to common.
---------
Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
- 08 Nov, 2024 1 commit
Phuong Nguyen authored
* split cudnn utils from fused_attn/util
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
- 25 Oct, 2024 2 commits
Charlene Yang authored
* WIP: add max_t support for THD Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: save tensors for debug and point to new FE Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix stats in bwd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix stats in fwd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add docstring for DPA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: first try on adding max_b and max_t Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit c3d522e9f5aef3c8ddfec5bf6ff24c3db97bb059. Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "WIP: first try on adding max_b and max_t" This reverts commit 3bc01ebaf2aa846fd16634e2d33b0d0f5803a076. Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update docstring and fix max_seqlen logic for thd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert two lines of change in docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: add get_max_b/t Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix max_seqlen code and docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* success: add max_b/max_t Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove debug code Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change max_b/max_t buckets Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix b vs orig_b Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix b vs orig_b with 0 fill Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update FE for T3HD/TH3D Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add max_b to conversion kernels Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix changes after last merge Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add Jax support for max_t Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update FE to 1.8.0-rc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update FE to 1.8.0 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* code review/formatting fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix Stats shape for <9.6 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* return nullptr for offset_stats when cudnn < 9.6 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add more version control Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Charlene Yang authored
* add THD MQA/GQA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix nvte_get_fused_attn_backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 22 Oct, 2024 2 commits
Reese Wang authored
Add THD + GQA support for cuDNN >= 9.6
Signed-off-by: Reese Wang <rewang@nvidia.com>
Michael Goldfarb authored
* Use 64-bit offsets for cuDNN 9.5+
* Align workspace tensors to 16B.
* Fix bug where std::accumulate overflowed on large tensor shapes.
* Only support 64-bit offsets on arbitrary sequence length fp16 backend.

Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
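The commit message only names the `std::accumulate` overflow and the 16-byte workspace alignment. The following minimal C++ sketch shows the general pitfall and a common fix under the assumption of a 64-bit build; it is not the actual Transformer Engine code, and `num_elements_buggy`, `num_elements`, and `align_up` are hypothetical helpers. The key point is that `std::accumulate` takes its accumulator type from the initial value, so seeding it with the literal `1` keeps the running product in an `int` regardless of the functor's type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Pitfall: the literal `1` makes the accumulator an `int`, so every partial product
// is converted back to 32 bits and the result wraps for large shapes.
int64_t num_elements_buggy(const std::vector<size_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int64_t>());
}

// Fix: seed the accumulation with a 64-bit value so the running product stays 64-bit.
int64_t num_elements(const std::vector<size_t>& shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Round a byte offset up to the next multiple of `alignment` (default 16 bytes).
size_t align_up(size_t offset, size_t alignment = 16) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two only
  return (offset + alignment - 1) & ~(alignment - 1);
}
```

For a shape such as `{65536, 65536}` the true element count is 2^32, which no longer fits in an `int`, so only the 64-bit version returns the right value. Rounding each workspace sub-buffer's offset with a helper like `align_up` keeps every tensor carved out of a shared workspace on a 16-byte boundary.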
- 07 Oct, 2024 1 commit
Charlene Yang authored
* adjust window size to (i-window_size_left,i] for cuDNN Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* reduce the window to make any errors more pronounced Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
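Reading the interval in the commit title literally, a query at position i may attend only to keys j with i - window_size_left < j <= i, so it sees exactly window_size_left keys ending at the diagonal. The C++ sketch below encodes just that reading; how Transformer Engine maps its own `window_size` argument onto cuDNN's convention is not spelled out in this log, and `in_window` and `window_mask` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if query position i may attend to key position j when the window is the
// half-open range (i - window_size_left, i]: no look-ahead past the diagonal.
bool in_window(int64_t i, int64_t j, int64_t window_size_left) {
  return j > i - window_size_left && j <= i;
}

// Dense seqlen x seqlen mask (true = attend), useful for eyeballing the band.
std::vector<std::vector<bool>> window_mask(int64_t seqlen, int64_t window_size_left) {
  assert(seqlen >= 0 && window_size_left > 0);
  std::vector<std::vector<bool>> mask(seqlen, std::vector<bool>(seqlen, false));
  for (int64_t i = 0; i < seqlen; ++i)
    for (int64_t j = 0; j < seqlen; ++j)
      mask[i][j] = in_window(i, j, window_size_left);
  return mask;
}
```

With a half-open interval, an off-by-one between an inclusive and an exclusive left edge changes each row by exactly one key, which is why a small window (as in the second commit above) makes such a mismatch easy to spot in tests.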
- 21 Aug, 2024 2 commits
Charlene Yang authored
* add support for padding in UnfusedDPA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add support for padding_causal/_bottom_right Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix padding_causal/_bottom_right Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* need to test max512 backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* revert last commit Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix mask logic in unfused Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* use actual_seqlen for alibi/causal_bottom_right padding Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor fixes and convert causal to causal_bottom_right for inference Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* use causal in kv cache inference test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* simplify get_alibi logic Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* simplify the non-padding path for get_alibi Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* avoid batch_size loop in generating padding_causal/_bottom_right masks Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
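The commit above adds `padding_causal_bottom_right` to UnfusedDPA and notes that the bottom-right alignment uses the actual (unpadded) sequence lengths. A minimal C++ sketch of one mask entry under that reading follows; it is illustrative only, and the function names are hypothetical rather than Transformer Engine APIs. The top-left `padding_causal` variant is included for comparison.

```cpp
#include <cassert>
#include <cstdint>

// padding_causal_bottom_right (sketch): positions past the actual lengths are masked,
// and the causal diagonal is anchored to the bottom-right corner of the
// actual_seqlen_q x actual_seqlen_kv block, so the last query sees every valid key.
bool attend_padding_causal_bottom_right(int64_t i, int64_t j,
                                        int64_t actual_seqlen_q,
                                        int64_t actual_seqlen_kv) {
  assert(actual_seqlen_q >= 0 && actual_seqlen_kv >= 0);
  if (i >= actual_seqlen_q || j >= actual_seqlen_kv) return false;  // padding
  return j <= i + (actual_seqlen_kv - actual_seqlen_q);
}

// padding_causal (top-left aligned), for comparison: the diagonal starts at (0, 0).
bool attend_padding_causal(int64_t i, int64_t j,
                           int64_t actual_seqlen_q, int64_t actual_seqlen_kv) {
  if (i >= actual_seqlen_q || j >= actual_seqlen_kv) return false;  // padding
  return j <= i;
}
```

The two variants agree whenever the query and key lengths are equal; they differ when there are fewer queries than keys, as in incremental decoding against a KV cache, where bottom-right alignment lets the new queries see all cached keys. This matches the "convert causal to causal_bottom_right for inference" step listed above.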
Charlene Yang authored
* update FE to 1.6 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update to 1.6.1-rc for testing Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update to fe 1.6.1 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 06 Aug, 2024 1 commit
Charlene Yang authored
* add multi-latent attention for DPA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix Jax/Paddle API Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix typo in test script Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix too-many-boolean lint error Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "fix lint" This reverts commit 67399a3a6f45bb4ce9e5eaa6bcce40b28e347e5b. Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix stride check in get_qkv_layout Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: fix layout_thd tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* WIP: debug info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix merge conflict Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix thd pad_between_seqs=False/True tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 30 Jul, 2024 1 commit
Shijie authored
add deterministic option
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 10 Jul, 2024 1 commit
Charlene Yang authored
* add cuDNN swa Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix SWA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add set_deterministic and minor fixes for swa Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add AttentionParams Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change window_size to int64_t; fix swa/determinism tests; cache _attention_backends Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add window_size to get_backend; fix jax and paddle Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes; add set_deter to bwd_impl Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix unit tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix FP8 tests due to determinism Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add support matrix for SWA and bias Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor fixes and lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* minor fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add wording on window_size special cases Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak on wording Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax assertion error Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix wording Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* call bwd with deterministic=true for jax/paddle Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add determinism words in documentation Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 03 Jul, 2024 1 commit
Charlene Yang authored
* update to FE 1.5.1 and add bottom right causal Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* adjust logic for backend selection Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update FE to 1.5.2 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add get_attention_backend function Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update get_attention_backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix get_attention_backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* tweak get_attention_backend and fix unit tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fixes for unfused, get_backend, etc Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Update transformer_engine/pytorch/attention.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix cpu offload Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes for get_attention_backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* explicitly skip FP32 and padding tests because there is no support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fix for window size check Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update check_set_window_size and add enc_dec_attn_mask_type/enc_dec_window_size Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 01 Jul, 2024 1 commit
Charlene Yang authored
* update FE to 1.5.2 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* enable unfused attn for cross attn Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* unify logging info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* omit cudnn 9.1.1 and 9.2.1 due to bugs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* set cu_seqlens_padded to cu_seqlens by default Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace variable name with ctx.variable Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "enable unfused attn for cross attn" This reverts commit bc49f14fca904217a711b4a86c45a4a739a17a14. Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* restrict cudnn version for fp8 tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove mha_fill for FP8 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "remove mha_fill for FP8" This reverts commit 83ffc44114dc6eb3d426d742b6c5a4d34805ec04. Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* lower cudnn version to >=9.2.1 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 18 Jun, 2024 1 commit
Charlene Yang authored
* simplify offset tensors Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes; tests pass Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix C lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace with_offset with with_padding Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace with_padding with padded Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fixes after merge Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* minor fix for fused attn fwd/bwd calls Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix Jax Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* adjust spacing in docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix pytorch tests; fix paddle api Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix attn_biases Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix AttnFuncWithCP backward Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix attn with CP Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix paddle Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 14 Jun, 2024 2 commits
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Kirthi Shankar Sivamani authored
* Initial config test Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* remove linters, fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix clang-format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Remove lint Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Adjust config Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* use config file Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* adjust pylintrc Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* pre-format fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Python only Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Add FA module Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update CI configs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* CRLF -> LF Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* format Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* revert accidental formatting changes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* try with sudo Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* cpp formatting Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix pylint error properly Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* some review comments Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* lint fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* add fp8 attn include in the correct file Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* autofix PRs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 30 May, 2024 1 commit
Charlene Yang authored
* add THD support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add seq_offsets_o and use new offset calculation Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* addition to previous commit; fix unit test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add None for offset_o gradient Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: test padding between sequences Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: fix tests for padding between sequences Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix tests for sbhd/bshd layouts; clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update cudnn-frontend and add tests for max_seqlen_q=1 and d=256 for inference Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test sbhd/bshd layouts for sq1, d256 inference case Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace wording from accumulative to cumulative Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add offset tensors to custom fp8 mha tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add version control for cuDNN Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add sm>=90 constraint for thd support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix cuDNN support for sq=1, d=256 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint and minor tweak for fp8 tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* modify cudnn version and restrict MQA/GQA support for THD Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add notes for seq offset tensors Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add dummy tensor to pass jax build Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add dummy tensor to pass paddle build Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix Jax CI Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
- 25 May, 2024 1 commit
Charlene Yang authored
allow bias support for sm80/86/89 for cuDNN 9+
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
- 16 Apr, 2024 1 commit
cyanguwa authored
* WIP: fp8 v1 fprop integration Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: minor fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add debug info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add more debug info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fprop working for h1; w/ debug info Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: add bprop Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* cleanup; bprop running but has mismatches Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add gitlab frontend as submodule Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up and add back v0.9.2 FE support; fprop/bprop passing with 5e-2 tols Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix after merge; add bias_b/h to caching descriptor Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* distinguish fwd/bwd tensor types for bprop Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fix for F16 cases; include added dqkv_type and d_scale_dp Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* adjust out shape for bwd in test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add casting from/to FP8 to DPA module Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: bshd_bshd_bshd layout Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: support all sbhd/bshd layouts Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add qkvpacked and kvpacked support in both FusedAttnFunc and C levels Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove qkvpacked/kvpacked calls in DPA module (used for testing) Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove tp setup; add allow_non_contiguous; update FE; revert to sbh3d in tests; clean up Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add NVTE_FP8_DPA_BWD to control whether to use FP8 bwd or F16 bwd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix MQA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix MQA/GQA in FP8 v1 API Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update FE to 705d8e3, with API change Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* test causal mask Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* restrict mha_fill for THD format Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix fused attn with CP and comment out is_alibi code Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up FE0.9 vs FE1.0 FP8 implementations, and related unit tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change NVTE_FP8_DPA_BWD default to 1, and fix its use in qkvpacked/kvpacked APIs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint and self.tp_size/group in FusedAttention() Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update FE to 6902c94 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add FP8 MHA support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update to FE v1.3.0 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes for FP8 MHA with different configs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* emit stats regardless of is_training Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix linear when input is not Float8Tensor Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix d_out type when f16 bprop Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix user buffer for layernorm_linear/linear and revert two FP8 casts in MHA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add docstring for fp8_dpa/mha in recipe Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* fix backend selection to avoid FA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace transpose with transpose_2d Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use RMSE for FP8 unit tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace two more transpose with transpose_2d Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add FP8 initialization to FusedAttention Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* rm docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Revert "add FP8 initialization to FusedAttention" This reverts commit 15fffd825d6f23f31ea709b16ba01dfd61efabf8. Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change order of ctxs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* minor fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add back docs and mark as beta Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes for tests and docs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 11 Mar, 2024 1 commit
Tim Moon authored
Remove deprecated cudnn_frontend::throw_if
Deprecated in cudnn-frontend 1.1.0.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 28 Feb, 2024 1 commit
cyanguwa authored
* added support for arbitrary bias shapes for fused_attn Signed-off-by: Alp Dener <adener@nvidia.com>
* Fix linting Signed-off-by: Alp Dener <adener@nvidia.com>
* Add b1ss/bhss/11ss bias shapes when not requiring dBias Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add bias_b/h to plan cache Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed compile errors after PR653 merge Signed-off-by: Alp Dener <adener@nvidia.com>
* updated JAX unittests for new bias shapes Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed mismatched mask type checking Signed-off-by: Alp Dener <adener@nvidia.com>
* corrected skip condition Signed-off-by: Alp Dener <adener@nvidia.com>
* fix selection logic for A100s Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* corrected skip checks for bias shapes Signed-off-by: Alp Dener <adener@nvidia.com>
* resolved test issues but neginf with float16 is still problematic with JAX Signed-off-by: Alp Dener <adener@nvidia.com>
* new bias shapes passing TE JAX CI for seqlen <= 512, seq_q == seq_kv and h_q == h_kv conditions Signed-off-by: Alp Dener <adener@nvidia.com>
* TE/JAX fused attn tests for new bias shapes passing with neg_inf=-2**27 for Bfloat16 and -2**15 for Float16 Signed-off-by: Alp Dener <adener@nvidia.com>
* code style fixes and test parameter ID cleanup Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed incorrect skip condition for backward fused attn test Signed-off-by: Alp Dener <adener@nvidia.com>
---------
Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Alp Dener <adener@nvidia.com>
- 22 Feb, 2024 1 commit
Reese Wang authored
* Refine MHA API Signed-off-by: Reese Wang <rewang@nvidia.com>
* Reuse func from the flax Signed-off-by: Reese Wang <rewang@nvidia.com>
* DPA draft Signed-off-by: Reese Wang <rewang@nvidia.com>
* qkv packed draft Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fix test_layer with fused attn Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add attn_bias_type and enhance a few code flow Signed-off-by: Reese Wang <rewang@nvidia.com>
* Move scale_factor from __call__ to init Signed-off-by: Reese Wang <rewang@nvidia.com>
* Enhance the docs Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add DPA public API and tests Signed-off-by: Reese Wang <rewang@nvidia.com>
* Refine docs Signed-off-by: Reese Wang <rewang@nvidia.com>
* Refine docs Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fix conflict Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add qkv separate fused attn Signed-off-by: Reese Wang <rewang@nvidia.com>
* Apply BSHD_BSHD_BSHD format Signed-off-by: Reese Wang <rewang@nvidia.com>
* Remove debug log Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add fused attention layer tests Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add NVTE_FUSED_ATTN docs Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fine-grained fused attn settings Signed-off-by: Reese Wang <rewang@nvidia.com>
* Remove the default value of num_attetnion_head and head_dim Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add teardown for fused attn env Signed-off-by: Reese Wang <rewang@nvidia.com>
* Unify the Optional notation Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fix Pre/Post scale bias comments Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add no_mask tests Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add checkpoint_name for fused attn Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fix the fused attn batcher Signed-off-by: Reese Wang <rewang@nvidia.com>
---------
Signed-off-by: Reese Wang <rewang@nvidia.com>
- 08 Feb, 2024 1 commit
cyanguwa authored
* test alibi between fa and fu Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* move alibi slopes and bias to global to avoid repeating calculation Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix alibi slopes/bias generation Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix _is_flash_attention_supported to allow alibi type Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* disable padding mask when alibi is used for fused attn arbi backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add support for custom [n_heads] alibi_slopes in flash, fused, unfused attention Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up last commit Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove alibi_type=none tests as they are unnecessary Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update cudnn-frontend to 1.0.2 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change bias/dbias shape to allow b,1/1,h/b,h in arbi backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* tweak tests for arbi post_scale_bias [1,h,s,s] or alibi_slopes [n_heads] Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* change bias/dbias shape in max512 backend - incomplete Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove max512 changes from last commit and disable max512 (and arbi temporarily) for [b, h, s, s]; pending cuDNN backend support Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up and tweak backend selection logic Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* replace || with () in docstring Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix bias shape for max512 backend Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* combine slopes/bias generation to one function get_alibi() and fix alibi tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix PR557 bugs Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Update transformer_engine/pytorch/attention.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
* encapsulate global alibi tensors into a dict cache Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* reduce alibi slopes test size Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update to cudnn-frontend 1.0.3 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use dBias shape to define bias_b/bias_h because jax materializes dBias rather than Bias in bwd abstract Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 03 Feb, 2024 1 commit
cyanguwa authored
* Update cudnn frontend to 1.0.3 to fix cudnn v9 Nans Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* make d_out contiguous for bwd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove cudnnDestroy to let torch handle it Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Update transformer_engine/pytorch/attention.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
* Update transformer_engine/pytorch/attention.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
* Update transformer_engine/pytorch/attention.py Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 29 Jan, 2024 1 commit
Alp Dener authored
* Removed cudaMalloc/WorkspaceManager in JAX csrc. JAX custom ops now request buffers from XLA for their workspace tensors. Signed-off-by: Alp Dener <adener@nvidia.com>
* removed unused GEMM C++ API in TE-JAX Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed typo in layernorm_geglu_fp8_mlp and removed unnecessary shape reductions in primitives Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed import order for linting Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed custom op errors due to incorrect static arg nums in JAX jit Signed-off-by: Alp Dener <adener@nvidia.com>
* shifted cudnnSetStream further down the kernel to avoid error when executing dummy kernel call with nullptr stream Signed-off-by: Alp Dener <adener@nvidia.com>
* fixed linting errors for blank lines Signed-off-by: Alp Dener <adener@nvidia.com>
---------
Signed-off-by: Alp Dener <adener@nvidia.com>
- 16 Jan, 2024 1 commit
zlsh80826 authored
* Support num_gqa_groups arguments Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add GQA support on the JAX bridge code Signed-off-by: Reese Wang <rewang@nvidia.com>
* Fix the kv stride of the arbitrary backend Signed-off-by: Reese Wang <rewang@nvidia.com>
* Complete rewrite fused attention tests and add GQA coverage Signed-off-by: Reese Wang <rewang@nvidia.com>
* Support unfused GQA Signed-off-by: Reese Wang <rewang@nvidia.com>
* Calculate seqlen before the primitive for the better perf Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add GQA layer tests Signed-off-by: Reese Wang <rewang@nvidia.com>
* Apply code style checks for te_jax Signed-off-by: Reese Wang <rewang@nvidia.com>
* Apply code style checks for tests Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add num_gqa_groups doc Signed-off-by: Reese Wang <rewang@nvidia.com>
* Refine the qkv_type Signed-off-by: Reese Wang <rewang@nvidia.com>
* Correct the variable naming Signed-off-by: Reese Wang <rewang@nvidia.com>
* Handle Max512 CAUSAL Signed-off-by: Reese Wang <rewang@nvidia.com>
* Add WAR for the latest jax image Signed-off-by: Reese Wang <rewang@nvidia.com>
---------
Signed-off-by: Reese Wang <rewang@nvidia.com>
- 08 Jan, 2024 1 commit
cyanguwa authored
fix FP8 dims
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>