- 04 Oct, 2023 1 commit
Kirthi Shankar Sivamani authored
* initial changes [wip] Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add padding mask support for FA Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rm causal mask from tests and add padding Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix some conflicts Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * conflicts Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add unpadding mask Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix padding mask Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] fix API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add packing and unpacking Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * docs fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix atomic_add bf16 torch.compile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Generate non all True masks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Lint fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix core attention export and FusedAttn filter Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix all ONNX tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Memory optimization Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Optimizations and caching fixes in torch.dynamo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Padding optimizations Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes and reviews Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 25 Sep, 2023 1 commit
cyanguwa authored
* add flexible layout support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add support for flexible qkv layout Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add more changes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes for compiling Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove redundant file Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix options device error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix typos Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * more changes; WIP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * more changes; WIP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes and tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes and wrong results Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * sb3hd/bs3hd working on top of 3xsbhd/bshd/thd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix dQ, dK, dV Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add nvtx Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove qkvso_strides on torch side; cover it in generateQKVStrides Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * all 15 layouts pass Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add workspace optimization Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes and test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * removed most debug info/clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add note to deprecate some qkv layouts Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix code for unit tests in test_fused_attn.py Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further remove debug info Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove a couple more comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix numerics tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes for lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fp8 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix onnx for core attn; not fixed Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove nvtx and add env var for workspace opt Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove testing for env var Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace zeros/zeros_like with empty/empty_like Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix nvtx marker name for _q_k_v API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove sm80 when compiling for h100 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add mapping from qkv layout to layout group and qkv format Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * clean up enums mapping and remove trailing spaces Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * simplify workspace opt control logic; only need env var Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fp8 test, and minor modifications for other tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * avoid overwriting model configs in unit test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * random fixes/improvements: get_qkv_format/etc, default values, docstrings, comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix minor issues: invalid syntax Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change workspace opt logic back to FORCE_WORKSPACE_OPT Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FP8 tests and generateStrides function Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix get_backend logic for max512/arbitrary Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix unit tests; need cleanup Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * clean up unit tests for layouts, and fix minor lint issue Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweaks for CI testing: onnx string issue and test fused attn first Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove one unsupported layout from max512 and add a check to qkvpacked API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix te layer test; reduce test time Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert compiler option changes; add back sm80 for even h100 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove some unit tests or make them optional to reduce CI time Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove more unit tests temporarily Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove _q_k_v in naming and add NVTE_ERROR for FP8 Aux_CTX_Tensors size checks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add more deprecation notes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove temp tests from last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace with te::getenv Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove prints from last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove redundant contiguous() Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove thd->bs3hd user warning to avoid GPU sync Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * adjust fused attn bs in tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * temporary fix for onnx issue; more fixes in PR 437 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove unused variables Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by: Charlene Yang Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 11 Aug, 2023 1 commit
cyanguwa authored
* miscellaneous fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back pytorch csrc extensions.h Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add unit tests for dpa checkpointing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove seqlen%32/64 checks for now Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix tests for core attn bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add tests for changes regarding rng_state in aux_ctx_tensor Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * reuse rng tracker from numerics in fused attn; skip checkpointing if FAv2 in numerics Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * uncomment comments used for testing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pre/post scale bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/attention.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> * remove skipifs for FAv2 check after PR366 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove checkpointing tests for transformer layer; dpa tests still provide coverage Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * adjust random number range for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Add upper bound to FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Check backend only when using FusedAttention Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove imports/variables related to FAv2 checks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further fix random number ranges for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix variable referenced before assignment error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 22 Jun, 2023 1 commit
cyanguwa authored
* add long sequence support and unify three backends for fused attention Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update cudnn-frontend to v0.9.1 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace cpu_float2half_rn with __float2half_rn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix backend selection and NVTEDType Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix ci Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * make cudnn plan caches thread_local Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace cuDNN throw with NVTE_CHECK Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix replacement of cuDNN throw with NVTE_CHECK Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force dropout probability to 0 in inference mode Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change negInfinity to be consistent with m512 fused attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove float2half conversion for scale_dropout Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back runtime api for sm detection Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add gemm3 to enums FP8Fwd/BwdTensors Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change dropout from no to yes for fmha_v1 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove output_rng_state in m512 kernels Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix elts_per_thread calculation in kvpacked fwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove dropout=0.0 restriction for m512 fused attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove output_rng_state completely from m512 kernels Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 02 Jun, 2023 1 commit
Jan Bielak authored
* Ignore IDE files Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Fix typing errors Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Ignore devcontainer files Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Avoid import from private module Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Apply @timmoon10's suggestions Signed-off-by:
Jan Bielak <jbielak@nvidia.com> --------- Signed-off-by:
Jan Bielak <jbielak@nvidia.com>
- 21 Apr, 2023 1 commit
cyanguwa authored
* Add FP8 fused attention to TE for PyTorch Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add license for cudnn-frontend, modify installation requirements, and refactor some headers for aesthetics Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add c api docs for fused attention Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add exception for unsupported precision/sequence length combinations Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix installation requirement for non fused attn use cases Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix docs for fused-attn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * prefix enums with NVTE_ and replace old MHA_Matrix with NVTE_QKV_Matrix Signed-off-by:
Charlene Yang <charleney@nvidia.com> * minor fixes based on PR comments Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix description for kvpacked fwd Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix description of Bias in C api Signed-off-by:
Charlene Yang <charleney@nvidia.com> * minor fixes for cudnn requirement and description for QKV tensors Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix QKV layout description and support matrix for C api Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add asserts to cpp_extensions for qkv layout/bias type/attn mask type Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix typo precision Signed-off-by:
Charlene Yang <charleney@nvidia.com> --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 03 Jan, 2023 1 commit
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
- 28 Sep, 2022 1 commit
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>