- 09 Oct, 2024 2 commits
-
-
Charlene Yang authored
* improve get_attention_backend logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* polish logic and wording
* remove redundant comment
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
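The get_attention_backend work above is about dispatch: screening each attention backend against the requested configuration and returning the highest-priority one that qualifies. A minimal, self-contained sketch of that pattern — the backend names, the constraint table, and the `AttentionConfig` fields here are illustrative assumptions, not TransformerEngine's actual API:

```python
# Hypothetical sketch of backend-selection logic: each candidate backend is
# screened against the requested attention configuration, and the first
# (highest-priority) survivor wins. Names and limits are illustrative only.
from dataclasses import dataclass

@dataclass
class AttentionConfig:
    head_dim: int
    mask_type: str   # e.g. "causal", "padding", "no_mask"
    dtype: str       # e.g. "bf16", "fp16", "fp8"

# (name, supported dtypes, supported mask types, max head dim), priority order
BACKENDS = [
    ("FusedAttention",   {"bf16", "fp16", "fp8"},  {"causal", "padding", "no_mask"}, 128),
    ("FlashAttention",   {"bf16", "fp16"},         {"causal", "padding", "no_mask"}, 256),
    ("UnfusedAttention", {"bf16", "fp16", "fp32"}, {"causal", "padding", "no_mask"}, 4096),
]

def get_attention_backend(cfg: AttentionConfig) -> str:
    """Return the first backend compatible with cfg, or raise if none is."""
    for name, dtypes, masks, max_hd in BACKENDS:
        if cfg.dtype in dtypes and cfg.mask_type in masks and cfg.head_dim <= max_hd:
            return name
    raise ValueError(f"no attention backend supports {cfg}")
```

In this toy table, an FP8 request with head_dim 64 resolves to the fused backend, while a 192-dim BF16 head falls through to flash attention.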
-
Charlene Yang authored
* add extra_state change description for different TE versions
* add FAQ page
* update FAQ page
* fix extra_state tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* minor fixes
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 08 Oct, 2024 1 commit
-
-
Charlene Yang authored
* add qkv descales to FA3
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* fix sbhd shapes
* force the same dtype when comparing FA3 and cuDNN FP8
* Revert "force the same dtype when comparing FA3 and cuDNN FP8" (reverts commit 19e7f877026a19a32d2f02c6c9de20df4ae2e064)
* force the same dtype when comparing FA3 and cuDNN FP8
* add try/except for FA3 when custom qkv descales are not supported
* replace FA3 installation warning with a debug logging message
* fix lint
* remove unused imports
* avoid varlen_func for FP8 and improve messaging
* add SWA support for FA3
* fix lint
* change preference reason for FP8 logic
* minor fixes
* minor fix
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Oct, 2024 1 commit
-
-
Xiaowei Ren authored
* change API for hierarchical CP
* move fp8 code before qkv reshape
* try to insert A2A for hierarchical CP
* make fwd work
* remove a redundant sync
* make bwd of hierarchical CP work
* fix dout a2a in bwd
* fix q_f16 with fp8
* assert hierarchical CP implementation does not support THD format
* bug fix
* assert hierarchical CP does not support attn bias
* add unit test for hierarchical CP
* fix cp_comm_type in unit test
* bug fix and code cleaning
* minor change
* an assert info change
* dout shape fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* move function definitions to the front of the first call
* fix tensor view comments
* refine CP unit test
* typo fix
* typo fix
* save cp_size_a2a and rank_a2a in fwd
* add more explanations of cp_group in doc_string
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
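The hierarchical context-parallel scheme above combines an all-to-all dimension (typically intra-node) with a point-to-point dimension (inter-node). A sketch of how the flat list of CP ranks could be factored into the two kinds of groups, assuming cp_size splits evenly — the function and names are hypothetical, not TE's implementation:

```python
# Illustrative factoring of cp_size ranks into an A2A dimension and a P2P
# dimension for hierarchical context parallelism. Each rank belongs to exactly
# one A2A group (contiguous ranks) and one P2P group (strided ranks).
def hierarchical_cp_groups(cp_size: int, a2a_size: int):
    """Return (a2a_groups, p2p_groups) covering ranks 0..cp_size-1."""
    assert cp_size % a2a_size == 0, "cp_size must be divisible by a2a_size"
    p2p_size = cp_size // a2a_size
    # contiguous ranks exchange via all-to-all (e.g. within one node)
    a2a_groups = [list(range(i * a2a_size, (i + 1) * a2a_size))
                  for i in range(p2p_size)]
    # strided ranks form the point-to-point ring across nodes
    p2p_groups = [list(range(i, cp_size, a2a_size)) for i in range(a2a_size)]
    return a2a_groups, p2p_groups
```

With 8 CP ranks and `a2a_size=4`, this yields two A2A groups of four ranks each and four two-rank P2P rings.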
-
- 03 Oct, 2024 1 commit
-
-
Charlene Yang authored
move block_table arg to varlen_func section
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 27 Sep, 2024 1 commit
-
-
Paweł Gadziński authored
* Docs fixes
* docs fix
* docs fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 19 Sep, 2024 1 commit
-
-
Xin Yao authored
* relax contiguous check for flash attention
* force contiguous for cp
Signed-off-by: Xin Yao <xiny@nvidia.com>
-
- 18 Sep, 2024 1 commit
-
-
Sudhakar Singh authored
* make rotary_base arg
* rotary base can be a float
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
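`rotary_base` sets the base of the geometric frequency series used by rotary position embeddings, and the change above lets it be a float. A minimal sketch of the standard RoPE frequency formula that this argument parameterizes (not TE's exact code):

```python
# Standard rotary-embedding inverse frequencies with a configurable
# (possibly non-integer) base: inv_freq[i] = base ** (-2i / dim).
def rope_inv_freq(dim: int, rotary_base: float = 10000.0):
    """Per-frequency-pair inverse frequencies for a head dimension `dim`."""
    return [rotary_base ** (-(2 * i) / dim) for i in range(dim // 2)]
```

Raising `rotary_base` stretches the longest rotation period, which is the usual knob for extending effective context length.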
-
- 09 Sep, 2024 1 commit
-
-
Xiaowei Ren authored
* clean code for CP function args
* add a placeholder for Ulysses implementation
* commit code change to CP+A2A
* finish the draft fwd implementation of Ulysses
* add draft bwd implementation of Ulysses
* make swa work with ulysses
* commit FP8 code for Ulysses
* fix qkv type in the bwd of FP8+CP
* typo fix
* fix qkv_dtype of FP8+CP
* code refactoring
* minor code change
* config cp correction dtype of FP8+CP
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* code style change
* save chunk_ids
* try to make Ulysses A2A async
* make more a2a async
* fix a2a_outputs
* fix chunk_ids generation for A2A
* avoid code duplication of a2a before attn
* remove code duplication of a2a after attn
* add cp_stream in A2A implementation
* bug fix
* fix qkv of fp8_fwd + bf16_bwd
* fix kernel order in cp a2a communication
* code cleaning for CP a2a
* fix merging with main
* fix a2a communication order
* adjust sequence chunk reordering for a2a
* add docstring for A2A implementation
* change an assert info
* add unit tests of A2A implementation
* add more A2A unit test
* fix CP unit tests
* add more cp unit tests
* fix window size of no_mask
* fused attn does not support swa+no_mask
* change num_gqa_groups to 2 for A2A implementation
* function and variable renaming
* code cleaning for CP all-gather implementation
* some function renaming
* remove redundant code
* commit code change for kv all-gather implementation
* fix all-gather implementation
* add a window size check
* code cleaning
* add unit test of all_gather+no_mask
* fix all-gather cp implementation
* code cleaning
* code format fix
* code format fix
* fix FP8 with A2A implementation
* add paper references to CP implementations with all-gather and all-to-all
* change pdf to abs
* elaborate cp_comm_type
* fix CP docstring
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
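The all-to-all ("Ulysses") implementation above redistributes attention inputs so that each rank goes from holding all heads for a shard of the sequence to holding a shard of the heads for the full sequence. A shape-only sketch of that exchange — names are illustrative, and the real code moves tensors with collective all-to-all ops rather than computing shapes:

```python
# Shape-level sketch of the Ulysses all-to-all before attention:
#   in:  [seq_per_rank, num_heads, head_dim]            (sequence sharded)
#   out: [seq_per_rank * cp_size, num_heads // cp_size, head_dim]  (heads sharded)
# Total elements per rank are unchanged; only the sharded axis moves.
def a2a_before_attn_shape(seq_per_rank: int, num_heads: int,
                          head_dim: int, cp_size: int):
    assert num_heads % cp_size == 0, "heads must divide evenly across CP ranks"
    return (seq_per_rank * cp_size, num_heads // cp_size, head_dim)
```

A mirror-image all-to-all after attention restores the sequence sharding, which is why the approach needs `num_heads` (and GQA groups) divisible by the CP size.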
-
- 05 Sep, 2024 2 commits
-
-
Selvaraj Anandaraj authored
* Added offloading support for FP8 attention
* Update transformer_engine/pytorch/attention.py
* Fix
Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Signed-off-by: Selvaraj Anandaraj <anandaraj@wisc.edu>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Xin Yao authored
* fp8 mha with rope
* avoid index select in cast ops
* avoid index select in fused_attn_fwd
* rename is_first_module_in_mha to fp8_output
* resolve comments
* resolve comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* move transpose to backward for fp8 input
* fix ut
* resolve comments
* update argument list for CP
* fix for FA3
* remove unnecessary copy of scale_inv
* skip fp8 dpa/mha tests when fa3 is not available
* fix a merge bug
Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 30 Aug, 2024 2 commits
-
-
Xiaowei Ren authored
* fix qkv_dtype of FP8+CP
* config cp correction dtype of FP8+CP
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* code style change
* always do FP8 CP correction in FP32
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Charlene Yang authored
* fix FP8 logic when FA3 is not installed
* minor tweak to make logic more explicit
* minor fixes
* limit FA3 warning to Hopper and NVTE_FLASH_ATTN=1
* prefer fused attn for FP8
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 23 Aug, 2024 1 commit
-
-
Charlene Yang authored
* WIP: add fa3
* WIP: clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* WIP: add benchmarks
* differentiate func/varlen_func
* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func
* WIP: add FP8 fwd support
* add FA3 FP8 fwd code and test
* fix assert for FA3
* fix FA3 FP8 logic and add tests
* update FA2 to <=2.6.3
* tweak unit tests for base/mask
* fix lint
* fix lint
* fix lint
* set constraints for FA3 for sm90 and causal_bottom_right
* revert debug changes in benchmark script
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 21 Aug, 2024 2 commits
-
-
Charlene Yang authored
* add support for padding in UnfusedDPA
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* add support for padding_causal/_bottom_right
* fix padding_causal/_bottom_right
* need to test max512 backend
* revert last commit
* fix mask logic in unfused
* use actual_seqlen for alibi/causal_bottom_right padding
* fix lint
* minor fixes and convert causal to causal_bottom_right for inference
* use causal in kv cache inference test
* simplify get_alibi logic
* simplify the non-padding path for get_alibi
* avoid batch_size loop in generating padding_causal/_bottom_right masks
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
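The last item above replaces a per-sample Python loop with a batched mask construction. The underlying rule for a `padding_causal` mask is simple: position (i, j) may attend only if j <= i and both positions are real (non-padded) tokens. A pure-Python sketch of that rule (TE builds the equivalent boolean tensor in one vectorized step):

```python
# Sketch (not TE's code) of padding_causal masks from per-sample sequence
# lengths. mask[b][i][j] is True where attention must be masked OUT.
def padding_causal_mask(seqlens, max_seqlen):
    """One [max_seqlen, max_seqlen] mask per sequence length in seqlens."""
    return [[[not (j <= i and i < L and j < L)   # allowed iff causal and unpadded
              for j in range(max_seqlen)]
             for i in range(max_seqlen)]
            for L in seqlens]
```

The bottom-right variant differs only in anchoring the causal diagonal at the end of the actual (unpadded) sequence rather than at position 0.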
-
Xiaowei Ren authored
* add window_size to AttnFuncWithCP
* add seq_offsets_qkvo for cudnn thd
* add seq_offsets_qkvo to AttnFuncWithCP
* fix seq_offsets calculation of cudnn thd
* remove a thd assert
* fix bias for thd test
* add thd test for cudnn FA with CP
* skip GQA/MQA test for cuDNN THD
* make sure seq_offsets are computed with qkv_group of hd_hd_hd while CP>1
* fix seq_offsets inputs
* remove two comments
* fix attn mask type for cudnn thd with cp
* fix attn_mask_type check
* fix attn_mask_type for cudnn fa with thd
* fix a typo
* fix out dout in bwd
* assert cudnn+thd does not support attn bias
* check if attn_mask_type has padding
* minor change
* change cp test batch size to 2
* fix code format
* fix two assert info
* fix assert comment
* fix assert comments
* minor fix
* fix assert comments
* assert swa+CP cannot work with thd format
* add a new CP function for swa
* add a missing dgrads
* minor change
* add draft fwd function for swa+cp
* minor change
* enable flash attention for swa+cp
* remove an assert of swa+cp
* call SWAFuncWithCP for swa+cp
* typo fix
* use 2hd layout
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* change qkv_format check
* add a code comment
* tensor shape bug fix
* tensor shape fix
* add function to compute cu_seqlens of a cp rank
* add cu_seqlens and cu_seqlens_padded to context parallelism
* typo fix
* minor change
* fix FlashAttention output sequence length
* fix cu_seqlens_kv_per_step calculation
* zero dQKV for ending padded tokens
* zero dQKV tensors of FlashAttention
* fix softmax_lse correction
* remove padded tokens of KV to save communication
* do not need to zero dkv for FlashAttention any more
* zero out tensors
* remove redundant code
* fix CP unit test
* fix kv shape of cp test with thd format
* update cp unit test
* add simple code framework
* try not to have a separate CP function for SWA
* backup some code change
* back up code
* clean up fwd implementation of SWAFuncWithCP
* remove redundant code
* code cleaning
* fix assert info
* reduce kv chunk concat overheads
* minor change
* make AttnFuncWithCP and SWAFuncWithCP have same API
* add a docstring
* preliminary implementation of SWAFuncWithCP forward seems working
* fix output shape of SWAFuncWithCP
* code refactoring for FlashAttention and add a code placeholder for bwd
* use gather_along_first_dim
* finish the preliminary implementation of bwd
* remove redundant code
* fix assert condition
* add draft implementation of SWA+CP with FusedAttention
* fix attention mask type of swa+cp
* code cleaning
* add qkv_layout
* add missing window_size argument
* typo fix
* fix kv shape of swa+cp
* bug and typo fix
* fix dout shape
* add multi stream in fwd of swa+cp
* save chunk_ids_to_kv_ag in fwd
* add multi stream in bwd of swa+cp
* minor fix to cp stream sync
* rename AttnFuncWithCP
* check if window size is None
* fix docstring of AttnFuncWithCP
* minor fix
* add env var for users to choose KV ag or KV p2p
* update cp tests
* fix window size in cp unit test
* fix pytest skip messages
* add cp_comm_type into API
* code cleaning
* add deterministic knob in cuDNN fused attn backend
* pass fp8 and fp8_meta to attn_func_with_cp
* assert only Fused Attn can support FP8+CP
* remove redundant assert
* add a fwd draft implementation of FP8 + CP
* save fp8 and fp8_meta
* assert sequence length divisible requirements
* remove a redundant qkv_layout compute
* if condition change
* some typo fix
* add support table of context parallelism
* typo and code format fix
* do not print multiple disabling messages
* bug fix
* fix aux_ctx_tensors of FP8
* bug fix
* fix device in torch.arange and adjust code for the PR of MLA
* commit code change for FP8+CP
* commit more code change for FP8+CP
* commit more fp8 code for FP8+CP
* bug fixes
* bug fix
* cast merged CP results from FP32 to BF16
* typo fix
* minor change
* fix softmax_lse
* fix some bugs of FP8 dkv exchange
* typo fix
* add FP8 unit test
* fix typos and clean asserts
* fix get_p2p_comm_info
* fix dkv p2p exchange
* minor fix
* change FP8 dkv P2P to A2A
* add FP8+CP unit test
* typo fix
* assert amax reduction is needed for FP8+CP
* remove duplicated code
* destroy process group in CP unit test
* remove interval from fp8_recipe because it has been deprecated
* try to fix the failed CP test with the latest CI pipeline
* remove redundant f before string
* change META_O_CP
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xiaowei Ren <xren@cs-cw-dfw-login-01.cm.cluster>
-
- 20 Aug, 2024 1 commit
-
-
hXl3s authored
feat(pytorch): Allow TransformerLayer and MultiheadAttention to accept sequence length parameters (#1066) * Added ability for seqlen for transformer and mha layer Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * Documentation for new parameters Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * Add tests for THD layout, assert for THD layout with KV-Cache Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * Fixed tests Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move THD logic in shape calculation, add missing optional in params Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> * Skip the THD test on GPUs older than Ampere Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Lukasz Pierscieniewski <lukaszp@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 16 Aug, 2024 1 commit
-
-
Xiaowei Ren authored
* add window_size to AttnFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add seq_offsets_qkvo for cudnn thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add seq_offsets_qkvo to AttnFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix seq_offsets calculation of cudnn thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove a thd assert Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix bias for thd test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add thd test for cudnn FA with CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * skip GQA/MQA test for cuDNN THD Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make sure seq_offsets are computed with qkv_group of hd_hd_hd while CP>1 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix seq_offsets inputs Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove two comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn mask type for cudnn thd with cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn_mask_type check Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn_mask_type for cudnn fa with thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix a typo Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix out dout in bwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * assert cudnn+thd does not support attn bias Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * check if attn_mask_type has padding Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * change cp test batch size to 2 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix code format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix two assert info Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comment Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * assert swa+CP cannot work with thd format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add a new CP function for swa Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add a missing dgrads Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add draft fwd function for swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * enable flash attention for swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove an assert of swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * call SWAFuncWithCP for swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * use 2hd layout Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change qkv_format check Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add a code comment Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * tensor shape bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tensor shape fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add function to compute cu_seqlens of a cp rank Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add cu_seqlens and cu_seqlens_padded to context parallelism Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix FlashAttention output sequence length Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix cu_seqlens_kv_per_step calculation Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero dQKV for ending padded tokens Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero dQKV tensors of FlashAttention Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix softmax_lse correction Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove padded tokens of KV to save communication Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * do not need to zero dkv for FlashAttention any more Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero out tensors Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove redundant code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix CP unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix kv shape of cp test with thd format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * update cp unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add simple code framework Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * try not to have a separate CP function for SWA Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * backup some code change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * back up code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * clean up fwd implementation of SWAFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove redundant code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * code cleaning Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert info Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * reduce kv chunk concat overheads Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make AttnFuncWithCP and SWAFuncWithCP have same API Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add a docstring Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * preliminary implementation of SWAFuncWithCP forward seems to work Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix output shape of SWAFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * code refactoring for FlashAttention and add a code placeholder for bwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * use gather_along_first_dim Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * finish the preliminary implementation of bwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove redundant code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert condition Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add draft implementation of SWA+CP with FusedAttention Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attention mask type of swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * code cleaning Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add qkv_layout Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add missing window_size argument Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix kv shape of swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug and typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix dout shape Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add multi stream in fwd of swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * save chunk_ids_to_kv_ag in fwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add multi stream in bwd of swa+cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor fix to cp stream sync Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * rename AttnFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * check if window size is None Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix docstring of AttnFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add env var for users to choose KV ag or KV p2p Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * update cp tests Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix window size in cp unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix pytest skip messages Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add cp_comm_type into API Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * code cleaning Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * assert sequence length divisible requirements Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support table of context parallelism Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo and code format fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * do not print multiple disabling messages Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix device in torch.arange and adjust code for the PR of MLA Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typos and clean asserts Signed-off-by:
Xiaowei Ren <xren@nvidia.com> --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xiaowei Ren <xren@cs-cw-dfw-login-01.cm.cluster>
-
- 15 Aug, 2024 2 commits
-
-
Charlene Yang authored
fix typos regarding t in thd Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Marks101 authored
Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 13 Aug, 2024 1 commit
-
-
Charlene Yang authored
* merge k_channels and v_channels back to kv_channels and accept a tuple Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix isinstance call Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix MLA tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 06 Aug, 2024 3 commits
-
-
Charlene Yang authored
reduce the roundup of max_seqlen for THD to multiples of 64 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
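The roundup mentioned above pads the longest sequence length of a ragged (THD) batch up to an alignment boundary; a minimal sketch of the arithmetic (the helper name is hypothetical, not Transformer Engine's API):

```python
def round_up_max_seqlen(max_seqlen: int, multiple: int = 64) -> int:
    """Round max_seqlen up to the nearest multiple (64 per the commit above)."""
    return ((max_seqlen + multiple - 1) // multiple) * multiple

# e.g. a batch whose longest sequence is 100 tokens rounds up to 128
```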
-
Charlene Yang authored
* fix logging in attention Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove logging in fwd/bwd methods due to CPU overhead Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: fix check_set_window_size messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix typo Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix window_size messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove redundant imports Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Charlene Yang authored
* add multi-latent attention for DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax/Paddle API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix typo in test script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix too-many-boolean lint error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "fix lint" This reverts commit 67399a3a6f45bb4ce9e5eaa6bcce40b28e347e5b. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix stride check in get_qkv_layout Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: fix layout_thd tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: debug info Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix merge conflict Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix thd pad_between_seqs=False/True tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 02 Aug, 2024 1 commit
-
-
Li Tao authored
fix an argument issue when flash_attn>=2.5.7 Signed-off-by:
Li Tao <lit@nvidia.com> Co-authored-by:
Li Tao <lit@nvidia.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 01 Aug, 2024 1 commit
-
-
Xiaowei Ren authored
* use 2hd layout Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change qkv_format check Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add a code comment Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * tensor shape bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tensor shape fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add function to compute cu_seqlens of a cp rank Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add cu_seqlens and cu_seqlens_padded to context parallelism Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix FlashAttention output sequence length Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix cu_seqlens_kv_per_step calculation Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero dQKV for ending padded tokens Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero dQKV tensors of FlashAttention Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix softmax_lse correction Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove padded tokens of KV to save communication Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * do not need to zero dkv for FlashAttention any more Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * zero out tensors Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove redundant code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix CP unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix kv shape of cp test with thd format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * update cp unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove redundant code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
Xiaowei Ren <xren@cs-cw-dfw-login-01.cm.cluster>
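The cu_seqlens bullets in the commit above refer to the cumulative sequence-length offsets used by varlen/THD attention: entry i marks where sequence i starts in the packed token buffer. A minimal illustration (function name is hypothetical):

```python
def compute_cu_seqlens(seqlens):
    """Cumulative offsets for a packed (THD) batch.

    cu_seqlens[i] is the start of sequence i in the packed buffer;
    the last entry is the total token count.
    """
    cu = [0]
    for s in seqlens:
        cu.append(cu[-1] + s)
    return cu

# three sequences of 3, 5, and 2 tokens pack into a 10-token buffer
```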
-
- 26 Jul, 2024 1 commit
-
-
Charlene Yang authored
fix tp_size for GQA Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 25 Jul, 2024 1 commit
-
-
李金梁 authored
This fixes a crash: [ERROR] failed (exitcode: -11) local_rank: 0 (pid: 1761020) of binary: ~/megatron/bin/python. The root cause is that the rng_states required for attention recompute (for dropout) are missing, but no hint is provided, which makes the failure very difficult to trace (it cost me two weeks). ```python before the start of training step] datetime: 2024-07-22 18:26:45 [2024-07-22 18:27:00,941] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -11) local_rank: 0 (pid: 1761020) of binary: /home//miniconda3/envs/megatron/bin/python Traceback (most recent call last): File "/home//miniconda3/envs/megatron/bin/torchrun", line 33, in <module> sys.exit(load_entry_point('torch==2.2.1+cu121', 'console_scripts', 'torchrun')()) File "/home//miniconda3/envs/megatron/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper return f(*args, **kwargs) File "/home//miniconda3/envs/megatron/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main run(args) File "/home//miniconda3/envs/megatron/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run elastic_launch( File "/home//miniconda3/envs/megatron/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/home//miniconda3/envs/megatron/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ``` Signed-off-by: 李金梁 <975761915@qq.com>
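The fix above boils down to saving the RNG state before the dropout in the forward pass and restoring it before activation recompute, so the recomputed forward replays the same dropout mask. A toy sketch of that pattern using only Python's stdlib `random` (Transformer Engine's real implementation tracks CUDA RNG states; names here are illustrative):

```python
import random

def dropout_like(xs, p=0.1):
    # stand-in for attention dropout: a stochastic op that consumes RNG state
    return [x for x in xs if random.random() >= p]

def forward_with_saved_rng(xs):
    # capture the RNG state *before* the stochastic op...
    rng_state = random.getstate()
    out = dropout_like(xs)
    return out, rng_state

def recompute_forward(xs, rng_state):
    # ...and restore it, so recompute reproduces the forward pass exactly
    random.setstate(rng_state)
    return dropout_like(xs)
```

Without the restore, the recomputed dropout mask diverges from the one used in the forward pass, which is the class of silent mismatch this commit guards against.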
-
- 19 Jul, 2024 1 commit
-
-
Charlene Yang authored
* initialize output tensors to 0 for THD while waiting for cuDNN bug fix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move fill_() to F16 loop Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fused_attn_bwd() Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * correct typo in check_set_window_size Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * use nvtx3 instead Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 10 Jul, 2024 1 commit
-
-
Charlene Yang authored
* add cuDNN swa Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix SWA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add set_deterministic and minor fixes for swa Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add AttentionParams Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change window_size to int64_t; fix swa/determinism tests; cache _attention_backends Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add window_size to get_backend; fix jax and paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes; add set_deter to bwd_impl Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix unit tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FP8 tests due to determinism Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add support matrix for SWA and bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes and lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add wording on window_size special cases Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweak on wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax assertion error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * call bwd with deterministic=true for jax/paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism words in documentation Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 03 Jul, 2024 1 commit
-
-
Charlene Yang authored
* update to FE 1.5.1 and add bottom right causal Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjust logic for backend selection Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FE to 1.5.2 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add get_attention_backend function Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update get_attention_backend Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix get_attention_backend Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tweak get_attention_backend and fix unit tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes for unfused, get_backend, etc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/attention.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix cpu offload Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes for get_attention_backend Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * explicitly skip FP32 and padding tests because there is no support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fix for window size check Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update check_set_window_size and add enc_dec_attn_mask_type/enc_dec_window_size Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 01 Jul, 2024 1 commit
-
-
Charlene Yang authored
* update FE to 1.5.2 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * enable unfused attn for cross attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * unify logging info Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * omit cudnn 9.1.1 and 9.2.1 due to bugs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * set cu_seqlens_padded to cu_seqlens by default Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace variable name with ctx.variable Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "enable unfused attn for cross attn" This reverts commit bc49f14fca904217a711b4a86c45a4a739a17a14. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * restrict cudnn version for fp8 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove mha_fill for FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "remove mha_fill for FP8" This reverts commit 83ffc44114dc6eb3d426d742b6c5a4d34805ec04. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * lower cudnn version to >=9.2.1 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 18 Jun, 2024 2 commits
-
-
Charlene Yang authored
* simplify offset tensors Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes; tests pass Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix C lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace with_offset with with_padding Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace with_padding with padded Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes after merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix for fused attn fwd/bwd calls Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjust spacing in docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pytorch tests; fix paddle api Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix attn_biases Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix AttnFuncWithCP backward Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix attn with CP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Charlene Yang authored
fix tp_initialized error Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 15 Jun, 2024 1 commit
-
-
Charlene Yang authored
* subclass DPA with BaseModule and test with test_gpt_checkpointing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * test DPA only Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * test save and load Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove debug info Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweaks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweak Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add hook in case core_attention._extra_state is missing Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * check named buffers in BaseModule; remove FP8 scratchpad override function; test FP8 for sm90+ Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes: test size, interval in recipe, named_buffer loop Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move BaseModule from FusedAttention to DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 14 Jun, 2024 3 commits
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Initial config test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove linters, fix clang-format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix clang-format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix clang-format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Adjust config Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * use config file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * adjust pylintrc Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * pre-format fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python only Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add FA module Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update CI configs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CRLF -> LF Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert accidental formatting changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * try with sudo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cpp formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix pylint error properly Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * some review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add fp8 attn include in the correct file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * autofix PRs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Charlene Yang authored
* add attention docs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attention doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attention doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attention doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attn doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attn doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attn doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: update attention doc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * first draft Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweak to first draft Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * clean up pictures Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * first draft for review Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add logging info/debug Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fix of an SWA message Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * use subprocess instead of os.sys Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * clean up benchmark script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add example script and update notebook Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweak Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweaks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix Jax/Paddle related comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * rerun H100 benchmark Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * restrict fp8 tests to sm90+ Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move get_cudnn_version from common to pytorch utils Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
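One cleanup in the entry above moves the benchmark script from `os.sys`-style shelling out to `subprocess`. A minimal sketch of that pattern (the command here is illustrative, not the actual benchmark invocation):

```python
import subprocess
import sys

# Launch a child process and capture its output, instead of os.system();
# check=True raises CalledProcessError on a non-zero exit code.
result = subprocess.run(
    [sys.executable, "--version"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip() or result.stderr.strip())
```

Unlike `os.system`, this avoids shell injection concerns and gives structured access to stdout, stderr, and the return code.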
-
- 13 Jun, 2024 1 commit
-
-
BoxiangW authored
* Add norm_factor arg into DotProductAttention Signed-off-by:
Boxiang Wang <boxiangw@nvidia.com> * Change kwarg name from `norm_factor` to `softmax_scale` Signed-off-by:
Boxiang Wang <boxiangw@nvidia.com> * Change all norm_factor representation into softmax_scale Signed-off-by:
Boxiang Wang <boxiangw@nvidia.com> * Update transformer_engine/pytorch/attention.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Update attention.py changing typo Signed-off-by:
BoxiangW <45734921+BoxiangW@users.noreply.github.com> --------- Signed-off-by:
Boxiang Wang <boxiangw@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
BoxiangW <45734921+BoxiangW@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
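The rename above exposes the attention softmax scale as a `softmax_scale` kwarg. The semantics can be sketched in plain Python (a toy illustration of the standard convention, not Transformer Engine's implementation: the scale multiplies the query-key dot products before the softmax, and defaults to 1/sqrt(head_dim)):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attn_weights(q, keys, softmax_scale=None):
    """Attention weights for one query against a list of keys.
    softmax_scale multiplies each q.k score; None means 1/sqrt(d)."""
    d = len(q)
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(d)
    scores = [softmax_scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(scores)

q = [1.0, 0.0, 1.0, 0.0]
keys = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
w = attn_weights(q, keys)  # default scale = 1/sqrt(4) = 0.5
```

Expressing the parameter as a multiplier (`softmax_scale`) rather than a divisor (`norm_factor`) matches the convention used by FlashAttention-style APIs.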
-
- 10 Jun, 2024 1 commit
-
-
Xiaowei Ren authored
* add seq_offsets_qkvo for cudnn thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add seq_offsets_qkvo to AttnFuncWithCP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix seq_offsets calculation of cudnn thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove a thd assert Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix bias for thd test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add thd test for cudnn FA with CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * skip GQA/MQA test for cuDNN THD Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make sure seq_offsets are computed with qkv_group of hd_hd_hd while CP>1 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix seq_offsets inputs Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove two comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn mask type for cudnn thd with cp Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn_mask_type check Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix attn_mask_type for cudnn fa with thd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix a typo Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix out dout in bwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * assert cudnn+thd does not support attn bias Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * check if attn_mask_type has padding Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * change cp test batch size to 2 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix code format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix two assert info Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comment Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix assert comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
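The `seq_offsets_qkvo` in the entry above index into packed (THD-format) tensors, where variable-length sequences are concatenated along the token dimension. The offsets are cumulative sums of the per-sequence lengths; a minimal sketch of that computation (a hypothetical helper, not TE's actual code):

```python
import itertools

def cu_seqlens(seqlens):
    """Cumulative sequence offsets for a packed (THD) batch.

    Sequence i occupies tokens [out[i], out[i+1]) in the flattened
    token dimension; out[-1] is the total token count.
    """
    return [0] + list(itertools.accumulate(seqlens))

lens = [3, 5, 2]
offsets = cu_seqlens(lens)
# offsets == [0, 3, 8, 10]
```

In practice separate offset arrays for Q/K/V and the output are needed when the QKV layout interleaves heads differently, which is what the `hd_hd_hd` qkv_group fix above addresses.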
-