- 23 Jul, 2024 4 commits
Tri Dao authored
Tri Dao authored
rocking authored
  * Support CK in fmha
  * Add ck submodule
  * Do not return LSE if return_softmax == false
  * Use receipt to speed up CK compile time
  * Integrate new version of ck_tile
  * Support dropout for mha_fwd()
  * Add dropout to mha_varlen_fwd()
  * Update CK to develop
  * Extract padding function for dropout randval
  * Extract randval transformation function
  * Sync the code structure and coding style with FA
  * Remove this line; the C++ API will handle this. Sync with test_flash_attn.py
  * Fix compile error
  * Add mha_bwd
  * Generate dropout seed and offset from the user generator
  * Update CK
  * Add mha_varlen_bwd
  * Use the same Python used to build flash-attn to generate the CK kernels
  * Fix a bug in group-mode fwd when returning softmax LSE
  * Increase the test tolerance
  * Add test_flash_attn_output() and test_flash_attn_varlen_output()
  * Always fill softmax_lse
  * Remove duplicate benchmark script, since we already implement mha_bwd
  * Refine getting values from tuple
  * Use default parameter for stream_config
  * Unblock all platforms
  * Add comment
  * Refine the test code
  * Refine naming
  * Add unpack to namespace
  * Do not hardcode the warp size of 64
  * Add more targets
  * Add README
  * Optimize mha_fwd if seqlen_q == 1
  * Support get_wheel_url for ROCm
  * Detect the ROCm environment via PyTorch's IS_HIP_EXTENSION
  * Update to latest CK
  * Add necessary compile flag
  * Sync the API with upstream FA
  Co-authored-by: carlushuang <carlus.huang@amd.com>
  Co-authored-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  Co-authored-by: Yichen Yan <oraluben@outlook.com>
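The CK port above derives the dropout seed and offset from a user-supplied generator so that runs are reproducible. A minimal NumPy sketch of that idea (a reference attention forward with seeded dropout; names here are illustrative, not the repo's API):

```python
import numpy as np

def attention_ref(q, k, v, dropout_p=0.0, seed=0):
    # Reference forward: softmax(q k^T / sqrt(d)) v, with dropout made
    # reproducible by seeding an explicit generator (the kernel equivalent
    # derives a seed/offset pair from the user generator instead).
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    m = scores.max(axis=-1, keepdims=True)          # stabilize softmax
    p = np.exp(scores - m)
    p /= p.sum(axis=-1, keepdims=True)
    if dropout_p > 0.0:
        rng = np.random.default_rng(seed)
        keep = rng.random(p.shape) >= dropout_p
        p = p * keep / (1.0 - dropout_p)            # inverted dropout scaling
    return p @ v
```

Because the mask depends only on the seed, two calls with the same seed produce identical outputs, which is what makes dropout testable against a reference.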
Ying Zhang authored
  * fwd var-seq-len
  * fixes
  * benchmark
  * fixes
  Co-authored-by: Tri Dao <tridao@users.noreply.github.com>
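Var-seq-len ("varlen") kernels take the batch packed into one tensor plus a cumulative-sequence-lengths array rather than a padded (batch, max_seqlen) layout. A hypothetical helper pair sketching that packing (not the repo's API):

```python
import numpy as np

def pack_varlen(seqs):
    # Pack variable-length (seqlen_i, d) arrays into one (total, d) array
    # plus cu_seqlens, where sequence i occupies rows
    # cu_seqlens[i]:cu_seqlens[i+1].
    cu_seqlens = np.zeros(len(seqs) + 1, dtype=np.int32)
    cu_seqlens[1:] = np.cumsum([len(s) for s in seqs])
    return np.concatenate(seqs, axis=0), cu_seqlens

def unpack_varlen(packed, cu_seqlens):
    # Inverse: slice the packed array back into per-sequence views.
    return [packed[cu_seqlens[i]:cu_seqlens[i + 1]]
            for i in range(len(cu_seqlens) - 1)]
```

The packed layout avoids both padding waste and a batch loop: one kernel launch covers all sequences, with cu_seqlens telling each block where its sequence starts.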
- 22 Jul, 2024 1 commit
Phil Wang authored
  * Check in the two ways of approaching backwards for softcapping, both functional
  * Prepare the softcap switch for backwards
  * Temporary
  * Cleanup to the way Tri prefers
  * Calculate dtanh when copying from scores -> dtanh Tensor
  * No ternary operators allowed for constexpr, so just use a hack found online
  * Fix maybe_dtanh, restore some files
  * Restore another file
  * Move calculate_dtanh to utils and colocate with apply_softcap
  * Cleanup
  * Maybe last cleanup
  * Save for another PR
  * Remove a stray line
  * Fix spacing
  * Fix an issue, and make test_flash_attn.py ready to test softcapping backwards
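Softcapping squashes attention scores through a scaled tanh before softmax, and its backward pass needs the tanh derivative (the `dtanh` the commit refers to). A minimal sketch of the math, assuming the standard `softcap * tanh(s / softcap)` formulation (function names here are illustrative):

```python
import numpy as np

def apply_softcap(scores, softcap):
    # Forward: smoothly cap scores to the range (-softcap, +softcap);
    # near-identity for |scores| << softcap.
    return softcap * np.tanh(scores / softcap)

def softcap_dtanh(scores, softcap):
    # Backward factor: d/ds [softcap * tanh(s / softcap)] = 1 - tanh(s / softcap)^2,
    # multiplied into the incoming gradient of the capped scores.
    t = np.tanh(scores / softcap)
    return 1.0 - t * t
```

The derivative is 1 at zero and decays toward 0 for large scores, which is why the capped region contributes almost no gradient.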
- 11 Jul, 2024 1 commit
Tri Dao authored
- 10 Jul, 2024 4 commits
- 08 Jul, 2024 1 commit
Nicolas Patry authored
  * Softcap v2 (fwd only)
  * Some missing interface + remove overrides in tests
- 03 Jul, 2024 1 commit
muoshuosha authored
Co-authored-by: moshuosha <moshuosha@qq.com>
- 01 Jul, 2024 1 commit
cao lei authored
- 27 Jun, 2024 1 commit
Grigory Sizov authored
  * Support unpadded LSE layout
  * Cleanup
  * Fix unpadded LSE on split-kv path
  * Fix formatting and comments
  * Fix inline vs forceinline
  Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
  Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
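The unpadded LSE layout stores each query row's log-sum-exp contiguously per token rather than in a padded (batch, nheads, max_seqlen) tensor. A sketch of the conversion, assuming the unpadded layout is (nheads, total_q) with sequences concatenated along the token axis (helper names are illustrative):

```python
import numpy as np

def lse_rows(scores):
    # Numerically stable row-wise log-sum-exp of attention scores,
    # the quantity softmax_lse holds for each query row.
    m = scores.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def padded_to_unpadded_lse(lse_padded, seqlens):
    # (batch, nheads, max_seqlen) -> (nheads, total_q): drop each
    # sequence's padding tail and concatenate along the token axis.
    return np.concatenate(
        [lse_padded[b, :, :sl] for b, sl in enumerate(seqlens)], axis=-1)
```

Dropping the padding means the LSE buffer scales with the real token count, which matters when sequence lengths in a batch vary widely.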
- 05 Apr, 2024 1 commit
Ivan Komarov authored
All integer parameters are specialized by default, so the two parameters removed in this commit could lead to kernel re-compilation, even if they were completely unused.
- 15 Mar, 2024 1 commit
Grigory Sizov authored
  * Enable paged attention in varlen forward
  * Format + fix padding
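Paged attention keeps the KV cache in fixed-size physical blocks and maps each sequence's logical blocks to them through a block table. A minimal sketch of the gather step (the kernel indexes blocks in place rather than materializing a copy; names here are illustrative):

```python
import numpy as np

def gather_k(paged_k, block_table, seqlen, block_size):
    # Reassemble one sequence's contiguous key tensor from a paged cache.
    # paged_k: (num_blocks, block_size, d) pool of physical blocks;
    # block_table: physical block index of each logical block of this sequence.
    nblocks = (seqlen + block_size - 1) // block_size   # ceil-divide
    k = np.concatenate([paged_k[block_table[i]] for i in range(nblocks)], axis=0)
    return k[:seqlen]                                   # trim the partial last block
```

Because blocks are allocated on demand, sequences can grow without reserving max_seqlen worth of cache up front, at the cost of one extra indirection per block.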
- 21 Feb, 2024 1 commit
Tri Dao authored
- 23 Jan, 2024 1 commit
Tri Dao authored
Co-authored-by: ljss <450993438@qq.com>
- 21 Jan, 2024 1 commit
Curtis "Fjord" Hawthorne authored
- 14 Jan, 2024 1 commit
Tri Dao authored
- 13 Jan, 2024 1 commit
Tri Dao authored
- 05 Jan, 2024 2 commits
- 04 Jan, 2024 1 commit
Tri Dao authored
- 25 Dec, 2023 3 commits
- 24 Dec, 2023 1 commit
Tri Dao authored
- 23 Dec, 2023 1 commit
Tri Dao authored
- 22 Dec, 2023 2 commits
- 20 Dec, 2023 2 commits
Sanghun Cho authored
  * Hard-code alibi in fwd
  * Use params.h as num_heads
  * Hard-code alibi in bwd
  * Add alibi on/off option
  * Compute alibi_start, ratio outside of kernels
  * Fix minor merge conflict
  * Add test_alibi.py
  * Change apply_alibi() location to before masking
  * Add alibi in splitkv kernel
  * Fix the number of returns of the backward func
  * Add out-of-bound check in apply_alibi()
  * Update test_alibi.py
  * Update test_alibi.py for kvcache
  * Simplify the alibi parameter interface
  * Fix performance issue by computing alibi outside of the branch
  * Update test_flash_attn_varlen_func() for left padding
  * Implement alibi_slopes (b, nh) loading
  * Optimize apply_alibi() a bit
  * Update test cases for alibi_slopes loading
  * Reflect stylistic comments
  * Disable "seqlenq_ngroups_swapped" when using alibi
  Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
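ALiBi replaces positional embeddings with a per-head linear bias on the attention scores, proportional to the query-key distance. A sketch of the idea, assuming the standard geometric slope schedule for a power-of-two head count and right-aligned positions (as kvcache layouts use); the helper names are illustrative:

```python
import numpy as np

def alibi_slopes(nheads):
    # Geometric slope schedule from the ALiBi paper, valid when
    # nheads is a power of two: slopes = 2^(-8/nheads * i), i = 1..nheads.
    start = 2.0 ** (-8.0 / nheads)
    return start ** np.arange(1, nheads + 1)

def apply_alibi(scores, slopes):
    # scores: (nheads, seqlen_q, seqlen_k). Query i is aligned to key
    # position seqlen_k - seqlen_q + i; subtract slope * |distance|.
    sq, sk = scores.shape[-2], scores.shape[-1]
    dist = np.abs(np.arange(sk)[None, :]
                  - (sk - sq + np.arange(sq))[:, None])
    return scores - slopes[:, None, None] * dist
```

Since the bias depends only on positions and slopes, it can be added on the fly inside the kernel instead of being materialized as a full bias tensor, which is what the hard-coded fwd/bwd paths above do.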
Tri Dao authored
- 17 Dec, 2023 2 commits
- 01 Dec, 2023 1 commit
Tri Dao authored
- 20 Nov, 2023 2 commits
- 14 Nov, 2023 1 commit
Tri Dao authored
- 13 Nov, 2023 1 commit
Tri Dao authored