- 23 Jul, 2024 4 commits
Tri Dao authored
Tri Dao authored
rocking authored
  * Support CK in fmha
  * Add ck submodule
  * Do not return LSE if return_softmax == false
  * Use receipt to speed up CK compile time
  * Integrate new version of ck_tile
  * Support dropout for mha_fwd()
  * Add dropout to mha_varlen_fwd()
  * Update CK to develop
  * Extract padding function for dropout randval
  * Extract randval transformation function
  * Sync the code structure and coding style with FA
  * Remove this line; the C++ API will handle this. Sync with test_flash_attn.py
  * Fix compile error
  * Add mha_bwd
  * Generate dropout seed and offset from the user generator
  * Update CK
  * Add mha_varlen_bwd
  * Use the same Python used to build flash-attn to generate the CK kernels
  * Fix a bug in group-mode fwd when returning softmax LSE
  * Increase the test tolerance
  * Add test_flash_attn_output() and test_flash_attn_varlen_output()
  * Always fill softmax_lse
  * Remove duplicate benchmark script, since we already implement mha_bwd
  * Refine getting values from tuple
  * Use default parameter for stream_config
  * Unblock all platforms
  * Add comment
  * Refine the test code
  * Refine naming
  * Add unpack to namespace
  * Do not hardcode the warp size of 64
  * Add more targets
  * Add README
  * Optimize mha_fwd if seqlen_q == 1
  * Support get_wheel_url for ROCm
  * Detect the ROCm environment via PyTorch's IS_HIP_EXTENSION
  * Update to latest CK
  * Add necessary compile flag
  * Sync the API with upstream FA
  Co-authored-by: carlushuang <carlus.huang@amd.com>
  Co-authored-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  Co-authored-by: Yichen Yan <oraluben@outlook.com>
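The CK port above derives the dropout seed and offset from a user-supplied generator so that runs are reproducible. A minimal NumPy sketch of that idea (a reference attention forward with seeded dropout; names here are illustrative, not the repo's API):

```python
import numpy as np

def attention_ref(q, k, v, dropout_p=0.0, seed=0):
    # Reference forward: softmax(q k^T / sqrt(d)) v, with dropout made
    # reproducible by seeding an explicit generator (the kernel equivalent
    # derives a seed/offset pair from the user generator instead).
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = q @ k.swapaxes(-1, -2) * scale
    m = scores.max(axis=-1, keepdims=True)          # stabilize softmax
    p = np.exp(scores - m)
    p /= p.sum(axis=-1, keepdims=True)
    if dropout_p > 0.0:
        rng = np.random.default_rng(seed)
        keep = rng.random(p.shape) >= dropout_p
        p = p * keep / (1.0 - dropout_p)            # inverted dropout scaling
    return p @ v
```

Because the mask depends only on the seed, two calls with the same seed produce identical outputs, which is what makes dropout testable against a reference.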
Ying Zhang authored
  * fwd var-seq-len
  * fixes
  * benchmark
  * fixes
  Co-authored-by: Tri Dao <tridao@users.noreply.github.com>
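Var-seq-len ("varlen") kernels take the batch packed into one tensor plus a cumulative-sequence-lengths array rather than a padded (batch, max_seqlen) layout. A hypothetical helper pair sketching that packing (not the repo's API):

```python
import numpy as np

def pack_varlen(seqs):
    # Pack variable-length (seqlen_i, d) arrays into one (total, d) array
    # plus cu_seqlens, where sequence i occupies rows
    # cu_seqlens[i]:cu_seqlens[i+1].
    cu_seqlens = np.zeros(len(seqs) + 1, dtype=np.int32)
    cu_seqlens[1:] = np.cumsum([len(s) for s in seqs])
    return np.concatenate(seqs, axis=0), cu_seqlens

def unpack_varlen(packed, cu_seqlens):
    # Inverse: slice the packed array back into per-sequence views.
    return [packed[cu_seqlens[i]:cu_seqlens[i + 1]]
            for i in range(len(cu_seqlens) - 1)]
```

The packed layout avoids both padding waste and a batch loop: one kernel launch covers all sequences, with cu_seqlens telling each block where its sequence starts.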
- 22 Jul, 2024 1 commit
Phil Wang authored
  * Check in the two ways of approaching backwards for softcapping, both functional
  * Prepare the softcap switch for backwards
  * Temporary
  * Cleanup to the way Tri prefers
  * Calculate dtanh when copying from scores -> dtanh Tensor
  * No ternary operators allowed for constexpr, so just use a hack found online
  * Fix maybe_dtanh, restore some files
  * Restore another file
  * Move calculate_dtanh to utils and colocate with apply_softcap
  * Cleanup
  * Maybe last cleanup
  * Save for another PR
  * Remove a stray line
  * Fix spacing
  * Fix an issue, and make test_flash_attn.py ready to test softcapping backwards
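Softcapping squashes attention scores through a scaled tanh before softmax, and its backward pass needs the tanh derivative (the `dtanh` the commit refers to). A minimal sketch of the math, assuming the standard `softcap * tanh(s / softcap)` formulation (function names here are illustrative):

```python
import numpy as np

def apply_softcap(scores, softcap):
    # Forward: smoothly cap scores to the range (-softcap, +softcap);
    # near-identity for |scores| << softcap.
    return softcap * np.tanh(scores / softcap)

def softcap_dtanh(scores, softcap):
    # Backward factor: d/ds [softcap * tanh(s / softcap)] = 1 - tanh(s / softcap)^2,
    # multiplied into the incoming gradient of the capped scores.
    t = np.tanh(scores / softcap)
    return 1.0 - t * t
```

The derivative is 1 at zero and decays toward 0 for large scores, which is why the capped region contributes almost no gradient.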
- 11 Jul, 2024 1 commit
Tri Dao authored
- 10 Jul, 2024 4 commits
- 08 Jul, 2024 1 commit
Nicolas Patry authored
  * Softcap v2 (fwd only)
  * Some missing interface + remove overrides in tests
- 03 Jul, 2024 1 commit
muoshuosha authored
Co-authored-by: moshuosha <moshuosha@qq.com>
- 01 Jul, 2024 1 commit
cao lei authored
- 27 Jun, 2024 1 commit
Grigory Sizov authored
  * Support unpadded LSE layout
  * Cleanup
  * Fix unpadded LSE on split-kv path
  * Fix formatting and comments
  * Fix inline vs forceinline
  Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
  Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
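The unpadded LSE layout stores each query row's log-sum-exp contiguously per token rather than in a padded (batch, nheads, max_seqlen) tensor. A sketch of the conversion, assuming the unpadded layout is (nheads, total_q) with sequences concatenated along the token axis (helper names are illustrative):

```python
import numpy as np

def lse_rows(scores):
    # Numerically stable row-wise log-sum-exp of attention scores,
    # the quantity softmax_lse holds for each query row.
    m = scores.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def padded_to_unpadded_lse(lse_padded, seqlens):
    # (batch, nheads, max_seqlen) -> (nheads, total_q): drop each
    # sequence's padding tail and concatenate along the token axis.
    return np.concatenate(
        [lse_padded[b, :, :sl] for b, sl in enumerate(seqlens)], axis=-1)
```

Dropping the padding means the LSE buffer scales with the real token count, which matters when sequence lengths in a batch vary widely.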
- 05 Apr, 2024 1 commit
Ivan Komarov authored
All integer parameters are specialized by default, so the two parameters removed in this commit could lead to kernel re-compilation, even if they were completely unused.
- 15 Mar, 2024 1 commit
Grigory Sizov authored
  * Enable paged attention in varlen forward
  * Format + fix padding
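Paged attention keeps the KV cache in fixed-size physical blocks and maps each sequence's logical blocks to them through a block table. A minimal sketch of the gather step (the kernel indexes blocks in place rather than materializing a copy; names here are illustrative):

```python
import numpy as np

def gather_k(paged_k, block_table, seqlen, block_size):
    # Reassemble one sequence's contiguous key tensor from a paged cache.
    # paged_k: (num_blocks, block_size, d) pool of physical blocks;
    # block_table: physical block index of each logical block of this sequence.
    nblocks = (seqlen + block_size - 1) // block_size   # ceil-divide
    k = np.concatenate([paged_k[block_table[i]] for i in range(nblocks)], axis=0)
    return k[:seqlen]                                   # trim the partial last block
```

Because blocks are allocated on demand, sequences can grow without reserving max_seqlen worth of cache up front, at the cost of one extra indirection per block.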
- 21 Feb, 2024 1 commit
Tri Dao authored
- 23 Jan, 2024 1 commit
Tri Dao authored
Co-authored-by: ljss <450993438@qq.com>
- 21 Jan, 2024 1 commit
Curtis "Fjord" Hawthorne authored
- 14 Jan, 2024 1 commit
Tri Dao authored
- 13 Jan, 2024 1 commit
Tri Dao authored
- 05 Jan, 2024 2 commits
- 04 Jan, 2024 1 commit
Tri Dao authored
- 25 Dec, 2023 3 commits
- 24 Dec, 2023 1 commit
Tri Dao authored
- 23 Dec, 2023 1 commit
Tri Dao authored
- 22 Dec, 2023 2 commits
- 20 Dec, 2023 2 commits
Sanghun Cho authored
  * Hard-code alibi in fwd
  * Use params.h as num_heads
  * Hard-code alibi in bwd
  * Add alibi on/off option
  * Compute alibi_start, ratio outside of kernels
  * Fix minor merge conflict
  * Add test_alibi.py
  * Change apply_alibi() location to before masking
  * Add alibi in splitkv kernel
  * Fix the number of returns of the backward func
  * Add out-of-bound check in apply_alibi()
  * Update test_alibi.py
  * Update test_alibi.py for kvcache
  * Simplify the alibi parameter interface
  * Fix performance issue by computing alibi outside of the branch
  * Update test_flash_attn_varlen_func() for left padding
  * Implement alibi_slopes (b, nh) loading
  * Optimize apply_alibi() a bit
  * Update test cases for alibi_slopes loading
  * Reflect stylistic comments
  * Disable "seqlenq_ngroups_swapped" when using alibi
  Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
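ALiBi replaces positional embeddings with a per-head linear bias on the attention scores, proportional to the query-key distance. A sketch of the idea, assuming the standard geometric slope schedule for a power-of-two head count and right-aligned positions (as kvcache layouts use); the helper names are illustrative:

```python
import numpy as np

def alibi_slopes(nheads):
    # Geometric slope schedule from the ALiBi paper, valid when
    # nheads is a power of two: slopes = 2^(-8/nheads * i), i = 1..nheads.
    start = 2.0 ** (-8.0 / nheads)
    return start ** np.arange(1, nheads + 1)

def apply_alibi(scores, slopes):
    # scores: (nheads, seqlen_q, seqlen_k). Query i is aligned to key
    # position seqlen_k - seqlen_q + i; subtract slope * |distance|.
    sq, sk = scores.shape[-2], scores.shape[-1]
    dist = np.abs(np.arange(sk)[None, :]
                  - (sk - sq + np.arange(sq))[:, None])
    return scores - slopes[:, None, None] * dist
```

Since the bias depends only on positions and slopes, it can be added on the fly inside the kernel instead of being materialized as a full bias tensor, which is what the hard-coded fwd/bwd paths above do.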
Tri Dao authored
- 17 Dec, 2023 2 commits
- 01 Dec, 2023 1 commit
Tri Dao authored
- 20 Nov, 2023 2 commits
- 14 Nov, 2023 1 commit
Tri Dao authored
- 13 Nov, 2023 1 commit
Tri Dao authored