Commits · 81e01efd4bef00ccdaa19eaa62f9cbcb39c528eb · gaoqiong / flash-attention

10 Jul, 2024 2 commits
- More typo fixes · 81e01efd
  Tri Dao authored Jul 10, 2024
  
  81e01efd
- Fix typo with softcapping · 72e27c63
  Tri Dao authored Jul 10, 2024
  
  72e27c63
09 Jul, 2024 1 commit
- missing commas and backwards return arguments (#1032) · f4628b43
  Phil Wang authored Jul 09, 2024
```
* missing commas

* another fix
```
  f4628b43
08 Jul, 2024 2 commits

- Implement softcapping. (#1025) · 8f873cc6
  Nicolas Patry authored Jul 08, 2024
```
* Softcap v2 (fwd only).

* Some missing interface + remove overrides in tests.
```
  8f873cc6
- Add the return_softmax_lse parameter to the flash_attn_with_kvcache function... · 4e8d6006
  Jianwei Dong authored Jul 08, 2024
  
  Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989)
  
  4e8d6006
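
For context on the entry above: the logsumexp (LSE) of a row of attention scores is the log of that row's softmax denominator. The sketch below is an illustration in plain Python, not the library's kernel. The helper names `attention_row_with_lse` and `merge_partials` and the explicit `scale` argument are assumptions made for this example. It shows one common reason a caller might want the LSE back: partial results computed over disjoint chunks of keys can be combined exactly.

```python
import math


def attention_row_with_lse(q, keys, values, scale):
    """Attend one query vector over keys/values; return (output, lse).

    Hypothetical helper for illustration only.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    lse = m + math.log(total)  # log of the softmax denominator
    dim = len(values[0])
    out = [sum(e / total * v[d] for e, v in zip(exps, values)) for d in range(dim)]
    return out, lse


def merge_partials(parts):
    """Combine (output, lse) pairs computed over disjoint key chunks."""
    m = max(lse for _, lse in parts)
    weights = [math.exp(lse - m) for _, lse in parts]
    total = sum(weights)
    dim = len(parts[0][0])
    out = [sum(w / total * o[d] for w, (o, _) in zip(weights, parts)) for d in range(dim)]
    return out, m + math.log(total)
```

Each chunk's output is reweighted by `exp(lse_chunk - lse_total)`, so merging two half-length results should reproduce the full-length result up to floating-point rounding.
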

27 Jun, 2024 1 commit

- Support unpadded LSE layout (#970) · f816dee6
  Grigory Sizov authored Jun 27, 2024

  * Support unpadded LSE layout.
  Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
  Co-authored-by: Jianyu Huang <hjyahead@gmail.com>

  * Cleanup

  * Fix unpadded LSE on split-kv path

  * Fix formatting and comments

  * Fix inline vs forceinline

  ---------
  Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
  Co-authored-by: Jianyu Huang <hjyahead@gmail.com>

  f816dee6

15 Mar, 2024 1 commit
- Enable paged attention in varlen forward (#831) · 2a15840f
  Grigory Sizov authored Mar 15, 2024
```
* Enable paged attention in varlen forward

* Format + fix padding
```
  2a15840f
23 Jan, 2024 2 commits
- Fixes an error in comment (#785) · 204c3c6d
  Tao He authored Jan 24, 2024
```
Signed-off-by: Tao He <sighingnow@gmail.com>
```
  204c3c6d
- Implement page KV cache · 54e80a38
  Tri Dao authored Jan 22, 2024
```
Co-authored-by: ljss <450993438@qq.com>
```
  54e80a38
13 Jan, 2024 1 commit
- Simplify writing softmax to gmem · a7b66ae2
  Tri Dao authored Jan 13, 2024
  
  a7b66ae2
24 Dec, 2023 1 commit
- Implement deterministic backward (thanks to Meituan) · 73265458
  Tri Dao authored Dec 23, 2023
  
  73265458
22 Dec, 2023 1 commit
- Clean up alibi, implement non-causal alibi · 5ab9b366
  Tri Dao authored Dec 21, 2023
  
  5ab9b366
20 Dec, 2023 2 commits

- Format flash_attn_interface.py · bc28eacc
  Tri Dao authored Dec 19, 2023
  
  bc28eacc
- Support alibi, by Sanghun Cho from Kakao Brain · e4f726fc
  Sanghun Cho authored Dec 20, 2023

  * hard-code alibi in fwd

  * use params.h as hun_heads

  * hard-code alibi in bwd

  * add alibi on/off option

  * compute alibi_start, ratio outside of kernels

  * fix minor merge conflict

  * add test_alibi.py

  * change apply_alibi() location before masking

  * add alibi in splitkv kernel

  * fix backward func # of returns

  * add out-of-bound check in apply_alibi()

  * update test_alibi.py

  * update test_alibi.py for kvcache

  * simplify alibi parameter interface

  * fix performance issue
  by computing alibi outside of branch

  * update test_flash_attn_varlen_func() for left padding

  * implement alibi_slopes (b, nh) loading

  * optimize apply_alibi() a bit

  * update test cases for alibi_slopes loading

  * reflect stylistic comments

  * disable "seqlenq_ngroups_swapped" when using alibi

  ---------
  Co-authored-by: monk.detective <monk.detective@kakaobrain.com>

  e4f726fc

28 Nov, 2023 1 commit
- [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2,add torch-nightly · d4a7c8ff
  Tri Dao authored Nov 27, 2023
  
  d4a7c8ff
27 Nov, 2023 1 commit
- Allow varlen_fwd to take optional seqused_k (#647) · ce3e7280
  Jeremy Reizenstein authored Nov 27, 2023
```
Co-authored-by: bottler <bottler@users.noreply.github.com>
```
  ce3e7280
03 Oct, 2023 1 commit
- [Gen] Accept cache_batch_idx to index into the KV cache · e279bf8e
  Tri Dao authored Oct 03, 2023
  
  e279bf8e
26 Sep, 2023 1 commit
- Implement local attention · 083e8f52
  Tri Dao authored Sep 24, 2023
```
Co-authored-by: Timothee Lacroix <t@mistral.ai>
```
  083e8f52
16 Sep, 2023 1 commit
- Implement rotary embedding in flash_attn_with_kvcache · ccbb14f3
  Tri Dao authored Sep 16, 2023
  
  ccbb14f3
11 Sep, 2023 1 commit
- Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) · ee77b931
  Tri Dao authored Sep 10, 2023
  
  ee77b931
05 Sep, 2023 1 commit
- Support cache_seqlens being integer · fd20f16a
  Tri Dao authored Sep 05, 2023
  
  fd20f16a
04 Sep, 2023 1 commit
- Implement flash_attn_with_kvcache · 37c6e054
  Tri Dao authored Sep 04, 2023
  
  37c6e054
25 Aug, 2023 1 commit
- Change causal mask to be aligned to bottom-right instead of top-left · 9e5e8bc9
  Tri Dao authored Aug 21, 2023
  
  9e5e8bc9
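
To illustrate the alignment change named in the entry above: when `seqlen_q` and `seqlen_k` differ, the two conventions place the causal diagonal in different corners of the mask. The sketch below is plain Python written for this log, not code from the commit. The function names are hypothetical, and `1` marks a key position the query may attend to.

```python
def causal_mask_top_left(seqlen_q, seqlen_k):
    """Older convention: query i sees keys 0..i."""
    return [[1 if j <= i else 0 for j in range(seqlen_k)] for i in range(seqlen_q)]


def causal_mask_bottom_right(seqlen_q, seqlen_k):
    """Newer convention: the last query row sees every key."""
    offset = seqlen_k - seqlen_q  # shifts the diagonal to the bottom-right corner
    return [[1 if j <= i + offset else 0 for j in range(seqlen_k)] for i in range(seqlen_q)]


print(causal_mask_top_left(2, 5))      # [[1, 0, 0, 0, 0], [1, 1, 0, 0, 0]]
print(causal_mask_bottom_right(2, 5))  # [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```

With bottom-right alignment and `seqlen_q > seqlen_k`, the leading query rows see no keys at all. For example, `causal_mask_bottom_right(5, 2)` has three all-zero rows before `[1, 0]` and `[1, 1]`.
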
20 Aug, 2023 1 commit
- Import torch before flash_attn_2_cuda · d431f167
  Tri Dao authored Aug 19, 2023
  
  d431f167
18 Aug, 2023 1 commit
- Run isort and black on python files · f1a73d07
  Tri Dao authored Aug 18, 2023
  
  f1a73d07
01 Aug, 2023 1 commit
- [Docs] Fix docstring about Q nheads being divisible by KV nheads · 8f4cd4c1
  Tri Dao authored Jul 31, 2023
  
  8f4cd4c1
28 Jul, 2023 1 commit
- [Docs] Fix mention of MQA/GQA in qkvpacked functions · 840f7925
  Tri Dao authored Jul 28, 2023
  
  840f7925
27 Jul, 2023 1 commit

- Enable CUDA graphs (#386) · a03f6f8e
  Kirthi Shankar Sivamani authored Jul 27, 2023

  * Add RNG state to kernel launch params
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Save seed and offset for backward
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Single thread write to global mem
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * compute_dq_dk_dv_1colblock get seed and offset from launch params
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * compute_dq_dk_dv_1rowblock get seed and offset from launch params
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Change forward c++ APIs to save RNG state for backward
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Change backward c++ APIs to set RNG state for bprop launcher
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Bug fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Python side API changes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Bug fix; only save seeds instead of full offset
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  * Account for 3D grid size
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  ---------
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

  a03f6f8e

18 Jul, 2023 1 commit
- Make sure dout is contiguous · b4cc152e
  Tri Dao authored Jul 17, 2023
  
  b4cc152e
17 Jul, 2023 1 commit
- FlashAttention-2 release · 4f285b35
  Tri Dao authored Jul 17, 2023
  
  4f285b35
03 Jul, 2023 1 commit
- [Doc] Change total -> total_q · e8a0b4ac
  Tri Dao authored Jul 02, 2023
  
  e8a0b4ac
13 Apr, 2023 1 commit
- Handle FlashAttnQKVPackedSplitFunc by making rng_state optional in backward · 7d25a4ec
  Kirthi Shankar Sivamani authored Apr 13, 2023
```
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
```
  7d25a4ec
12 Apr, 2023 1 commit
- Support CUDA graph capture · 31018c5f
  Kirthi Shankar Sivamani authored Apr 12, 2023
```
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
```
  31018c5f
31 Mar, 2023 1 commit
- Add option for deterministic execution · b6aa059b
  Kirthi Shankar Sivamani authored Mar 30, 2023
  
  b6aa059b
13 Dec, 2022 1 commit
- Fix the case when dout is not contiguous · 88c4e5db
  Tri Dao authored Dec 13, 2022
  
  88c4e5db
05 Nov, 2022 1 commit
- Parallelize CUDA bwd along seqlen_k instead of seqlen_q · 55778193
  Tri Dao authored Nov 05, 2022
```
This is faster since we only need to do atomic adds on dq, instead of atomic
adds on both dk and dv.
```
  55778193
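
The reasoning in the commit message above can be sketched in plain Python. This is a simplified illustration, not the CUDA kernel: the function name is hypothetical, the softmax scale is omitted, and the inputs `P` (attention probabilities) and `dS` (gradient with respect to the scores) are assumed to be precomputed. Each iteration of the outer loop stands in for one parallel worker that owns a single key position.

```python
def backward_parallel_over_keys(P, dS, Q, K, dO):
    """Accumulate dQ, dK, dV with one (simulated) worker per key position."""
    seqlen_q, seqlen_k, d = len(Q), len(K), len(Q[0])
    dQ = [[0.0] * d for _ in range(seqlen_q)]  # written by every worker
    dK = [[0.0] * d for _ in range(seqlen_k)]  # row j written only by worker j
    dV = [[0.0] * d for _ in range(seqlen_k)]  # row j written only by worker j
    for j in range(seqlen_k):  # stands in for parallel workers along seqlen_k
        for i in range(seqlen_q):
            for c in range(d):
                dK[j][c] += dS[i][j] * Q[i][c]  # private accumulation
                dV[j][c] += P[i][j] * dO[i][c]  # private accumulation
                dQ[i][c] += dS[i][j] * K[j][c]  # shared: an atomic add on a GPU
    return dQ, dK, dV
```

Splitting along `seqlen_q` instead would reverse the ownership: each worker would own its rows of `dQ`, and both `dK` and `dV` would become the shared accumulators.
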
24 Oct, 2022 1 commit
- Support all head dims that are multiples of 8, up to 128 · 46fd2a20
  Tri Dao authored Oct 24, 2022
  
  46fd2a20
23 Oct, 2022 1 commit
- Split bwd on the seqlen_q dimension · a5a8806d
  Tri Dao authored Oct 23, 2022
  
  a5a8806d
21 Oct, 2022 2 commits
- Split fwd on the seqlen_q dimension · a44f48df
  Tri Dao authored Oct 21, 2022
  
  a44f48df
- Rework dropout to decouple forward and backward · 1aa6d7d9
  Tri Dao authored Oct 18, 2022
```
They don't have to have the same block size, number of threads, etc.
```
  1aa6d7d9