Commits · e524c2caeb5259d12558c123492b6235efed651f · gaoqiong / flash-attention

28 Mar, 2024 10 commits
- allow small page sizes in flash api · e524c2ca
  skrider authored Feb 11, 2024
- paged copy refactor working for page size 256 · b1c18ca1
  skrider authored Feb 11, 2024
- tests passing for single page k · 446204c7
  skrider authored Feb 11, 2024
- rearrange initial offset computation · a3e06cd5
  skrider authored Feb 11, 2024
- implement kv page iteration functions · f67a6edf
  skrider authored Feb 11, 2024
- reshape gmem copy · a232f754
  skrider authored Feb 11, 2024
- add print statements for debugging · 0094cc95
  skrider authored Feb 09, 2024
- add print statements for debugging · a4049ac8
  skrider authored Feb 08, 2024
- Add the option for the macro and note (#893) · 23e8fa5a
  Driss Guessous authored Mar 27, 2024
- Minor fix in compute_attn_1rowblock_splitkv (#900) · 3e9414f1
  ljss authored Mar 28, 2024
15 Mar, 2024 1 commit
- Add in, macrosf for defining __grid_constant__ (#852) · 4a73e903
  Driss Guessous authored Mar 15, 2024
21 Feb, 2024 1 commit
- Enable headdim 256 backward on consumer GPUs (Ampere, Ada) · 2406f288
  Tri Dao authored Feb 21, 2024
20 Feb, 2024 1 commit
- Don't need to reduce row_sum during online softmax · b32efb1a
  Tri Dao authored Feb 20, 2024
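Our reading of b32efb1a ("Don't need to reduce row_sum during online softmax") can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that one row of scores is processed block by block and split across several hypothetical "lanes" standing in for the threads that share a row; it is not the kernel's code. The running max must be agreed on by all lanes every block because it sets the rescaling factor, but each lane's partial sum can be rescaled by that same factor on its own, so adding the partial sums once at the end yields the same normalizer as reducing them every step.

```python
import math


def online_softmax_lanes(scores, block_size, num_lanes):
    """Softmax of one row, computed block by block (illustrative model only).

    Each hypothetical lane owns every num_lanes-th column of a block and keeps
    its own un-reduced partial row_sum; only the max is shared per block.
    """
    row_max = -math.inf
    partial_sums = [0.0] * num_lanes  # one partial row_sum per lane
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        # The max is reduced across lanes every block: all lanes need the same value.
        new_max = max(row_max, max(block))
        scale = math.exp(row_max - new_max)  # exp(-inf) == 0.0 on the first block
        for lane in range(num_lanes):
            lane_vals = block[lane::num_lanes]
            partial_sums[lane] = partial_sums[lane] * scale + sum(
                math.exp(s - new_max) for s in lane_vals)
        row_max = new_max
    # The partial sums are reduced across lanes only once, at the very end.
    row_sum = sum(partial_sums)
    return [math.exp(s - row_max) / row_sum for s in scores]


def reference_softmax(scores):
    """Plain two-pass softmax used to check the online version."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The design point is that a sum is linear, so a common rescale commutes with the final cross-lane reduction; the max is not, so it cannot be deferred the same way.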
30 Jan, 2024 1 commit
- Preprocessor switches to control functionality (#788) · 0658e320
  Jeremy Reizenstein authored Jan 30, 2024

  For faster and smaller builds in some simple cases, provide switches to allow disabling:
  - backward
  - alibi
  - uneven k
  - dropout
  - local attention

  Co-authored-by: Jeremy Francis Reizenstein <bottler@users.noreply.github.com>

23 Jan, 2024 2 commits
- Implement page KV cache · 54e80a38
  Tri Dao authored Jan 22, 2024
  Co-authored-by: ljss <450993438@qq.com>
- Use int64_t instead of uint32_t in kernel_traits.h · 36bc29ed
  Tri Dao authored Jan 22, 2024
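For 54e80a38 ("Implement page KV cache"): a paged KV cache stores keys and values in fixed-size pages and uses a per-sequence block table to map each logical token position to a physical page plus a slot inside it. The Python sketch below illustrates only that index arithmetic; the function name `gather_paged_kv`, the list-of-pages layout and the toy data are hypothetical and not taken from the kernel.

```python
def gather_paged_kv(cache_pages, block_table, seq_len, page_size):
    """Reassemble one sequence's cached entries in logical order.

    cache_pages: physical pages, each a list of `page_size` entries.
    block_table: logical page index -> physical page index (hypothetical layout).
    """
    tokens = []
    for pos in range(seq_len):
        # Logical position -> (which page of this sequence, slot inside that page).
        logical_page, offset = divmod(pos, page_size)
        physical_page = block_table[logical_page]
        tokens.append(cache_pages[physical_page][offset])
    return tokens


if __name__ == "__main__":
    page_size = 4
    cache_pages = [
        ["b0", "b1", "b2", "b3"],  # physical page 0: logical page 1
        [None] * page_size,        # physical page 1: owned by another sequence
        ["a0", "a1", "a2", "a3"],  # physical page 2: logical page 0
        ["c0", "c1", None, None],  # physical page 3: logical page 2 (partly filled)
    ]
    block_table = [2, 0, 3]
    print(gather_paged_kv(cache_pages, block_table, seq_len=10, page_size=page_size))
    # prints ['a0', 'a1', 'a2', 'a3', 'b0', 'b1', 'b2', 'b3', 'c0', 'c1']
```

Because pages are located only through the table, a sequence's pages need not be contiguous or ordered in memory, which is what lets a cache grow without copying.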
22 Jan, 2024 1 commit
- Use int64_t instead of uint32_t for index_t · 000b67f5
  Tri Dao authored Jan 22, 2024
21 Jan, 2024 6 commits
- Remove configure in bwd kernel launch · ea8a25ca
  Tri Dao authored Jan 21, 2024
- Update cutlass to v3.4.0 · 8f4d82cf
  Tri Dao authored Jan 20, 2024
- Move rotary device functions to a separate file · 395e5a0d
  Tri Dao authored Jan 20, 2024
- Remove unused kernel_traits file · 3e2c827d
  Tri Dao authored Jan 20, 2024
- Refactor masking in fwd pass into 1 object · 66a127ae
  Tri Dao authored Jan 20, 2024
- Change inline to __forceinline__, use __grid_constant__ param · ed4959b2
  Tri Dao authored Jan 20, 2024
20 Jan, 2024 1 commit
- Make Softmax an object · 6f706eff
  Tri Dao authored Jan 15, 2024
15 Jan, 2024 2 commits
- Make Alibi an object · 4ea866ca
  Tri Dao authored Jan 14, 2024
- Move bwd preprocess kernels to a separate file · 5aca153d
  Tri Dao authored Jan 14, 2024
14 Jan, 2024 5 commits
- Move softmax_rescale_o to softmax.h · df1418f9
  Tri Dao authored Jan 14, 2024
- Move masking to a separate file (mask.h) · 6777336a
  Tri Dao authored Jan 14, 2024
- Remove seqq_parallel backward kernel that's not used · 9448264d
  Tri Dao authored Jan 14, 2024
- Move dropout to a separate file (dropout.h) · 1274ec3e
  Tri Dao authored Jan 14, 2024
- apply_dropout now takes tensor of rowcol layout · 10dad612
  Tri Dao authored Jan 14, 2024
13 Jan, 2024 2 commits
- Remove dead code in philox.cuh · d9cbcfb4
  Tri Dao authored Jan 13, 2024
- Simplify writing softmax to gmem · a7b66ae2
  Tri Dao authored Jan 13, 2024
12 Jan, 2024 1 commit
- Simplify SmemLayoutVtransposed in kernel_traits.h · 8d1b169e
  Tri Dao authored Jan 12, 2024
24 Dec, 2023 1 commit
- Implement deterministic backward (thanks to Meituan) · 73265458
  Tri Dao authored Dec 23, 2023
22 Dec, 2023 1 commit
- Clean up alibi, implement non-causal alibi · 5ab9b366
  Tri Dao authored Dec 21, 2023
20 Dec, 2023 1 commit
- Support alibi, by Sanghun Cho from Kakao Brain · e4f726fc
  Sanghun Cho authored Dec 20, 2023

  - hard-code alibi in fwd
  - use params.h as hun_heads
  - hard-code alibi in bwd
  - add alibi on/off option
  - compute alibi_start, ratio outside of kernels
  - fix minor merge conflict
  - add test_alibi.py
  - change apply_alibi() location before masking
  - add alibi in splitkv kernel
  - fix backward func # of returns
  - add out-of-bound check in apply_alibi()
  - update test_alibi.py
  - update test_alibi.py for kvcache
  - simplify alibi parameter interface
  - fix performance issue by computing alibi outside of branch
  - update test_flash_attn_varlen_func() for left padding
  - implement alibi_slopes (b, nh) loading
  - optimize apply_alibi() a bit
  - update test cases for alibi_slopes loading
  - reflect stylistic comments
  - disable "seqlenq_ngroups_swapped" when using alibi

  Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
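ALiBi adds a per-head linear penalty to the attention scores before the softmax (the commit message above moves `apply_alibi()` before masking). The Python sketch below is a minimal illustration, assuming the geometric slopes 2^(-8h/n) from the ALiBi paper (power-of-two head counts only) and a bias of `-slope * |i + seqlen_k - seqlen_q - j|`, which is our understanding of the library's documented convention for aligning queries to the end of the keys. The helper names are hypothetical; in the library the slopes are supplied by the caller as `alibi_slopes` with shape (b, nh) per the commit message, not computed inside the kernel.

```python
def alibi_slopes(num_heads):
    """Geometric ALiBi slopes 2^(-8h/n), h = 1..n (power-of-two head counts only)."""
    assert num_heads > 0 and num_heads & (num_heads - 1) == 0, "sketch handles powers of two"
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(slope, seqlen_q, seqlen_k):
    """Bias -slope * |i + seqlen_k - seqlen_q - j| for one head (non-causal)."""
    offset = seqlen_k - seqlen_q  # aligns the last query with the last key
    return [[-slope * abs(i + offset - j) for j in range(seqlen_k)]
            for i in range(seqlen_q)]


def add_alibi(scores, slope):
    """Add the bias to a seqlen_q x seqlen_k score matrix (before masking/softmax)."""
    bias = alibi_bias(slope, len(scores), len(scores[0]))
    return [[s + b for s, b in zip(srow, brow)] for srow, brow in zip(scores, bias)]
```

Computing the slopes once outside the kernel and only evaluating the bias per tile mirrors the commit's "compute alibi_start, ratio outside of kernels" step, though the exact split in the kernel may differ.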

27 Nov, 2023 2 commits
- Allow varlen_fwd to take optional seqused_k (#647) · ce3e7280
  Jeremy Reizenstein authored Nov 27, 2023
  Co-authored-by: bottler <bottler@users.noreply.github.com>
- Fix performance regression with causal · b4bf9cc1
  Tri Dao authored Nov 26, 2023
20 Nov, 2023 1 commit
- Write zero to out / grad if seqlen_q or seqlen_k is zero · db2f8069
  Tri Dao authored Nov 19, 2023