Commits · db2f80692cbe2b4e8fcf149b5b45e863f6b70c58 · gaoqiong / flash-attention

20 Nov, 2023 3 commits
- Write zero to out / grad if seqlen_q or seqlen_k is zero · db2f8069
  Tri Dao authored Nov 19, 2023
  
  db2f8069
- Update cutlass to 3.2.2 · 43bb6d8a
  Tri Dao authored Nov 19, 2023
  
  43bb6d8a
- add checks (#640) · dc4b9ad6
  Driss Guessous authored Nov 19, 2023
  
  dc4b9ad6
08 Oct, 2023 1 commit
- Change constexpr int to constexpr static int · 5a834254
  Tri Dao authored Oct 08, 2023
  
  5a834254
03 Oct, 2023 1 commit
- [Gen] Accept cache_batch_idx to index into the KV cache · e279bf8e
  Tri Dao authored Oct 03, 2023
  
  e279bf8e
26 Sep, 2023 1 commit
- Implement local attention · 083e8f52
  Tri Dao authored Sep 24, 2023
```
Co-authored-by: Timothee Lacroix <t@mistral.ai>
```
  083e8f52
24 Sep, 2023 3 commits
- Switch cutlass to newer commit to avoid compilation warning · 812cb1c9
  Tri Dao authored Sep 24, 2023
  
  812cb1c9
- Don't over-allocate dq_accum in case of varlen · 65c234ed
  Tri Dao authored Sep 24, 2023
  
  65c234ed
- Reduce number of templates for headdim > 128 · 1879e089
  Tri Dao authored Sep 23, 2023
  
  1879e089
21 Sep, 2023 1 commit
- Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) · 2d8ea9a5
  Tri Dao authored Sep 20, 2023
  
  2d8ea9a5
18 Sep, 2023 3 commits
- [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead · dfe29f5e
  Tri Dao authored Sep 18, 2023
  
  dfe29f5e
- Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H) · 3250ff3d
  Tri Dao authored Sep 18, 2023
  
  3250ff3d
- Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation · 43617dea
  Tri Dao authored Sep 18, 2023
  
  43617dea
17 Sep, 2023 1 commit
- Set block size to 64 x 64 for kvcache to avoid nvcc segfaults · c984208d
  Tri Dao authored Sep 17, 2023
  
  c984208d
16 Sep, 2023 2 commits
- Implement rotary embedding in flash_attn_with_kvcache · ccbb14f3
  Tri Dao authored Sep 16, 2023
  
  ccbb14f3
- [CE] Implement CrossEntropyLoss in Triton · 5400fdc4
  Tri Dao authored Sep 15, 2023
  
  5400fdc4
13 Sep, 2023 1 commit
- Simplify the implementation of KVcache attn by appending KV first · 56b7fc6e
  Tri Dao authored Sep 13, 2023
  
  56b7fc6e
12 Sep, 2023 1 commit
- Remove some unused headers · bb9beb36
  Tri Dao authored Sep 12, 2023
  
  bb9beb36
11 Sep, 2023 1 commit
- Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) · ee77b931
  Tri Dao authored Sep 10, 2023
  
  ee77b931
04 Sep, 2023 5 commits
- Implement flash_attn_with_kvcache · 37c6e054
  Tri Dao authored Sep 04, 2023
  
  37c6e054
- Remove constexpr in launch template to fix CI compilation · 6a89b2f1
  Tri Dao authored Sep 03, 2023
  
  6a89b2f1
- Try switching back to Cutlass 3.2.0 · 97ba7a62
  Tri Dao authored Sep 03, 2023
  
  97ba7a62
- Bump to v2.1.2 · 1dc1b6c8
  Tri Dao authored Sep 03, 2023
  
  1dc1b6c8
- Remove unused sdPsum in dot_do_o function · 5953c4f5
  Tri Dao authored Sep 03, 2023
  
  5953c4f5
03 Sep, 2023 1 commit
- Fix splitKV combine function when local LSEs are all -inf · 26d7d92f
  Tri Dao authored Sep 03, 2023
  
  26d7d92f
01 Sep, 2023 2 commits
- Remove commented out code in bwd (#512) · 37e32feb
  Sophia Wisdom authored Sep 01, 2023
```
* Remove lots of comments

* Remove unused traits
```
  37e32feb
- Remove old code in utils.h (#511) · dd8a7549
  Sophia Wisdom authored Sep 01, 2023
  
  dd8a7549
30 Aug, 2023 2 commits
- bump cutlass submodule (#504) · 866a9d33
  Aman Gupta Karmani authored Aug 30, 2023
  
  866a9d33
- Fix typo with lse_max == -INFINITY · 31920dda
  Tri Dao authored Aug 29, 2023
  
  31920dda
29 Aug, 2023 1 commit
- Implement splitKV attention · b1fbbd83
  Tri Dao authored Aug 29, 2023
  
  b1fbbd83
28 Aug, 2023 3 commits
- Use generate_kernels.py script from Driss Guessous · 7a983df7
  Tri Dao authored Aug 28, 2023
  
  7a983df7
- [ft_attention] Fix for seqlen=8136 (#488) · c3f2a632
  dan_the_3rd authored Aug 28, 2023
```
When seqlen=8136, `smem_sz = 48840`, and apparently starting the kernel returns an `invalid argument` CUDA error.

`48840 < 48 * 1024` but apparently it's still above the limit somehow..?
Tested on A100
```
  c3f2a632
- Update Cutlass to v3.2.0 · 757058d4
  Tri Dao authored Aug 27, 2023
  
  757058d4
25 Aug, 2023 1 commit
- Change causal mask to be aligned to bottom-right instead of top-left · 9e5e8bc9
  Tri Dao authored Aug 21, 2023
  
  9e5e8bc9
24 Aug, 2023 1 commit
- Support flash attention 2 with causal masking when KV's seq length is longer... · e07aa036
  BoxiangW authored Aug 24, 2023
```
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
```
  e07aa036
17 Aug, 2023 1 commit
- [FusedDense] Run black on fused_dense.py · bcfa7c97
  Tri Dao authored Aug 16, 2023
  
  bcfa7c97
16 Aug, 2023 1 commit
- Fix Bwd NaN for varlen when seqlen_q >> seqlen_k and causal · c65b5106
  Tri Dao authored Aug 16, 2023
  
  c65b5106
13 Aug, 2023 2 commits
- Prepare for Cutlass 3.2 · dbd79237
  Tri Dao authored Aug 13, 2023
  
  dbd79237
- Update to Cutlass 3.1 · 3524e13c
  Tri Dao authored Aug 13, 2023
  
  3524e13c
01 Aug, 2023 1 commit
- Fix race condition in bwd (overwriting sK) · 1c41d2b0
  Tri Dao authored Aug 01, 2023
  
  1c41d2b0