Commits · ed4959b2ebb92c0fb11aaef04f6e16d231ccea24 · gaoqiong / flash-attention

21 Jan, 2024 1 commit
- Change inline to __forceinline__, use __grid_constant__ param · ed4959b2
  Tri Dao authored Jan 20, 2024
  
  ed4959b2
20 Jan, 2024 1 commit
- Make Softmax an object · 6f706eff
  Tri Dao authored Jan 15, 2024
  
  6f706eff
15 Jan, 2024 2 commits
- Make Alibi an object · 4ea866ca
  Tri Dao authored Jan 14, 2024
  
  4ea866ca
- Move bwd preprocess kernels to a separate file · 5aca153d
  Tri Dao authored Jan 14, 2024
  
  5aca153d
14 Jan, 2024 5 commits
- Move softmax_rescale_o to softmax.h · df1418f9
  Tri Dao authored Jan 14, 2024
  
  df1418f9
- Move masking to a separate file (mask.h) · 6777336a
  Tri Dao authored Jan 14, 2024
  
  6777336a
- Remove seqq_parallel backward kernel that's not used · 9448264d
  Tri Dao authored Jan 14, 2024
  
  9448264d
- Move dropout to a separate file (dropout.h) · 1274ec3e
  Tri Dao authored Jan 14, 2024
  
  1274ec3e
- apply_dropout now takes tensor of rowcol layout · 10dad612
  Tri Dao authored Jan 14, 2024
  
  10dad612
13 Jan, 2024 2 commits
- Remove dead code in philox.cuh · d9cbcfb4
  Tri Dao authored Jan 13, 2024
  
  d9cbcfb4
- Simplify writing softmax to gmem · a7b66ae2
  Tri Dao authored Jan 13, 2024
  
  a7b66ae2
12 Jan, 2024 1 commit
- Simplify SmemLayoutVtransposed in kernel_traits.h · 8d1b169e
  Tri Dao authored Jan 12, 2024
  
  8d1b169e
10 Jan, 2024 1 commit
- [LayerNorm] Initialize mean and rstd tensor using x.device · c9861a03
  Tri Dao authored Jan 09, 2024
  
  c9861a03
08 Jan, 2024 1 commit
- Typo in README (#760) · 99ea4baa
  Erich Schubert authored Jan 08, 2024
  
  99ea4baa
05 Jan, 2024 3 commits
- [LayerNorm] Switch from CUDA to Triton implementation · abbc1311
  Tri Dao authored Jan 05, 2024
  
  abbc1311
- [LayerNorm] Rename layernorm.py -> layer_norm.py · f5b308e2
  Tri Dao authored Jan 05, 2024
  
  f5b308e2
- [LayerNorm] Implement parallel layer norm in Triton · 665b55e2
  Tri Dao authored Jan 04, 2024
  
  665b55e2
04 Jan, 2024 1 commit
- [LayerNorm] Implement rowscale in Triton layernorm · aa5c6438
  Tri Dao authored Jan 04, 2024
  
  aa5c6438
03 Jan, 2024 1 commit
- Fix: implement deterministic backward in mha (#748) · 386e3911
  jiaxingli authored Jan 03, 2024
```
* fix deterministic

* fix deterministic
```
  386e3911
26 Dec, 2023 1 commit
- Bump to v2.4.2 · 1a2c3e8c
  Tri Dao authored Dec 25, 2023
  
  1a2c3e8c
25 Dec, 2023 4 commits
- Add test for BTLM init · 73df3be7
  Tri Dao authored Dec 25, 2023
  
  73df3be7
- Implement BTLM model · 7ffba9a5
  Tri Dao authored Dec 24, 2023
  
  7ffba9a5
- Implement muParam · 2e29dacf
  Tri Dao authored Dec 24, 2023
  
  2e29dacf
- Pass alibi slopes to flash_attn_with_kvcache during generation · 3f7d5786
  Tri Dao authored Dec 24, 2023
  
  3f7d5786
24 Dec, 2023 3 commits
- Bump to v2.4.1 · f8448524
  Tri Dao authored Dec 23, 2023
  
  f8448524
- Don't dispatch to local if window size >= seqlen_k · 0842ec0d
  Tri Dao authored Dec 23, 2023
  
  0842ec0d
- Implement deterministic backward (thanks to Meituan) · 73265458
  Tri Dao authored Dec 23, 2023
  
  73265458
23 Dec, 2023 1 commit
- Implement norm head for Baichuan2 · 2c7d7b73
  Tri Dao authored Dec 22, 2023
  
  2c7d7b73
22 Dec, 2023 7 commits
- [CI] Don't compile for python 3.7 pytorch 2.2 · 68f178aa
  Tri Dao authored Dec 22, 2023
  
  68f178aa
- Bump to v2.4.0 · 73162773
  Tri Dao authored Dec 22, 2023
  
  73162773
- Mention Alibi in README · 50d144c9
  Tri Dao authored Dec 21, 2023
  
  50d144c9
- Update cutlass to v3.3.0 · 8448c028
  Tri Dao authored Dec 21, 2023
  
  8448c028
- Add Alibi to MHA, test with Baichuan-13B · c3b21966
  Tri Dao authored Dec 21, 2023
  
  c3b21966
- [CI] Use torch-nightly 20231106 instead of 20231127 · 701b51bf
  Tri Dao authored Dec 21, 2023
  
  701b51bf
- Clean up alibi, implement non-causal alibi · 5ab9b366
  Tri Dao authored Dec 21, 2023
  
  5ab9b366
20 Dec, 2023 4 commits
- Format flash_attn_interface.py · bc28eacc
  Tri Dao authored Dec 19, 2023
  
  bc28eacc
- [Gen] Remove minor dead code · 0a146185
  Tri Dao authored Dec 19, 2023
  
  0a146185
- Support alibi, by Sanghun Cho from Kakao Brain · e4f726fc
  Sanghun Cho authored Dec 20, 2023
  
  * hard-code alibi in fwd
  * use params.h as hun_heads
  * hard-code alibi in bwd
  * add alibi on/off option
  * compute alibi_start, ratio outside of kernels
  * fix minor merge conflict
  * add test_alibi.py
  * change apply_alibi() location before masking
  * add alibi in splitkv kernel
  * fix backward func # of returns
  * add out-of-bound check in apply_alibi()
  * update test_alibi.py
  * update test_alibi.py for kvcache
  * simplify alibi parameter interface
  * fix performance issue by computing alibi outside of branch
  * update test_flash_attn_varlen_func() for left padding
  * implement alibi_slopes (b, nh) loading
  * optimize apply_alibi() a bit
  * update test cases for alibi_slopes loading
  * reflect stylistic comments
  * disable "seqlenq_ngroups_swapped" when using alibi
  
  ---------
  Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
  
  e4f726fc
- [LayerNorm] Implement dropout in fused residual + LN/RMSNorm · cd089597
  Tri Dao authored Dec 19, 2023
  
  cd089597
17 Dec, 2023 1 commit
- [CrossEntropy] Test longer sequences · 713bd3aa
  Tri Dao authored Dec 16, 2023
  
  713bd3aa