- 21 Oct, 2022 1 commit
  Tri Dao authored
  They don't have to have the same block size, number of threads, etc.
- 14 Oct, 2022 2 commits
- 06 Oct, 2022 1 commit
  Antoine Adam authored
  According to the `setup.py` file, the only dependencies are torch and einops. But the `bert_padding.py` file requires `numpy` only to multiply the elements of a `torch.Size` object. This change allows FlashAttention to be used without numpy.
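The change described above can be sketched as follows. This is a hypothetical illustration, not the actual diff: the point is that a product over the elements of a `torch.Size` (a tuple subclass) does not need `numpy`, since the standard library's `math.prod` (Python >= 3.8) suffices.

```python
import math
import torch

shape = torch.zeros(2, 3, 4).shape  # torch.Size([2, 3, 4])

# Before (pulls in numpy just for this):
#   import numpy as np
#   total = np.prod(shape)

# After (standard library only):
total = math.prod(shape)
print(total)  # 24
```

Since `torch.Size` iterates over plain Python ints, `math.prod` returns an exact int, matching what `np.prod` produced here.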
- 11 Sep, 2022 1 commit
  Tri Dao authored
- 06 Sep, 2022 1 commit
  eric-tc-wong authored
  Recasting query and key after rotary_emb()
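The commit message above suggests a dtype fix: rotary-embedding implementations often compute in float32 for accuracy, so query and key may come back upcast and need recasting to their original dtype. A minimal sketch of that pattern, assuming a hypothetical `rotary_emb` callable (`apply_rotary_then_recast` and `toy_rotary` are illustrative names, not FlashAttention's API):

```python
import torch

def apply_rotary_then_recast(q, k, rotary_emb):
    # Remember the original dtype, apply the (possibly upcasting) rotary
    # embedding, then recast query and key back to that dtype.
    orig_dtype = q.dtype
    q, k = rotary_emb(q, k)
    return q.to(orig_dtype), k.to(orig_dtype)

# Toy stand-in that upcasts to float32, as a real implementation might internally.
def toy_rotary(q, k):
    return q.float(), k.float()

q = torch.randn(1, 4, 8, dtype=torch.float16)
k = torch.randn(1, 4, 8, dtype=torch.float16)
q_out, k_out = apply_rotary_then_recast(q, k, toy_rotary)
print(q_out.dtype)  # torch.float16
```

Without the recast, downstream half-precision kernels would receive float32 tensors and either error out or silently run in the wrong precision.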
- 09 Aug, 2022 1 commit
  Tri Dao authored
- 05 Aug, 2022 2 commits
- 04 Jul, 2022 2 commits
- 03 Jul, 2022 1 commit
  Gustaf authored
- 02 Jun, 2022 1 commit
  Tri Dao authored