Commits · eeca63a72a4b0f80e158488cd0ebb040f98ec4bf · gaoqiong / flash-attention

25 Jun, 2022 1 commit
- Bug fix: wrong smem_o write pointer for d=16 · eeca63a7
  Tri Dao authored Jun 25, 2022
  
12 Jun, 2022 3 commits
- Refactor Gmem code to store q, k, v pointers separately · 5d07483b
  Tri Dao authored Jun 12, 2022
  
- Implement bwd for head dim 128 · d3e64409
  Tri Dao authored Jun 11, 2022
  
- Implement fwd for head dim 128 · 0d854692
  Tri Dao authored Jun 05, 2022
  
04 Jun, 2022 1 commit
- Set block size of SM75 fwd to 256 if there's no dropout · 321c57d0
  Tri Dao authored Jun 04, 2022
  This speeds up the fwd by 1.5x.
03 Jun, 2022 1 commit
- Support Turing mma instructions · 2712aa4c
  Tri Dao authored Jun 02, 2022
  
26 May, 2022 1 commit
- Rename, add benchmarking script · 9dbc491a
  Tri Dao authored May 26, 2022
  
20 May, 2022 1 commit
- First release · 1fcbe6f0
  Tri Dao authored May 20, 2022
  