Commits · 46fd2a20b20849598b1abb438ea09520449d7eb3 · gaoqiong / flash-attention

24 Oct, 2022 1 commit
- Support all head dims that are multiples of 8, up to 128 · 46fd2a20
  Tri Dao authored Oct 24, 2022
23 Oct, 2022 2 commits
- Attempt to use atomicCAS to replace atomicAdd(bfloat16) · 9e92a1f2
  Tri Dao authored Oct 23, 2022
- Split bwd on the seqlen_q dimension · a5a8806d
  Tri Dao authored Oct 23, 2022
10 Jul, 2022 1 commit
- Refactor to template on __half, implement bf16 util functions · e518a4b3
  Tri Dao authored Jul 08, 2022
04 Jul, 2022 1 commit
- Implement cross attention · 6c3a8c65
  Tri Dao authored Jun 30, 2022
30 Jun, 2022 1 commit
- Support batch size > 64K by swapping grid.x and grid.y · f66603cb
  Tri Dao authored Jun 29, 2022
12 Jun, 2022 1 commit
- Refactor Gmem code to store q, k, v pointers separately · 5d07483b
  Tri Dao authored Jun 12, 2022
26 May, 2022 1 commit
- Rename, add benchmarking script · 9dbc491a
  Tri Dao authored May 26, 2022
20 May, 2022 1 commit
- First release · 1fcbe6f0
  Tri Dao authored May 20, 2022