gaoqiong / flash-attention · Commits · 35d589fa81a68b7cb806982af4fafac0f19d644d
File history: csrc/flash_attn/src/fmha_fprop_kernel_1xN.h
14 Oct, 2022 (1 commit)
Implement attention kernel that splits the batch into two · 5badfb78 · Tri Dao, authored Oct 13, 2022

10 Jul, 2022 (4 commits)
Implement for bf16 · de19de7a · Tri Dao, authored Jul 09, 2022
Refactor gemm_cl to template on either __half or __nv_bfloat16 · 6a77a6da · Tri Dao, authored Jul 08, 2022
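
To illustrate the technique this commit message names, here is a minimal, hypothetical sketch (not the repository's gemm_cl code) of templating a device helper on the element type so one code path compiles for both __half and __nv_bfloat16, with per-type conversion traits; the names elem_traits and axpy_kernel are illustrative only.

    // sketch.cu -- hypothetical illustration, not flash-attention source
    #include <cuda_runtime.h>
    #include <cuda_fp16.h>
    #include <cuda_bf16.h>

    template <typename T> struct elem_traits;

    template <> struct elem_traits<__half> {
        static __device__ float to_float(__half x)   { return __half2float(x); }
        static __device__ __half from_float(float x) { return __float2half(x); }
    };

    template <> struct elem_traits<__nv_bfloat16> {
        static __device__ float to_float(__nv_bfloat16 x)   { return __bfloat162float(x); }
        static __device__ __nv_bfloat16 from_float(float x) { return __float2bfloat16(x); }
    };

    // y[i] += a * x[i], accumulating in fp32 regardless of the storage type T.
    template <typename T>
    __global__ void axpy_kernel(int n, float a, const T* x, T* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float acc = elem_traits<T>::to_float(y[i]) + a * elem_traits<T>::to_float(x[i]);
            y[i] = elem_traits<T>::from_float(acc);
        }
    }

    int main() {
        int n = 1024;
        __half *xh, *yh;
        __nv_bfloat16 *xb, *yb;
        cudaMalloc(&xh, n * sizeof(__half));        cudaMalloc(&yh, n * sizeof(__half));
        cudaMalloc(&xb, n * sizeof(__nv_bfloat16)); cudaMalloc(&yb, n * sizeof(__nv_bfloat16));
        // Same kernel instantiated for both precisions (buffers left
        // uninitialized; this only demonstrates compilation and launch).
        axpy_kernel<__half><<<(n + 127) / 128, 128>>>(n, 2.0f, xh, yh);
        axpy_kernel<__nv_bfloat16><<<(n + 127) / 128, 128>>>(n, 2.0f, xb, yb);
        cudaDeviceSynchronize();
        return 0;
    }
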
Refactor to template on __half, implement bf16 util functions · e518a4b3 · Tri Dao, authored Jul 08, 2022
Fix Illegal Memory Access bug in fwd when d=16 · 2dc1b205 · Tri Dao, authored Jul 09, 2022

04 Jul, 2022 (1 commit)
Implement cross attention · 6c3a8c65 · Tri Dao, authored Jun 30, 2022

30 Jun, 2022 (1 commit)
Support batch size > 64K by swapping grid.x and grid.y · f66603cb · Tri Dao, authored Jun 29, 2022
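
The reason the swap lifts the 64K limit: CUDA caps gridDim.y and gridDim.z at 65535 blocks, while gridDim.x allows up to 2^31 - 1, so indexing the large (batch) dimension with blockIdx.x instead of blockIdx.y removes the restriction. Below is a minimal launch sketch assuming a hypothetical fwd_kernel_sketch; it is not the repository's launcher, only an illustration of the technique.

    // grid_swap_sketch.cu -- hypothetical illustration, not flash-attention source
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void fwd_kernel_sketch(int batch_x_heads, int seq_blocks) {
        int bidb = blockIdx.x;   // batch*heads index: can exceed 65535 on grid.x
        int bids = blockIdx.y;   // sequence-block index: small, fits in grid.y
        if (bidb < batch_x_heads && bids < seq_blocks) {
            // ... per-(batch, head, sequence-block) work would go here ...
        }
    }

    int main() {
        int batch_x_heads = 100000;            // > 65535: only legal on grid.x
        int seq_blocks = 16;
        dim3 grid(batch_x_heads, seq_blocks);  // swapped relative to grid(seq_blocks, batch_x_heads)
        fwd_kernel_sketch<<<grid, 128>>>(batch_x_heads, seq_blocks);
        cudaDeviceSynchronize();
        printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 0;
    }
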
12 Jun, 2022 (3 commits)
Refactor Gmem code to store q, k, v pointers separately · 5d07483b · Tri Dao, authored Jun 12, 2022
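
A rough sketch of the refactor direction this message describes, with illustrative names only (not the repository's Gmem structs): instead of one packed qkv base pointer plus per-tensor offsets, the parameter struct carries an independent pointer and stride for each of q, k and v, so the three tensors no longer have to share a single packed buffer, which is what, for example, cross attention (commit 6c3a8c65 above) needs.

    // gmem_params_sketch.h -- hypothetical illustration, not flash-attention source
    #include <cstddef>

    struct QkvParamsPacked {         // before: q, k, v interleaved in one buffer
        void*  qkv_ptr;
        size_t qkv_stride_in_bytes;
    };

    struct QkvParamsSplit {          // after: each tensor addressed on its own
        void*  q_ptr;  size_t q_row_stride_in_bytes;
        void*  k_ptr;  size_t k_row_stride_in_bytes;
        void*  v_ptr;  size_t v_row_stride_in_bytes;
    };
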
Implement bwd for head dim 128 · d3e64409 · Tri Dao, authored Jun 11, 2022
Implement fwd for head dim 128 · 0d854692 · Tri Dao, authored Jun 05, 2022

02 Jun, 2022 (1 commit)
Use Cutlass gemm as WarpMma · 14dc326e · Tri Dao, authored Jun 02, 2022

26 May, 2022 (1 commit)
Rename, add benchmarking script · 9dbc491a · Tri Dao, authored May 26, 2022

20 May, 2022 (1 commit)
First release · 1fcbe6f0 · Tri Dao, authored May 20, 2022