gaoqiong / flash-attention
Commits · csrc/flash_attn/src/fmha @ 5d07483bbcce7ee727952a8ea8425aaaecd5a451
12 Jun, 2022 · 3 commits

Refactor Gmem code to store q, k, v pointers separately · 5d07483b
Tri Dao authored Jun 12, 2022
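The commit's diff isn't shown on this page. Purely as an illustration of what "storing q, k, v pointers separately" can look like in a gmem tile loader, here is a hypothetical CUDA C++ sketch; the names (QkvParams, GmemTile, q_ptr, ...) are assumptions for illustration, not the repository's actual Gmem API.

```cpp
// Hypothetical sketch only -- not the repository's actual Gmem_tile code.
// It illustrates the idea named in the commit: carrying separate q, k, v
// base pointers (each with its own stride) instead of deriving all three
// tiles from a single packed qkv pointer.
#include <cstdint>

struct QkvParams {                  // illustrative name
    char *q_ptr;                    // separate base pointer for Q
    char *k_ptr;                    // separate base pointer for K
    char *v_ptr;                    // separate base pointer for V
    int64_t q_row_stride_bytes;
    int64_t k_row_stride_bytes;
    int64_t v_row_stride_bytes;
};

// One loader per tensor: each instance owns its own pointer and stride,
// so Q, K and V no longer have to share one interleaved layout.
struct GmemTile {                   // illustrative name
    __device__ GmemTile(char *base, int64_t row_stride_bytes, int start_row)
        : ptr(base + int64_t(start_row) * row_stride_bytes),
          row_stride(row_stride_bytes) {}

    __device__ void move(int rows) { ptr += int64_t(rows) * row_stride; }

    char *ptr;
    int64_t row_stride;
};
```

In this sketch a kernel would construct three independent GmemTile instances from params.q_ptr, params.k_ptr, and params.v_ptr, rather than offsetting into one shared buffer.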
Implement bwd for head dim 128 · d3e64409
Tri Dao authored Jun 11, 2022

Implement fwd for head dim 128 · 0d854692
Tri Dao authored Jun 05, 2022
03 Jun, 2022 · 2 commits

Reduce smem usage for Q and dO in the backward pass · b17c6fe2
Tri Dao authored Jun 03, 2022
From 4KB per buffer to 2KB per buffer. This saves us 8KB of smem (each of Q and dO has 2 buffers).
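The 8KB figure follows directly from the numbers in the commit message; a quick sanity check, using only the buffer sizes and counts quoted above:

```cpp
// Sanity check of the smem saving stated in the commit message:
// each buffer shrinks from 4KB to 2KB, and Q and dO each keep 2 buffers.
#include <cstdio>

int main() {
    constexpr int kOldBufferBytes   = 4 * 1024;
    constexpr int kNewBufferBytes   = 2 * 1024;
    constexpr int kBuffersPerTensor = 2;   // Q and dO are double-buffered
    constexpr int kTensors          = 2;   // Q and dO

    constexpr int saved_bytes =
        kTensors * kBuffersPerTensor * (kOldBufferBytes - kNewBufferBytes);
    printf("smem saved: %d KB\n", saved_bytes / 1024);   // prints 8 KB
    return 0;
}
```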
Support Turing mma instructions · 2712aa4c
Tri Dao authored Jun 02, 2022
02 Jun, 2022 · 2 commits

Remove softmax fp16 max · 05087332
Tri Dao authored Jun 02, 2022

Use Cutlass gemm as WarpMma · 14dc326e
Tri Dao authored Jun 02, 2022
26 May, 2022 · 1 commit

Rename, add benchmarking script · 9dbc491a
Tri Dao authored May 26, 2022