gaoqiong / flash-attention
Commit history for csrc/fused_dense_lib at 27f8f890dff58986391b606bc7c181c3b9f5148a
30 May, 2023 (1 commit)
27f8f890 · [FusedDense] Allocate lt_workspace on input device
Tri Dao authored May 30, 2023
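As a rough illustration of what this commit title describes, the sketch below allocates a cuBLASLt workspace buffer on the same device as the input tensor rather than on the current default device. It assumes a PyTorch C++ extension context; the function name and the size constant are placeholders, not the extension's actual code.

```cpp
// Illustrative sketch, not the repository's code: allocate the cuBLASLt
// workspace on the same device as the input tensor, so multi-GPU callers
// do not end up with a buffer on the wrong device.
#include <torch/extension.h>

// Hypothetical size constant for illustration (4 MB).
constexpr int64_t kLtWorkspaceBytes = 1 << 22;

at::Tensor allocate_lt_workspace(const at::Tensor& input) {
  // Using the input's device option (instead of the current default device)
  // is the point of the change named in the commit title.
  return at::empty({kLtWorkspaceBytes},
                   at::TensorOptions().dtype(at::kByte).device(input.device()));
}
```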
07 Apr, 2023 (1 commit)
dec4f2e9 · [FusedDense] Set workspace size to 32M for Hopper and 4M for others
Tri Dao authored Apr 06, 2023
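The sketch below illustrates the sizing rule named in the commit title: a 32 MB cuBLASLt workspace on Hopper (compute capability 9.x) and 4 MB on other GPUs. The helper name is hypothetical and the snippet uses only the CUDA runtime API; it is not the extension's actual implementation.

```cpp
// Illustrative sketch, not the repository's code: pick the cuBLASLt workspace
// size by GPU architecture -- 32 MB on Hopper (compute capability 9.x),
// 4 MB on other GPUs.
#include <cuda_runtime.h>
#include <cstddef>

size_t lt_workspace_size_for_device(int device) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device);
  return (prop.major >= 9) ? (size_t{32} << 20) : (size_t{4} << 20);
}
```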
15 Mar, 2023 (1 commit)
dc08ea1c · Support H100 for other CUDA extensions
Tri Dao authored Mar 15, 2023
18 Jan, 2023 (1 commit)
88173a1a · [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP
Tri Dao authored Jan 17, 2023
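As a hedged sketch of the behavior the commit title names, the snippet below shows an MLP forward pass in which the hidden activation can be switched between GELU and ReLU. The function name and signature are assumptions for illustration, not FusedMLP's real interface.

```cpp
// Illustrative sketch only: an MLP forward where the hidden activation is
// selectable between GELU and ReLU, mirroring the "support relu" part of the
// commit title. Name and signature are assumptions, not FusedMLP's API.
#include <torch/extension.h>

at::Tensor mlp_forward(const at::Tensor& x,
                       const at::Tensor& w1, const at::Tensor& b1,
                       const at::Tensor& w2, const at::Tensor& b2,
                       bool use_relu) {
  auto hidden = at::linear(x, w1, b1);
  hidden = use_relu ? at::relu(hidden) : at::gelu(hidden);
  return at::linear(hidden, w2, b2);
}
```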
24 Dec, 2022 (1 commit)
226a1b72 · Implement TensorParallel for FusedDense and FusedDenseGeluDense
Tri Dao authored Dec 23, 2022
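The sketch below illustrates the local step of a column-parallel dense layer, the basic building block of tensor parallelism: the weight is split along the output dimension across ranks, each rank computes only its slice of the output, and the slices are combined by collective communication elsewhere. Names and the splitting convention are assumptions, not the repository's implementation.

```cpp
// Illustrative sketch of tensor parallelism's local step, not the repository's
// implementation: the dense layer's weight is split along the output dimension
// across ranks; each rank computes its slice of the output, and the slices are
// combined by collective communication outside this function.
#include <torch/extension.h>

at::Tensor column_parallel_dense_local(const at::Tensor& x,
                                       const at::Tensor& full_weight,
                                       int64_t rank, int64_t world_size) {
  // Take this rank's contiguous block of output features.
  auto weight_shard = full_weight.chunk(world_size, /*dim=*/0)[rank];
  return at::linear(x, weight_shard);
}
```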
23 Dec, 2022 (1 commit)
e68ebbe8 · Simplify FusedDense
Tri Dao authored Dec 22, 2022
15 Nov, 2022 (1 commit)
43ab0b52 · Mention that some CUDA extensions have only been tested on A100s
Tri Dao authored Nov 15, 2022
14 Nov, 2022 (2 commits)
2e33fc8e · Add GPT and ViT models
Tri Dao authored Nov 13, 2022
fa6d1ce4 · Add fused_dense and dropout_add_layernorm CUDA extensions
Tri Dao authored Nov 13, 2022
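For reference, the sketch below spells out the unfused computation that a dropout_add_layernorm kernel would perform in a single fused pass: dropout on the input, addition of a residual, then LayerNorm. It is illustrative only and does not show the extension's actual API.

```cpp
// Illustrative reference only: the unfused computation that a fused
// dropout_add_layernorm kernel performs in one pass -- dropout on the input,
// residual add, then LayerNorm. Not the extension's actual API.
#include <torch/extension.h>

at::Tensor dropout_add_layernorm_reference(const at::Tensor& x,
                                           const at::Tensor& residual,
                                           const at::Tensor& gamma,
                                           const at::Tensor& beta,
                                           double dropout_p) {
  auto dropped = at::dropout(x, dropout_p, /*train=*/true);
  auto summed = dropped + residual;
  return at::layer_norm(summed, {summed.size(-1)}, gamma, beta);
}
```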