gaoqiong / flash-attention · Commits at ce26d3d73d07e9779c5ba6fb2ca3bc187a34c6cc
History of flash_attn/ops/fused_dense.py
02 Jan, 2023 (1 commit)

[FusedDense] Limit matrix dims to 2M (instead of 64k) · 1ec09ebd
Tri Dao authored Jan 01, 2023
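The bound here refers to how many rows the fused GEMM can handle once the input's batch dimensions are flattened into a single leading dimension. Below is a hypothetical guard in that spirit; the constant, the function name, and the failure mode are all assumptions, not the repository's code:

```python
import torch

# Hypothetical guard illustrating the commit message: fused dense paths
# flatten (batch, seqlen, hidden) inputs to a 2D (rows, hidden) matrix,
# and the row count must stay under a backend limit. The exact bound and
# the function name here are assumptions, not the repository's code.
MAX_ROWS = 2 * 1024 * 1024  # "2M", raised from an earlier 64k limit

def flatten_for_fused_gemm(x: torch.Tensor) -> torch.Tensor:
    x2d = x.reshape(-1, x.shape[-1])
    if x2d.shape[0] > MAX_ROWS:
        raise ValueError(f"fused dense supports at most {MAX_ROWS} rows, "
                         f"got {x2d.shape[0]}")
    return x2d
```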
01 Jan, 2023 (1 commit)

[FusedDense] Kick off input all_gather before weight dtype conversion · 65b4064b
Tri Dao authored Dec 31, 2022
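The message describes overlapping communication with computation: launch the asynchronous input all_gather first, so it proceeds on the network while the weight is cast to the compute dtype on the GPU. A minimal sketch of that ordering using plain torch.distributed (the function name and shapes are illustrative, not the repository's actual code):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gathered_linear(x, weight, process_group, compute_dtype=torch.float16):
    # Sketch of the overlap in the commit message: launch the input
    # all_gather asynchronously FIRST, so the collective runs while the
    # weight dtype conversion executes.
    world_size = dist.get_world_size(process_group)
    gathered = torch.empty(world_size * x.shape[0], *x.shape[1:],
                           dtype=x.dtype, device=x.device)
    handle = dist.all_gather_into_tensor(gathered, x.contiguous(),
                                         group=process_group, async_op=True)
    weight = weight.to(compute_dtype)  # runs concurrently with the gather
    handle.wait()                      # communication finishes here
    return F.linear(gathered.to(compute_dtype), weight)
```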
25 Dec, 2022 (1 commit)

Implement Tensor Parallel for transformer Block · a8cfe515
Tri Dao authored Dec 25, 2022
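Tensor parallelism at the Block level typically composes a column-parallel first linear with a row-parallel second linear, so each MLP needs only one all_reduce. A hedged sketch of that Megatron-style layout (class and parameter names are assumptions, not the repository's classes):

```python
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F

class TensorParallelMLPSketch(nn.Module):
    # Assumed Megatron-style layout: fc1 is split along its output dim
    # (column-parallel), fc2 along its input dim (row-parallel), so each
    # rank computes a partial sum and one all_reduce restores the output.
    def __init__(self, hidden, ffn_hidden, process_group):
        super().__init__()
        self.pg = process_group
        world = dist.get_world_size(process_group)
        assert ffn_hidden % world == 0
        self.fc1 = nn.Linear(hidden, ffn_hidden // world)
        # bias=False: a row-parallel bias must be added once, after the
        # all_reduce, or it would be summed world-size times.
        self.fc2 = nn.Linear(ffn_hidden // world, hidden, bias=False)

    def forward(self, x):
        y = self.fc2(F.gelu(self.fc1(x)))   # partial result on each rank
        dist.all_reduce(y, group=self.pg)   # sum partials across ranks
        return y
```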
24 Dec, 2022 (1 commit)

Implement TensorParallel for FusedDense and FusedDenseGeluDense · 226a1b72
Tri Dao authored Dec 23, 2022
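The building block underneath is a dense layer whose weight is sharded across ranks. A minimal sketch of the column-parallel case (the class name is hypothetical; as the names suggest, the repository's fused versions also fold the bias, and in FusedDenseGeluDense the GELU between the two GEMMs, into the GEMM kernels):

```python
import math
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F

class ColumnParallelDenseSketch(nn.Module):
    # Hedged sketch of the idea: each rank stores only a 1/world slice of
    # the weight's output dimension, so the ranks compute disjoint slices
    # of the output in parallel from the same (replicated) input.
    def __init__(self, in_features, out_features, process_group):
        super().__init__()
        world = dist.get_world_size(process_group)
        assert out_features % world == 0
        self.weight = nn.Parameter(torch.empty(out_features // world,
                                               in_features))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, x):
        # Returns the local shard (..., out_features // world); a following
        # row-parallel layer consumes it directly, as in the MLP sketch above.
        return F.linear(x, self.weight)
```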
23 Dec, 2022 (1 commit)

Simplify FusedDense · e68ebbe8
Tri Dao authored Dec 22, 2022
14 Nov, 2022 (2 commits)

Add MLP, MHA, Block, Embedding modules · d4b320b3
Tri Dao authored Nov 13, 2022
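A pre-norm transformer block built from such modules looks roughly as follows. This is a generic sketch for orientation (names and structure assumed), not the repository's Block, which can also route the residual, dropout, and layer-norm steps through the fused kernel from the commit below:

```python
import torch.nn as nn

class BlockSketch(nn.Module):
    # Minimal pre-norm transformer block in the spirit of the commit
    # message: attention and MLP sublayers, each wrapped in a residual
    # connection with dropout.
    def __init__(self, dim, n_heads, mlp_ratio=4, dropout=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mha = nn.MultiheadAttention(dim, n_heads, dropout=dropout,
                                         batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.mha(h, h, h, need_weights=False)
        x = x + self.drop(attn_out)                  # residual around attention
        x = x + self.drop(self.mlp(self.norm2(x)))   # residual around MLP
        return x
```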
Add fused_dense and dropout_add_layernorm CUDA extensions · fa6d1ce4
Tri Dao authored Nov 13, 2022
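Judging by its name, dropout_add_layernorm fuses a dropout, a residual add, and a layer norm into one kernel, avoiding extra round-trips through GPU memory. An unfused PyTorch reference for what that computes (my reading of the name, not the extension's actual signature):

```python
import torch.nn.functional as F

def dropout_add_layernorm_ref(x, residual, weight, bias, p, training=True):
    # Unfused reference for what the kernel's name suggests it computes
    # (an assumption; the CUDA extension fuses the three steps into one
    # kernel rather than materializing the intermediates):
    #     out = LayerNorm(residual + Dropout(x))
    x = F.dropout(x, p=p, training=training)
    return F.layer_norm(residual + x, x.shape[-1:], weight, bias)
```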