gaoqiong / flash-attention · Commits at 48bc6eacd61b4b57bbd250057655d52f7068ba2f
Path: csrc/ft_attention
30 May, 2023 (1 commit)

  48bc6eac  [Gen] Add rotary base as an argument to FT attention kernel
            Tri Dao authored May 30, 2023
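The rotary base is the constant from which rotary position embedding (RoPE) frequencies are derived; larger bases stretch the positional wavelengths. A minimal sketch of the quantity this argument controls (the helper name and signature are illustrative, not the extension's API):

```python
import torch

def rotary_inv_freq(rotary_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Inverse frequencies for rotary position embeddings (RoPE). `base` is the
    # constant the commit above exposes as a kernel argument; 10000.0 is the
    # conventional default. Hypothetical helper, not the kernel's code.
    return 1.0 / (base ** (torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim))
```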
21 Apr, 2023 (1 commit)

  311d6606  [Gen] Fix FT kernel smem size, CG when batch size changed
            Tri Dao authored Apr 20, 2023
29 Mar, 2023 (1 commit)

  f5d0fbd4  [FT] Fix FT's single query attention for bf16 hdim128 rotary
            Tri Dao authored Mar 28, 2023
15 Mar, 2023 (1 commit)

  dc08ea1c  Support H100 for other CUDA extensions
            Tri Dao authored Mar 15, 2023
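On the build side, adding H100 support to a PyTorch CUDA extension usually means compiling for compute capability 9.0 (sm_90). A hedged sketch of the setup.py shape; the source file names and exact flag list are assumptions, not copied from this repository:

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Illustrative only: the key change for H100 support is the extra sm_90
# -gencode target passed to nvcc.
setup(
    name="ft_attention",
    ext_modules=[
        CUDAExtension(
            name="ft_attention",
            sources=["ft_attention.cpp", "decoder_masked_multihead_attention.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                "nvcc": [
                    "-O3",
                    "-gencode", "arch=compute_80,code=sm_80",  # A100
                    "-gencode", "arch=compute_90,code=sm_90",  # H100
                ],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```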
15 Jan, 2023 (2 commits)

  f1e01c27  [Gen] Pass qkv_stride to ft_attention kernel for batched generation
            Tri Dao authored Jan 15, 2023

  7c219154  [Gen] Make generation work with Tensor Parallel
            Tri Dao authored Jan 15, 2023
04 Jan, 2023 (3 commits)

  be1afaa2  [Gen, FT] Use fp32 accum for FMA
            Tri Dao authored Jan 03, 2023 (see the fp32 accumulation sketch after this list)

  f266fc72  [Gen, FT] Use tlength instead of params.timestep for rotary
            Tri Dao authored Jan 03, 2023

  a01d1213  [Gen] Add kernel from FasterTransformer for benchmarking
            Tri Dao authored Jan 03, 2023
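The be1afaa2 commit switches the fused multiply-add accumulator to fp32: summing many fp16/bf16 products in half precision loses accuracy, so the products are accumulated in float32 instead. A minimal sketch of the numerical idea only, with an assumed helper name, not the kernel's code:

```python
import torch

def qk_dot_fp32_accum(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # Accumulate the q.k dot product in float32 even when q and k are
    # fp16/bf16; the caller can cast the result back to the working dtype.
    return (q.to(torch.float32) * k.to(torch.float32)).sum(dim=-1)
```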