gaoqiong / flash-attention · Commits at ccbb14f38ee1e51a6f65bccbef9e4765acba6d79
History for csrc/ft_attention/ft_attention.cpp
16 Sep, 2023 (1 commit)
ccbb14f3 · Implement rotary embedding in flash_attn_with_kvcache (Tri Dao, authored Sep 16, 2023)
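This commit moves rotary embedding into the decoding kernel itself, so flash_attn_with_kvcache rotates the incoming q/k at their cache positions instead of requiring a separate PyTorch rotary pass. A minimal usage sketch, assuming the flash-attn 2.x Python interface (the shapes, keyword names, and the (seqlen, rotary_dim/2) cos/sin layout are my reading of that interface, not taken from this commit):

    import torch
    from flash_attn import flash_attn_with_kvcache

    batch, nheads, headdim, max_seqlen = 2, 16, 64, 512
    # One new token per sequence during decoding.
    q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")
    k_new = torch.randn_like(q)
    v_new = torch.randn_like(q)
    k_cache = torch.zeros(batch, max_seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
    v_cache = torch.zeros_like(k_cache)
    cache_seqlens = torch.full((batch,), 128, dtype=torch.int32, device="cuda")

    # Precomputed rotary tables; assumed layout (max_seqlen, headdim // 2).
    pos = torch.arange(max_seqlen, dtype=torch.float32, device="cuda")
    inv_freq = 1.0 / (10000.0 ** (torch.arange(0, headdim, 2, device="cuda").float() / headdim))
    angles = torch.outer(pos, inv_freq)
    rotary_cos, rotary_sin = angles.cos().half(), angles.sin().half()

    # The kernel rotates q and k_new at position cache_seqlens[b], appends
    # k_new/v_new to the caches in place, then attends over the updated cache.
    out = flash_attn_with_kvcache(
        q, k_cache, v_cache, k=k_new, v=v_new,
        rotary_cos=rotary_cos, rotary_sin=rotary_sin,
        cache_seqlens=cache_seqlens, causal=True,
    )

Passing cache_seqlens per batch element is what lets the kernel pick the correct rotary position for each sequence in the batch.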
23 Jul, 2023 (1 commit)
a157cc8c · [FT] Implement MQA/GQA (Tri Dao, authored Jul 22, 2023)
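MQA/GQA support means the FT attention kernel can take fewer KV heads than query heads: a single shared KV head (MQA) or one KV head per group of query heads (GQA). The kernel indexes the shared heads in place; a minimal PyTorch sketch of the equivalent semantics, using a hypothetical repeat_kv helper rather than the kernel's actual indexing:

    import torch

    def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
        # (batch, seqlen, n_kv_heads, headdim) -> (batch, seqlen, n_kv_heads * n_rep, headdim):
        # every group of n_rep query heads attends to the same expanded K/V head.
        if n_rep == 1:
            return x
        b, s, h_kv, d = x.shape
        return x[:, :, :, None, :].expand(b, s, h_kv, n_rep, d).reshape(b, s, h_kv * n_rep, d)

    # GQA: 16 query heads sharing 4 KV heads (n_rep = 4); MQA is the n_kv_heads = 1 case.
    k = torch.randn(2, 128, 4, 64)
    assert repeat_kv(k, n_rep=4).shape == (2, 128, 16, 64)

A fused kernel avoids materializing the expanded tensor by mapping each query head to its KV head index directly; the sketch only shows the semantics.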
06 Jul, 2023 (1 commit)
2800efc7 · [FT] rotary_cos/sin should have batch_size dimension (Tri Dao, authored Jul 06, 2023)
03 Jul, 2023 (1 commit)
3a9bfd07 · [FT] rotary_cos/sin should have shape (dim) instead of (seqlen, dim) (Tri Dao, authored Jul 03, 2023)
02 Jul, 2023 (1 commit)
62e98144 · [Rotary] Make sure frequency calculation is in fp32 (Tri Dao, authored Jul 02, 2023)
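This fix targets a standard numerical pitfall: computing rotary inverse frequencies in fp16/bf16 loses precision and drifts the rotation angles, especially at long positions. A sketch of the usual fp32-safe computation (the standard rotary formula; the base parameter here anticipates the commit below that exposes it as a kernel argument):

    import torch

    def rotary_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
        # Keep the exponent and the power in fp32: evaluating base ** (2i / dim)
        # in fp16/bf16 loses precision and skews the angles for large i.
        exponents = torch.arange(0, dim, 2, dtype=torch.float32) / dim
        return 1.0 / (base ** exponents)  # shape (dim // 2,)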
30 May, 2023 (1 commit)
48bc6eac · [Gen] Add rotary base as an argument to FT attention kernel (Tri Dao, authored May 30, 2023)
15 Jan, 2023 (2 commits)
f1e01c27 · [Gen] Pass qkv_stride to ft_attention kernel for batched generation (Tri Dao, authored Jan 15, 2023; layout sketch below)
7c219154 · [Gen] Make generation work with Tensor Parallel (Tri Dao, authored Jan 15, 2023)
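On the qkv_stride commit above: during batched generation, q, k, and v for the new token typically come out of one packed projection tensor, so a kernel handed only q's data pointer needs the packed stride to step between batch elements. A sketch of that layout, with shapes assumed for illustration (this is my reading of why the stride is passed, not the commit's own text):

    import torch

    batch, nheads, headdim = 4, 16, 64
    # Packed in-projection output for one decoding step: (batch, 3, nheads, headdim).
    qkv = torch.randn(batch, 3, nheads, headdim)
    q, k, v = qkv.unbind(dim=1)  # views into the same storage, no copies

    # q is non-contiguous: stepping one batch element forward moves by the full
    # packed row (3 * nheads * headdim), not by nheads * headdim, so the kernel
    # must be told this stride explicitly.
    assert q.stride(0) == qkv.stride(0) == 3 * nheads * headdim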
04 Jan, 2023 (1 commit)
a01d1213 · [Gen] Add kernel from FasterTransformer for benchmarking (Tri Dao, authored Jan 03, 2023)