gaoqiong / flash-attention · Commits at 4d87e4d875077ad9efd25030efa4ab0ba92c19e1
History for flash_attn/utils/generation.py
22 Mar, 2023 · 1 commit
Implement GPT-J · 4d87e4d8 · Tri Dao authored Mar 22, 2023
23 Jan, 2023 · 1 commit
[OPT] Load fp16 weights on CPU before moving to GPU · 78b7a1dc · Tri Dao authored Jan 22, 2023
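Commit 78b7a1dc's message describes loading the fp16 weights into host memory first and only then moving them to the GPU, so the device never has to hold a transient extra copy during checkpoint loading. A minimal sketch of that general pattern, not the repository's actual loading code (the model, file name, and sizes below are placeholders):

import torch
import torch.nn as nn

# Stand-in for the real OPT model; only here to make the example self-contained.
model = nn.Linear(1024, 1024)

# Write an fp16 checkpoint to disk (in practice this file already exists).
torch.save(model.half().state_dict(), "weights_fp16.pt")

# Load the checkpoint into CPU memory first ...
state_dict = torch.load("weights_fp16.pt", map_location="cpu")
model_fp16 = nn.Linear(1024, 1024).half()
model_fp16.load_state_dict(state_dict)

# ... and only then move the already-fp16 weights to the GPU.
if torch.cuda.is_available():
    model_fp16 = model_fp16.to("cuda")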
18 Jan, 2023 · 1 commit
[Gen] Add OPT to generation test · f68d41ec · Tri Dao authored Jan 17, 2023
15 Jan, 2023 · 1 commit
[Gen] Make generation work with Tensor Parallel · 7c219154 · Tri Dao authored Jan 15, 2023
08 Jan, 2023 · 3 commits
[Gen] Remove commented code · f95c2fc1 · Tri Dao authored Jan 07, 2023
[Gen] Add timing option · b4859900 · Tri Dao authored Jan 07, 2023
[Gen] Implement top-k and top-p sampling · e02fd588 · Tri Dao authored Jan 07, 2023
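Top-k / top-p (nucleus) sampling, referenced in commit e02fd588, filters the logits before drawing the next token: top-k keeps only the k most likely tokens, and top-p keeps the smallest set of tokens whose cumulative probability reaches p. A minimal sketch of that standard technique, with assumed function name and defaults, not the repository's implementation:

import torch

def sample(logits, top_k=0, top_p=0.0, temperature=1.0):
    """Sample one token id per row of `logits` (shape: batch x vocab)."""
    logits = logits / temperature
    if top_k > 0:
        # Keep only the top_k largest logits, mask the rest to -inf.
        kth_value = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    if top_p > 0.0:
        # Nucleus sampling: drop a token if the probability mass of
        # strictly better tokens already exceeds top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs_sorted = torch.softmax(sorted_logits, dim=-1)
        cum_before = probs_sorted.cumsum(dim=-1) - probs_sorted
        sorted_logits = sorted_logits.masked_fill(cum_before > top_p, float("-inf"))
        # Undo the sort so logits are back in vocabulary order.
        logits = sorted_logits.gather(-1, sorted_idx.argsort(dim=-1))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

With top_k=0 and top_p=0.0 this reduces to plain temperature sampling.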
04 Jan, 2023 · 1 commit
[Gen] Add option to run generation with FT attention kernel · a668890f · Tri Dao authored Jan 03, 2023
28 Dec, 2022 · 2 commits
Bump to v0.2.6 · a6ec1782 · Tri Dao authored Dec 27, 2022
Implement generation for GPT · 63670fd8 · Tri Dao authored Dec 27, 2022
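Commit 63670fd8 introduces generation for GPT, i.e. an autoregressive decoding loop in generation.py. As context, a toy greedy-decoding sketch of the general idea (the flash-attention version additionally supports a KV cache, the FT attention kernel, sampling, and tensor parallelism per the commits above); the model callable here is a placeholder:

import torch

def greedy_generate(model, input_ids, max_new_tokens):
    """Append the argmax token max_new_tokens times; sketch only."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                      # assumed (batch, seqlen, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids

# Toy usage with a stand-in "model" that returns random logits.
vocab_size = 50257
toy_model = lambda ids: torch.randn(ids.shape[0], ids.shape[1], vocab_size)
out = greedy_generate(toy_model, torch.zeros(1, 4, dtype=torch.long), max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])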