gaoqiong
flash-attention
Commits
4d87e4d875077ad9efd25030efa4ab0ba92c19e1
flash-attention/tests/models/test_gpt_generation.py
22 Mar, 2023
1 commit
Implement GPT-J · 4d87e4d8 · Tri Dao authored Mar 22, 2023
23 Jan, 2023
1 commit
[OPT] Load fp16 weights on CPU before moving to GPU · 78b7a1dc · Tri Dao authored Jan 22, 2023
18 Jan, 2023
2 commits
[Gen] Add OPT to generation test · f68d41ec · Tri Dao authored Jan 17, 2023
[FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP · 88173a1a · Tri Dao authored Jan 17, 2023
15 Jan, 2023
2 commits
[Gen] Pass qkv_stride to ft_attention kernel for batched generation · f1e01c27 · Tri Dao authored Jan 15, 2023
[Gen] Make generation work with Tensor Parallel · 7c219154 · Tri Dao authored Jan 15, 2023
08 Jan, 2023
3 commits
[Gen] Add timing option · b4859900 · Tri Dao authored Jan 07, 2023
[Gen] Adjust shape of kv_cache when using FT · 0938298e · Tri Dao authored Jan 07, 2023
[Gen] Implement top-k and top-p sampling · e02fd588 · Tri Dao authored Jan 07, 2023
07 Jan, 2023
1 commit
[Gen] Test generation with rotary embedding · 11be742a · Tri Dao authored Jan 07, 2023
04 Jan, 2023
1 commit
[Gen] Add option to run generation with FT attention kernel · a668890f · Tri Dao authored Jan 03, 2023
28 Dec, 2022
1 commit
Implement generation for GPT · 63670fd8 · Tri Dao authored Dec 27, 2022