- 13 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 12 Apr, 2023 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 15 Mar, 2023 1 commit
Tri Dao authored
- 25 Jan, 2023 1 commit
Tri Dao authored
- 26 Nov, 2022 1 commit
Tri Dao authored
- 05 Nov, 2022 1 commit
Tri Dao authored
This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.
- 24 Oct, 2022 2 commits
- 23 Oct, 2022 1 commit
Tri Dao authored
- 22 Oct, 2022 1 commit
Tri Dao authored
- 21 Oct, 2022 2 commits
- 17 Oct, 2022 1 commit
YangShu authored
as title.
- 16 Oct, 2022 1 commit
Tri Dao authored
- 14 Oct, 2022 1 commit
Tri Dao authored
- 10 Jul, 2022 1 commit
Tri Dao authored
- 04 Jul, 2022 2 commits
- 26 Jun, 2022 1 commit
Tri Dao authored
- 25 Jun, 2022 1 commit
Tri Dao authored
- 12 Jun, 2022 3 commits
- 04 Jun, 2022 1 commit
Tri Dao authored
This speeds up the fwd by 1.5x.
- 03 Jun, 2022 1 commit
Tri Dao authored
- 26 May, 2022 1 commit
Tri Dao authored
- 20 May, 2022 1 commit
Tri Dao authored