- 21 Oct, 2022 1 commit
Tri Dao authored
They don't have to have the same block size, number of threads, etc.
- 10 Jul, 2022 1 commit
Tri Dao authored
- 25 Jun, 2022 1 commit
Tri Dao authored
- 12 Jun, 2022 2 commits
- 03 Jun, 2022 1 commit
Tri Dao authored
Goes from 4KB per buffer to 2KB per buffer. This saves 8KB of shared memory (smem), since Q and dO each have 2 buffers.
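The 8KB figure follows from the numbers in the commit message. A minimal sketch of the arithmetic, where the variable names are illustrative and not taken from the repository:

```python
KB = 1024

# Figures quoted in the commit message.
old_bytes_per_buffer = 4 * KB
new_bytes_per_buffer = 2 * KB
buffers_per_tensor = 2           # Q and dO each have 2 buffers
tensors = ("Q", "dO")

# Total buffers affected, and the shared memory freed across all of them.
num_buffers = buffers_per_tensor * len(tensors)
saved_bytes = num_buffers * (old_bytes_per_buffer - new_bytes_per_buffer)

print(f"{num_buffers} buffers, {saved_bytes // KB}KB of smem saved")
# prints "4 buffers, 8KB of smem saved"
```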
- 26 May, 2022 1 commit
Tri Dao authored
- 20 May, 2022 1 commit
Tri Dao authored