- 30 Aug, 2023 5 commits
-
-
Aman Gupta Karmani authored
-
dan_the_3rd authored
Co-authored-by: danthe3rd <danthe3rd>
-
dan_the_3rd authored
Co-authored-by: danthe3rd <danthe3rd>
-
GAOXinyu authored
When using checkpoint_lvl=2, all_gather_raw(x) is called without async_op=True, so there is no handle to wait on. Skip the wait.
-
Tri Dao authored
-
- 29 Aug, 2023 4 commits
-
-
Tri Dao authored
-
Jeffrey Quesnelle authored
-
Su Zhu authored
* add unpad_input_for_concatenated_sequences
* modify docstring
-
Tri Dao authored
-
- 28 Aug, 2023 4 commits
-
-
Tri Dao authored
-
dan_the_3rd authored
When seqlen=8136, `smem_sz = 48840`, and launching the kernel returns an `invalid argument` CUDA error. `48840 < 48 * 1024` (49152), yet the request still appears to exceed the limit for a reason that is not yet understood. Tested on A100.
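The arithmetic in this message can be checked with a short sketch. The helper name `smem_headroom` and the constant `DEFAULT_SMEM_LIMIT` are hypothetical and used only for illustration. The 48 KiB figure is the one quoted in the commit message. One possible explanation, stated here as an assumption and not confirmed by the commit, is that other shared memory used by the kernel (for example statically allocated shared memory) also counts against the same budget, so 312 bytes of headroom may not be enough.

```python
# Limit quoted in the commit message: 48 * 1024 bytes.
DEFAULT_SMEM_LIMIT = 48 * 1024  # 49152 bytes


def smem_headroom(smem_sz: int, limit: int = DEFAULT_SMEM_LIMIT) -> int:
    """Return the bytes left under `limit`; a negative value means over the limit.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    return limit - smem_sz


# Value reported in the commit message for seqlen=8136.
headroom = smem_headroom(48840)
print(f"requested=48840 limit={DEFAULT_SMEM_LIMIT} headroom={headroom}")
```

The request is nominally under the limit by only a few hundred bytes, which is consistent with the observation that a small additional allocation elsewhere could push the launch over it.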
-
Tri Dao authored
-
Tri Dao authored
-
- 27 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 26 Aug, 2023 6 commits
- 25 Aug, 2023 4 commits
-
-
Tri Dao authored
-
Aman Gupta Karmani authored
-
Tri Dao authored
-
Tri Dao authored
-
- 24 Aug, 2023 2 commits
-
-
BoxiangW authored
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
-
Aman Gupta Karmani authored
-
- 22 Aug, 2023 2 commits
- 21 Aug, 2023 1 commit
-
-
GAOXinyu authored
-
- 20 Aug, 2023 2 commits
-
-
Xuechen Li authored
* q
* add comment.
-
Tri Dao authored
-
- 19 Aug, 2023 2 commits
-
-
Tri Dao authored
-
Xuechen Li authored
* fix name.
* set inv function.
* add map back function.
* handle gqa.
* add type annotation to avoid confusion.
* fix docstr.
* test inverse remap logic.
-
- 18 Aug, 2023 5 commits
-
-
Tri Dao authored
-
Tri Dao authored
-
Xuechen Li authored
* unequal rank.
* trim.
* enable passing in number of heads for each rank.
* simplify.
* simplify.
* cleanup.
* fix col parallel.
* fix bug with row parallel.
* fit out proj.
* refactor.
* fix sharding logic.
* refactor sharding.
* refactor.
* support multiple of.
* make fn reusable.
* fix bug in dimensions.
* scaffold.
* test uneven heads.
* fix test by adding barrier.
* refactor.
* reuse code.
* clean up.
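As a rough illustration of what an uneven split of attention heads across ranks can look like, here is a minimal sketch. The function names `split_heads` and `head_offsets` are hypothetical, and the policy (the first ranks each take one extra head) is an assumption chosen for simplicity. It is not taken from this commit and may differ from the actual sharding logic.

```python
from typing import List


def split_heads(n_heads: int, world_size: int) -> List[int]:
    """Number of heads assigned to each rank when n_heads may not divide evenly.

    Assumed policy (illustrative only): every rank gets n_heads // world_size
    heads, and the first n_heads % world_size ranks get one extra.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    base, extra = divmod(n_heads, world_size)
    return [base + (1 if rank < extra else 0) for rank in range(world_size)]


def head_offsets(counts: List[int]) -> List[int]:
    """Index of the first head owned by each rank, given the per-rank counts."""
    offsets, start = [], 0
    for count in counts:
        offsets.append(start)
        start += count
    return offsets


counts = split_heads(10, 4)
print(counts, head_offsets(counts))
```

With per-rank counts and offsets like these, each rank can slice its own rows or columns of a projection weight even when the head count is not a multiple of the world size.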
-
Tri Dao authored
-
Tri Dao authored
-
- 17 Aug, 2023 2 commits