- 18 Aug, 2023 5 commits
-
-
Tri Dao authored
-
Tri Dao authored
-
Xuechen Li authored
* uneql rank. * trim. * enable passing in number of heads for each rank. * simplify. * simplify. * cleanup. * fix col parallel. * fix bug with row parallel. * fit out proj. * refac. * fix sharding logic. * refac sharding. * refac. * support multiple of. * make fn reuseable. * fix bug in dimensions. * scaffold. * test uneven heads. * fix test by adding barrier. * refac. * reuse code. * clean up.
-
Tri Dao authored
-
Tri Dao authored
-
- 17 Aug, 2023 4 commits
- 16 Aug, 2023 2 commits
- 15 Aug, 2023 1 commit
-
-
Xuechen Li authored
* prelim. * add hf convertion fn. * mlp. * change name. * fix bug. * inverse permute. * change comment. * revert style changes. * fix. * add doc. * revert. * enable load safe. * fix safe load. * fix import. * fix typing-related lints. * fix ckpt loading logic. * make single gpu work. * test with parallel. * ckpt format. * enable pretrained state dict. * remove unused imports. * remove unused. * mark idea related.
-
- 14 Aug, 2023 4 commits
-
-
Tri Dao authored
-
Aman Gupta Karmani authored
-
Tri Dao authored
-
Tri Dao authored
-
- 13 Aug, 2023 7 commits
-
-
Tri Dao authored
-
Tri Dao authored
-
Tri Dao authored
* piercefreeman-feature/demo-wheels: (25 commits) Install standard non-wheel package Remove release creation Build wheel on each push Isolate 2.0.0 & cuda12 Clean setup.py imports Remove builder project Bump version Add notes to github action workflow Add torch dependency to final build Exclude cuda erroring builds Exclude additional disallowed matrix params Full version matrix Add CUDA 11.7 Release is actually unsupported echo OS version Temp disable deploy OS version build numbers Restore full build matrix Refactor and clean of setup.py Strip cuda name from torch version ...
-
Tri Dao authored
Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels * 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits) Install standard non-wheel package Remove release creation Build wheel on each push Isolate 2.0.0 & cuda12 Clean setup.py imports Remove builder project Bump version Add notes to github action workflow Add torch dependency to final build Exclude cuda erroring builds Exclude additional disallowed matrix params Full version matrix Add CUDA 11.7 Release is actually unsupported echo OS version Temp disable deploy OS version build numbers Restore full build matrix Refactor and clean of setup.py Strip cuda name from torch version ...
-
Tri Dao authored
-
Tri Dao authored
-
Tri Dao authored
-
- 11 Aug, 2023 4 commits
-
-
Pierce Freeman authored
-
Pierce Freeman authored
-
Pierce Freeman authored
-
Pierce Freeman authored
-
- 10 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 01 Aug, 2023 5 commits
- 29 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 28 Jul, 2023 3 commits
-
-
Tri Dao authored
-
Tri Dao authored
-
Kirthi Shankar Sivamani authored
* Bump version to 2.0.2 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update version in Dockerfile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add RNG state to kernel launch params Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Save seed and offset for backward Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Single thread write to global mem Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compute_dq_dk_dv_1colblock get seed and offset from launch params Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compute_dq_dk_dv_1rowblock get seed and offset from launch params Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change forward c++ APIs to save RNG state for backward Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change backward c++ APIs to set RNG state for bprop launcher Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python side API changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fix; only save seeds instead of full offset Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Account for 3D grid size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 26 Jul, 2023 2 commits
-
-
Tri Dao authored
-
Haodong Lyu authored
-