- 11 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 04 Sep, 2023 5 commits
- 03 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 01 Sep, 2023 2 commits
-
-
Sophia Wisdom authored
* Remove lots of comments
* Remove unused traits
-
Sophia Wisdom authored
-
- 30 Aug, 2023 2 commits
-
-
Aman Gupta Karmani authored
-
Tri Dao authored
-
- 29 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 28 Aug, 2023 3 commits
-
-
Tri Dao authored
-
dan_the_3rd authored
When seqlen=8136, `smem_sz = 48840`, and launching the kernel returns an `invalid argument` CUDA error. `48840 < 48 * 1024` (= 49152), yet the request still appears to exceed the limit for some reason. Tested on A100.
-
Tri Dao authored
-
- 25 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 24 Aug, 2023 1 commit
-
-
BoxiangW authored
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
-
- 17 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 16 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 13 Aug, 2023 2 commits
- 01 Aug, 2023 2 commits
- 27 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add RNG state to kernel launch params
* Save seed and offset for backward
* Single thread write to global mem
* `compute_dq_dk_dv_1colblock` get seed and offset from launch params
* `compute_dq_dk_dv_1rowblock` get seed and offset from launch params
* Change forward C++ APIs to save RNG state for backward
* Change backward C++ APIs to set RNG state for bprop launcher
* Bug fixes
* Python side API changes
* Bug fix; only save seeds instead of full offset
* Account for 3D grid size

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
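The idea behind saving the RNG state for backward can be illustrated with a minimal sketch. It is not the kernel code from this commit: the function names, the use of Python's `random` module, and the way seed and offset are combined are all assumptions made for illustration. The point is only that the forward pass returns a small `(seed, offset)` pair instead of the dropout mask, and the backward pass regenerates the same mask from it.

```python
import random


def _make_rng(seed, offset):
    # Hypothetical way to derive one generator from (seed, offset);
    # a real kernel would use a counter-based generator instead.
    return random.Random(seed * 1_000_003 + offset)


def dropout_forward(x, p, seed, offset):
    """Apply dropout to a list of floats; return output and the RNG state."""
    rng = _make_rng(seed, offset)
    keep = [rng.random() >= p for _ in x]
    scale = 1.0 / (1.0 - p)
    out = [xi * scale if k else 0.0 for xi, k in zip(x, keep)]
    # Only the small RNG state is saved for backward, not the mask itself.
    return out, (seed, offset)


def dropout_backward(grad_out, p, rng_state):
    """Regenerate the forward mask from the saved state and apply it."""
    seed, offset = rng_state
    rng = _make_rng(seed, offset)
    keep = [rng.random() >= p for _ in grad_out]
    scale = 1.0 / (1.0 - p)
    return [g * scale if k else 0.0 for g, k in zip(grad_out, keep)]
```

Calling `dropout_forward` and then passing the returned state to `dropout_backward` zeroes the gradient at exactly the positions that were dropped in the forward pass, without storing a mask the size of the input.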
-
- 23 Jul, 2023 2 commits
-
-
Joel Lamy-Poirier authored
-
Tri Dao authored
-
- 21 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 19 Jul, 2023 2 commits
-
-
Ikko Eltociear Ashimine authored
unintialized -> uninitialized
-
- 17 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 06 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 03 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 02 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 30 May, 2023 2 commits
- 26 Apr, 2023 1 commit
-
-
Tri Dao authored
-
- 21 Apr, 2023 1 commit
-
-
Tri Dao authored
-
- 15 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-