- 10 Jul, 2024 2 commits
- 09 Jul, 2024 1 commit
-
-
Phil Wang authored
* missing commas
* another fix
-
- 08 Jul, 2024 2 commits
-
-
Nicolas Patry authored
* Softcap v2 (fwd only).
* Some missing interface + remove overrides in tests.
-
Jianwei Dong authored
Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989)
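To illustrate what "the logsumexp of the attention scores" refers to, here is a minimal pure-Python sketch for a single query vector. It is not the flash-attention kernel and does not call the library. The helper names `attention_with_lse` and `merge_partials` are hypothetical, and the merging step is only one common use of a returned log-sum-exp, assumed here for illustration rather than taken from the commit.

```python
import math

def attention_with_lse(q, keys, values, scale=None):
    """Reference single-query attention; returns (output, logsumexp of scaled scores)."""
    scale = scale if scale is not None else 1.0 / math.sqrt(len(q))
    # Scaled dot-product score of the query against every key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exp_scores = [math.exp(s - m) for s in scores]
    denom = sum(exp_scores)
    lse = m + math.log(denom)
    probs = [e / denom for e in exp_scores]
    out = [sum(p * v[j] for p, v in zip(probs, values))
           for j in range(len(values[0]))]
    return out, lse

def merge_partials(out_a, lse_a, out_b, lse_b):
    """Combine attention results computed over two disjoint key/value chunks."""
    m = max(lse_a, lse_b)
    w_a, w_b = math.exp(lse_a - m), math.exp(lse_b - m)
    total = w_a + w_b
    merged = [(w_a * a + w_b * b) / total for a, b in zip(out_a, out_b)]
    return merged, m + math.log(total)
```

With both the output and its log-sum-exp available for each chunk, the two partial results can be reweighted into the same answer that attention over the full key set would give, without revisiting the raw scores.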
-
- 27 Jun, 2024 1 commit
-
-
Grigory Sizov authored
* Support unpadded LSE layout.
* Cleanup
* Fix unpadded LSE on split-kv path
* Fix formatting and comments
* Fix inline vs forceinline
---------
Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
-
- 15 Mar, 2024 1 commit
-
-
Grigory Sizov authored
* Enable paged attention in varlen forward
* Format + fix padding
-
- 23 Jan, 2024 2 commits
-
-
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
-
Tri Dao authored
Co-authored-by: ljss <450993438@qq.com>
-
- 13 Jan, 2024 1 commit
-
-
Tri Dao authored
-
- 24 Dec, 2023 1 commit
-
-
Tri Dao authored
-
- 22 Dec, 2023 1 commit
-
-
Tri Dao authored
-
- 20 Dec, 2023 2 commits
-
-
Tri Dao authored
-
Sanghun Cho authored
* hard-code alibi in fwd
* use params.h as hun_heads
* hard-code alibi in bwd
* add alibi on/off option
* compute alibi_start, ratio outside of kernels
* fix minor merge conflict
* add test_alibi.py
* change apply_alibi() location before masking
* add alibi in splitkv kernel
* fix backward func # of returns
* add out-of-bound check in apply_alibi()
* update test_alibi.py
* update test_alibi.py for kvcache
* simplify alibi parameter interface
* fix performance issue by computing alibi outside of branch
* update test_flash_attn_varlen_func() for left padding
* implement alibi_slopes (b, nh) loading
* optimize apply_alibi() a bit
* update test cases for alibi_slopes loading
* reflect stylistic comments
* disable "seqlenq_ngroups_swapped" when using alibi
---------
Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
-
- 28 Nov, 2023 1 commit
-
-
Tri Dao authored
-
- 27 Nov, 2023 1 commit
-
-
Jeremy Reizenstein authored
Co-authored-by: bottler <bottler@users.noreply.github.com>
-
- 03 Oct, 2023 1 commit
-
-
Tri Dao authored
-
- 26 Sep, 2023 1 commit
-
-
Tri Dao authored
Co-authored-by: Timothee Lacroix <t@mistral.ai>
-
- 16 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 11 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 05 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 04 Sep, 2023 1 commit
-
-
Tri Dao authored
-
- 25 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 20 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 18 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 01 Aug, 2023 1 commit
-
-
Tri Dao authored
-
- 28 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 27 Jul, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add RNG state to kernel launch params
* Save seed and offset for backward
* Single thread write to global mem
* compute_dq_dk_dv_1colblock get seed and offset from launch params
* compute_dq_dk_dv_1rowblock get seed and offset from launch params
* Change forward c++ APIs to save RNG state for backward
* Change backward c++ APIs to set RNG state for bprop launcher
* Bug fixes
* Python side API changes
* Bug fix; only save seeds instead of full offset
* Account for 3D grid size
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 18 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 17 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 03 Jul, 2023 1 commit
-
-
Tri Dao authored
-
- 13 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 31 Mar, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
-
- 13 Dec, 2022 1 commit
-
-
Tri Dao authored
-
- 05 Nov, 2022 1 commit
-
-
Tri Dao authored
This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.
-
- 24 Oct, 2022 1 commit
-
-
Tri Dao authored
-
- 23 Oct, 2022 1 commit
-
-
Tri Dao authored
-
- 21 Oct, 2022 2 commits