- 23 Jul, 2024 2 commits
Tri Dao authored
rocking authored
* Support ck in fmha
* Add ck submodule
* Do not return lse if return_softmax == false
* Use receipt to speed up ck compile time
* Integrate new version of ck_tile
* Support dropout for mha_fwd()
* Add dropout to mha_varlen_fwd()
* Update ck to develop
* Extract padding function for dropout randval
* Extract randval transformation function
* Sync the code structure and coding style with FA
* Remove this line; the C++ API will handle this. Sync with test_flash_attn.py
* Fix compile error
* Add mha_bwd
* Generate dropout seed and offset from user generator
* Update CK
* Add mha_varlen_bwd
* Use the same Python as the flash-attn build to generate the ck kernel
* Fix bug in group mode fwd when returning softmax lse
* Enlarge the test tolerance
* Add test_flash_attn_output() and test_flash_attn_varlen_output()
* Always fill softmax_lse
* Remove duplicate benchmark script, since mha_bwd is already implemented
* Refine getting values from tuple
* Use default parameter for stream_config
* Unblock all platforms
* Add comment
* Refine the test code
* Refine naming
* Add unpack to namespace
* Do not hardcode the warp size 64
* Add more targets
* Add README
* Optimize mha_fwd if seqlen_q == 1
* Support get_wheel_url for rocm
* Detect rocm environment via PyTorch's IS_HIP_EXTENSION
* Update to latest ck
* Add necessary compile flag
* Sync the API with upstream FA

---------

Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Yichen Yan <oraluben@outlook.com>
- 11 Jul, 2024 1 commit
Tri Dao authored
- 10 Jul, 2024 2 commits
- 08 Jul, 2024 1 commit
Nicolas Patry authored
* Softcap v2 (fwd only).
* Add some missing interfaces and remove overrides in tests.
- 26 May, 2024 1 commit
Corey James Levinson authored
When the connection times out, you get `URLError: <urlopen error timed out>`. In that case, build from source.
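The fallback this commit describes (try to download a prebuilt wheel, and build from source if the download fails, including on a connection timeout) might look like the following minimal Python sketch. It assumes the download goes through `urllib.request`; `fetch_wheel_or_build`, its parameters, and the `build_from_source` callback are hypothetical names for illustration, not the project's actual setup code.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable


def fetch_wheel_or_build(wheel_url: str, dest: str,
                         build_from_source: Callable[[], str],
                         timeout: float = 30.0) -> str:
    """Hypothetical helper: download a prebuilt wheel, else build locally."""
    try:
        # A connection timeout surfaces as URLError("<urlopen error timed out>").
        with urllib.request.urlopen(wheel_url, timeout=timeout) as resp:
            data = resp.read()
    except (urllib.error.URLError, socket.timeout):
        # URLError also covers HTTPError (e.g. a 404 for a missing wheel);
        # socket.timeout covers a timeout while reading the response body.
        return build_from_source()
    # Only write the wheel once the whole download has succeeded.
    with open(dest, "wb") as out:
        out.write(data)
    return "downloaded"
```

Catching the broad `URLError` is what lets a timeout fall through to a source build instead of aborting the install.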
- 06 May, 2024 1 commit
Wei Ji authored
Set `packaging` and `ninja` as build-time dependencies rather than runtime dependencies.
- 08 Apr, 2024 1 commit
Tri Dao authored
- 14 Mar, 2024 2 commits
Arvind Sundararajan authored
Chirag Jain authored
- 18 Feb, 2024 1 commit
Qubitium authored
Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid the thread starvation that occurs when ninja is left to decide how many workers to spawn or when MAX_JOBS is set by manual "guesses". The logic takes the minimum of two auto-calculated MAX_JOBS values, one based on (1) CPU cores and one on (2) free memory. This should let flash-attn compile close to optimally in any consumer or server environment. (#832)
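The MAX_JOBS heuristic described in this commit (take the smaller of a core-based and a memory-based job count) can be sketched in Python as follows. This is a hypothetical illustration, not the code from #832: the helper names `estimate_max_jobs` and `resolve_max_jobs`, the one-job-per-core rule, the assumed ~9 GB peak memory per compile job, and honoring a MAX_JOBS value the user already set are all assumptions.

```python
import os


def estimate_max_jobs(cpu_cores: int, free_memory_gb: float,
                      mem_per_job_gb: float = 9.0) -> int:
    """Hypothetical helper: cap parallel compile jobs by CPU cores and free memory."""
    # Metric 1: one job per core (assumption; a real build may reserve some cores).
    jobs_by_cores = max(1, cpu_cores)
    # Metric 2: how many jobs fit in free memory, assuming each job peaks at
    # roughly mem_per_job_gb (an assumed figure, not a measured one).
    jobs_by_memory = int(free_memory_gb // mem_per_job_gb)
    # Take the smaller of the two so neither CPU nor RAM is oversubscribed,
    # but never drop below a single job.
    return max(1, min(jobs_by_cores, jobs_by_memory))


def resolve_max_jobs(cpu_cores: int, free_memory_gb: float) -> str:
    # Assumption: an explicit MAX_JOBS from the user takes precedence.
    user_value = os.environ.get("MAX_JOBS")
    if user_value:
        return user_value
    return str(estimate_max_jobs(cpu_cores, free_memory_gb))
```

For example, with 64 cores but only 40 GB free, the memory metric wins and the sketch yields 4 jobs; with very little free memory it falls back to 1 job rather than 0.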
- 28 Nov, 2023 1 commit
Tri Dao authored
- 04 Oct, 2023 1 commit
Tri Dao authored
- 24 Sep, 2023 1 commit
Tri Dao authored
- 22 Sep, 2023 1 commit
Tri Dao authored
- 18 Sep, 2023 3 commits
Tri Dao authored
Federico Berto authored
* Add nvcc note on bare_metal_version `RuntimeError`
* Run Black formatting
Tri Dao authored
- 12 Sep, 2023 1 commit
Tri Dao authored
- 04 Sep, 2023 1 commit
Tri Dao authored
- 29 Aug, 2023 1 commit
Tri Dao authored
- 18 Aug, 2023 1 commit
Tri Dao authored
- 14 Aug, 2023 2 commits
Aman Gupta Karmani authored
Tri Dao authored
- 13 Aug, 2023 1 commit
Tri Dao authored
- 01 Aug, 2023 1 commit
Tri Dao authored
- 17 Jul, 2023 1 commit
Tri Dao authored
- 08 Jun, 2023 2 commits
Pierce Freeman authored
Pierce Freeman authored
- 03 Jun, 2023 6 commits
Pierce Freeman authored
Pierce Freeman authored
Pierce Freeman authored
Pierce Freeman authored
Pierce Freeman authored
Pierce Freeman authored
- 19 May, 2023 1 commit
Max H. Gerlach authored
- 12 May, 2023 1 commit
Tri Dao authored
- 26 Apr, 2023 1 commit
Tri Dao authored
- 21 Apr, 2023 1 commit
Tri Dao authored