1. 29 Jul, 2024 1 commit
  2. 23 Jul, 2024 3 commits
    • Tri Dao authored · 65f723bb
    • Tri Dao authored · 751c762c
    •
      Support AMD ROCm on FlashAttention 2 (#1010) · d8f104e9
      rocking authored
      
      
      * Support ck in fmha
      
      * Add ck submodule
      
      * Do not return lse if return_softmax == false
      
      * Use receipt to speed up ck compile time
      
      * Integrate new version of ck_tile
      
      * Support dropout for mha_fwd()
      
      * Add dropout to mha_varlen_fwd()
      
      * Update ck to develop
      
      * Extract padding function for dropout randval
      
      * Extract randval transformation function
      
      * Sync the code structure and coding style with FA
      
* Remove this line; the C++ API will handle it.
Sync with test_flash_attn.py
      
* Fix compile error
      
      * Add mha_bwd
      
      * Generate dropout seed and offset from user generator
      
* Update CK
      
      * Add mha_varlen_bwd
      
* Use the same Python that builds flash-attn to generate the ck kernels
      
* Fix bug in group-mode fwd when returning softmax lse
      
* Increase the test tolerance
      
      * Add test_flash_attn_output() and test_flash_attn_varlen_output()
      
      * Always fill softmax_lse
      
      * Remove duplicate benchmark script, since we already implement mha_bwd
      
* Refine getting values from the tuple
      
      * Use default parameter for stream_config
      
* Unblock all platforms
      
      * Add comment
      
* Refine the test code
      
      * Refine naming
      
      * Add unpack to namespace
      
      * Do not hardcode the warp size 64
      
      * Add more targets
      
      * Add README
      
      * Optimize mha_fwd if seqlen_q == 1
      
      * Support get_wheel_url for rocm
      
      * Detect rocm environment by pytorch's IS_HIP_EXTENSION
      
* Update to the latest ck
      
      * Add necessary compile flag
      
      * Sync the api with upstream FA
      
      ---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Yichen Yan <oraluben@outlook.com>
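One bullet above mentions detecting the ROCm environment via PyTorch's IS_HIP_EXTENSION. A minimal sketch of how a build script might branch on that flag; the ImportError fallback and the BUILD_TARGET name are assumptions for illustration, not the actual setup.py code:

```python
# Sketch: branch a build script on PyTorch's IS_HIP_EXTENSION flag,
# which is True when PyTorch was built against ROCm/HIP.
try:
    from torch.utils.cpp_extension import IS_HIP_EXTENSION
except ImportError:
    # Assumed fallback for environments without PyTorch installed.
    IS_HIP_EXTENSION = False

# Hypothetical name; the real build script may organize this differently.
BUILD_TARGET = "rocm" if IS_HIP_EXTENSION else "cuda"
```

Branching on the PyTorch flag (rather than probing for hipcc or ROCM_PATH) keeps the extension build consistent with whatever backend the installed PyTorch was compiled for.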
  3. 11 Jul, 2024 1 commit
  4. 10 Jul, 2024 2 commits
  5. 08 Jul, 2024 1 commit
  6. 07 Jun, 2024 1 commit
  7. 26 May, 2024 1 commit
  8. 19 May, 2024 1 commit
  9. 06 May, 2024 2 commits
  10. 22 Apr, 2024 2 commits
  11. 08 Apr, 2024 1 commit
  12. 28 Mar, 2024 2 commits
  13. 14 Mar, 2024 2 commits
  14. 18 Feb, 2024 1 commit
    •
      Optimize compile to 1: avoid oom 2: minimize swap usage 3: avoid threads... · f45bbb4c
      Qubitium authored
Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, whether ninja decides how many workers to spawn or MAX_JOBS is guessed manually. The logic takes the minimum of two auto-calculated MAX_JOBS values, one derived from CPU cores and one from free memory. This should let flash-attn compile close to peak efficiency in any consumer or server environment. (#832)
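The heuristic described above can be sketched as the minimum of a CPU-based cap and a memory-based cap. The function and variable names, and the assumed per-job memory footprint, are illustrative; the actual setup.py logic may differ:

```python
import os

# Assumed memory footprint of one compiler job; compiling flash-attn
# kernels is memory-hungry, so each parallel job is budgeted several GB.
GB_PER_JOB = 4


def estimate_max_jobs(free_mem_gb):
    """Return a MAX_JOBS value bounded by both CPU cores and free memory."""
    cpu_jobs = os.cpu_count() or 1
    mem_jobs = max(1, int(free_mem_gb // GB_PER_JOB))
    # Taking the minimum avoids both OOM (memory bound) and
    # oversubscription/thread starvation (CPU bound).
    return min(cpu_jobs, mem_jobs)
```

For example, a 32-core machine with 16 GB free would be capped at 4 jobs by the memory bound, while a 4-core machine with 64 GB free would be capped at 4 jobs by the CPU bound.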
      
  15. 28 Nov, 2023 1 commit
  16. 04 Oct, 2023 1 commit
  17. 24 Sep, 2023 1 commit
  18. 22 Sep, 2023 1 commit
  19. 18 Sep, 2023 3 commits
  20. 12 Sep, 2023 1 commit
  21. 04 Sep, 2023 1 commit
  22. 29 Aug, 2023 1 commit
  23. 18 Aug, 2023 1 commit
  24. 14 Aug, 2023 2 commits
  25. 13 Aug, 2023 1 commit
  26. 01 Aug, 2023 1 commit
  27. 17 Jul, 2023 1 commit
  28. 08 Jun, 2023 2 commits
  29. 03 Jun, 2023 1 commit