Commits · 3c458cff771c2c22ee5b3296f557194b62ccff53 · gaoqiong / flash-attention

13 Aug, 2023 4 commits

- Merge branch 'feature/demo-wheels' of... · 3c458cff
  Tri Dao authored Aug 13, 2023

Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels

* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits)
  Install standard non-wheel package
  Remove release creation
  Build wheel on each push
  Isolate 2.0.0 & cuda12
  Clean setup.py imports
  Remove builder project
  Bump version
  Add notes to github action workflow
  Add torch dependency to final build
  Exclude cuda erroring builds
  Exclude additional disallowed matrix params
  Full version matrix
  Add CUDA 11.7
  Release is actually unsupported
  echo OS version
  Temp disable deploy
  OS version build numbers
  Restore full build matrix
  Refactor and clean of setup.py
  Strip cuda name from torch version
  ...

3c458cff

- Prepare for Cutlass 3.2 · dbd79237
  Tri Dao authored Aug 13, 2023
  
  dbd79237
- Bump to v2.0.5 · c5e87b11
  Tri Dao authored Aug 13, 2023
  
  c5e87b11
- Update to Cutlass 3.1 · 3524e13c
  Tri Dao authored Aug 13, 2023
  
  3524e13c

11 Aug, 2023 4 commits
- Install standard non-wheel package · 6ef3bd80
  Pierce Freeman authored Aug 10, 2023
  
  6ef3bd80
- Remove release creation · ecc65354
  Pierce Freeman authored Aug 10, 2023
  
  ecc65354
- Build wheel on each push · bc6d4992
  Pierce Freeman authored Aug 10, 2023
  
  bc6d4992
- Isolate 2.0.0 & cuda12 · 565615c6
  Pierce Freeman authored Aug 10, 2023
  
  565615c6
10 Aug, 2023 1 commit
- [MLP] Change the check for out_features being None · 364a5b4a
  Tri Dao authored Aug 10, 2023
  
  364a5b4a
01 Aug, 2023 5 commits
- Bump to v2.0.4 · d30f2e1c
  Tri Dao authored Aug 01, 2023
  
  d30f2e1c
- Fix race condition in bwd (overwriting sK) · 1c41d2b0
  Tri Dao authored Aug 01, 2023
  
  1c41d2b0
- Bump to v2.0.3 · a4e5d1ed
  Tri Dao authored Jul 31, 2023
  
  a4e5d1ed
- [Docs] Fix docstring about Q nheads being divisible by KV nheads · 8f4cd4c1
  Tri Dao authored Jul 31, 2023
  
  8f4cd4c1
- Fix masking of bwd when seqlen is not divisible by 128 · a4f148b6
  Tri Dao authored Jul 31, 2023
  
  a4f148b6
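
The docstring fix above concerns the requirement that the number of Q heads be divisible by the number of KV heads in MQA/GQA. A minimal sketch of why that matters, assuming consecutive Q heads share one KV head (the function name `kv_head_for` and the grouping order are illustrative assumptions, not taken from the repository):

```python
def kv_head_for(q_head, nheads_q, nheads_kv):
    """Return the KV head index that a given Q head attends with.

    Assumes Q heads are split into equal, consecutive groups,
    one group per KV head (hypothetical layout for illustration).
    """
    if nheads_q % nheads_kv != 0:
        # Without divisibility the groups would have unequal sizes.
        raise ValueError("nheads_q must be divisible by nheads_kv")
    group_size = nheads_q // nheads_kv
    return q_head // group_size


# GQA: 8 Q heads sharing 2 KV heads, so 4 Q heads per KV head.
gqa_mapping = [kv_head_for(h, 8, 2) for h in range(8)]

# MQA is the special case with a single KV head.
mqa_mapping = [kv_head_for(h, 8, 1) for h in range(8)]
```

With 8 Q heads and 3 KV heads the call raises `ValueError`, which is the case the divisibility requirement rules out.
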
29 Jul, 2023 1 commit
- [GPT] Implement parallel LLaMa · 184b992d
  Tri Dao authored Jul 28, 2023
  
  184b992d
28 Jul, 2023 3 commits
- [Docs] Fix mention of MQA/GQA in qkvpacked functions · 840f7925
  Tri Dao authored Jul 28, 2023
  
  840f7925
- [Benchmark] Add script to benchmark FlashAttention · 60499abc
  Tri Dao authored Jul 28, 2023
  
  60499abc
- Request for v2.0.2 (#388) · 32a953f4
  Kirthi Shankar Sivamani authored Jul 28, 2023
```
* Bump version to 2.0.2
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Update version in Dockerfile
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
```
  32a953f4
27 Jul, 2023 1 commit

- Enable CUDA graphs (#386) · a03f6f8e
  Kirthi Shankar Sivamani authored Jul 27, 2023

* Add RNG state to kernel launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Save seed and offset for backward
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Single thread write to global mem
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* compute_dq_dk_dv_1colblock get seed and offset from launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* compute_dq_dk_dv_1rowblock get seed and offset from launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change forward c++ APIs to save RNG state for backward
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Change backward c++ APIs to set RNG state for bprop launcher
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Bug fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Python side API changes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Bug fix; only save seeds instead of full offset
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Account for 3D grid size
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

a03f6f8e
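
The commit message above describes saving the RNG seed and offset in the forward pass so that the backward pass can reuse them. A rough sketch of that idea in plain Python, under stated assumptions: the function names are hypothetical, and a SHA-256 hash stands in for the counter-based generator used by the real kernels. This is not the repository's code.

```python
import hashlib


def keep_mask(seed, offset, n, p_drop):
    """Counter-based dropout mask: element i depends only on (seed, offset + i)."""
    mask = []
    for i in range(n):
        digest = hashlib.sha256(f"{seed}:{offset + i}".encode()).digest()
        u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        mask.append(u >= p_drop)
    return mask


def dropout_forward(x, seed, offset, p_drop):
    """Apply dropout and return the RNG state instead of the mask itself."""
    mask = keep_mask(seed, offset, len(x), p_drop)
    scale = 1.0 / (1.0 - p_drop)
    out = [xi * scale if keep else 0.0 for xi, keep in zip(x, mask)]
    rng_state = (seed, offset)  # saved for backward
    return out, rng_state


def dropout_backward(grad_out, rng_state, p_drop):
    """Regenerate the same mask from the saved (seed, offset)."""
    seed, offset = rng_state
    mask = keep_mask(seed, offset, len(grad_out), p_drop)
    scale = 1.0 / (1.0 - p_drop)
    return [g * scale if keep else 0.0 for g, keep in zip(grad_out, mask)]


out, state = dropout_forward([1.0] * 16, seed=1234, offset=0, p_drop=0.5)
grad_in = dropout_backward([1.0] * 16, state, p_drop=0.5)
```

Because the mask is a pure function of the saved state, the backward pass drops exactly the elements the forward pass dropped, without storing the mask.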

26 Jul, 2023 4 commits
- [MLP] Edit ParallelGatedMlp · 4c98d0b4
  Tri Dao authored Jul 26, 2023
  
  4c98d0b4
- Implement ParallelGatedMlp (#251) · 8ee62efc
  Haodong Lyu authored Jul 27, 2023
  
  8ee62efc
- [GPT] Add LLaMa-13B to test · 56ccaff1
  Tri Dao authored Jul 26, 2023
  
  56ccaff1
- [Rotary] Fix tests when loading state dict with rotary inv_freqs · 8e9820a5
  Tri Dao authored Jul 26, 2023
  
  8e9820a5
23 Jul, 2023 10 commits
- Bump to v2.0.1 · b2520724
  Tri Dao authored Jul 23, 2023
  
  b2520724
- [LayerNorm] Add test for randomness · 2a2a3c4b
  Tri Dao authored Jul 23, 2023
  
  2a2a3c4b
- Fix random state for dropout_layer_norm (#315) · 767b71cc
  Joel Lamy-Poirier authored Jul 23, 2023
  
  767b71cc
- [GPT] Implement Falcon · d38357dd
  Tri Dao authored Jul 23, 2023
  
  d38357dd
- Allow rotary embeddings for Bert (#363) · 684196b8
  Kiarash Jamali authored Jul 23, 2023
  
  684196b8
- README syntax highlighting (#365) · cbf982af
  Ian Timmis authored Jul 23, 2023
```
* README syntax highlighting

Adds syntax highlighting to README

* Update README.md
```
  cbf982af
- [MHA] Implement MQA/GQA · 425dbcb6
  Tri Dao authored Jul 23, 2023
  
  425dbcb6
- [Rotary] Don't store inv_freq in state_dict · ec9f74ab
  Tri Dao authored Jul 22, 2023
  
  ec9f74ab
- [FT] Implement MQA/GQA · a157cc8c
  Tri Dao authored Jul 22, 2023
  
  a157cc8c
- [MLP] Add ParallelMLP · 75e334d4
  Tri Dao authored Jul 22, 2023
  
  75e334d4
22 Jul, 2023 1 commit
- [GPT] Enable FlashAttention for GPT-J · b3177dfa
  Tri Dao authored Jul 21, 2023
  
  b3177dfa
21 Jul, 2023 2 commits
- [Block] Re-enable DropPath · 6fc1e07d
  Tri Dao authored Jul 21, 2023
  
  6fc1e07d
- Fix using dO stride for O, which can cause memory error in bwd · 9ee0ff1d
  Tri Dao authored Jul 20, 2023
  
  9ee0ff1d
20 Jul, 2023 2 commits
- Merge pull request #360 from chuanli11/fix/dockerfile · 2dd87d06
  Tri Dao authored Jul 20, 2023
```
remove checkout v2.0.0.post1 from dockerfile
```
  2dd87d06
- remove checkout v2.0.0.post1 from dockerfile · 30fd8c17
  chuanli11 authored Jul 20, 2023
  
  30fd8c17
19 Jul, 2023 2 commits
- Merge pull request #348 from eltociear/patch-2 · b8020d73
  Tri Dao authored Jul 19, 2023
```
[LayerNorm] Fix typo in ln_api.cpp
```
  b8020d73
- [LayerNorm] Fix typo in ln_api.cpp · dfc60f6b
  Ikko Eltociear Ashimine authored Jul 20, 2023
```
unintialized -> uninitialized
```
  dfc60f6b