1. 23 Jul, 2024 1 commit
  2. 22 Jul, 2024 4 commits
    • Cameron Shinn authored · cb516f85
    • backwards for softcapping (#1033) · 5f1ae4a3
      Phil Wang authored
      * check in the two ways of approaching backwards for softcapping, both functional
      
      * prepare the softcap switch for backwards
      
      * temporary
      
      * cleanup to the way Tri prefers
      
      * calculate dtanh when copying from scores -> dtanh Tensor
      
      * no ternary operators allowed for constexpr, so just use some hack found online
      
      * fix maybe_dtanh, restore some files
      
      * restore another file
      
      * move calculate_dtanh to utils and colocate with apply_softcap
      
      * cleanup
      
      * maybe last cleanup
      
      * save for another pr
      
      * remove a stray line
      
      * fix spacing
      
      * fix an issue, and make test_flash_attn.py ready to test softcapping backwards
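      For context, softcapping rescales the raw attention scores as softcap * tanh(scores / softcap), so the backward pass needs the local derivative 1 - tanh(scores / softcap)^2 (the "dtanh" this commit computes while copying the scores). A minimal PyTorch sketch of that math, borrowing the apply_softcap / calculate_dtanh names as plain reference functions rather than the actual CUDA kernels:
      ```
      import torch

      def apply_softcap_ref(scores: torch.Tensor, softcap: float) -> torch.Tensor:
          # Forward: squash raw scores into the open interval (-softcap, softcap).
          return softcap * torch.tanh(scores / softcap)

      def calculate_dtanh_ref(scores: torch.Tensor, softcap: float) -> torch.Tensor:
          # d/ds [softcap * tanh(s / softcap)] = 1 - tanh(s / softcap)**2
          t = torch.tanh(scores / softcap)
          return 1.0 - t * t

      # Backward is just the chain rule: the incoming gradient gets multiplied by dtanh.
      scores = torch.randn(2, 4, 8, 8, requires_grad=True)
      apply_softcap_ref(scores, softcap=30.0).sum().backward()
      assert torch.allclose(scores.grad, calculate_dtanh_ref(scores.detach(), 30.0), atol=1e-6)
      ```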
    • remove lambda (#1056) · ef3e358a
      youkaichao authored
    • catch typo (#1058) · 4df62e14
      Jorge António authored
  3. 15 Jul, 2024 1 commit
  4. 13 Jul, 2024 1 commit
  5. 11 Jul, 2024 8 commits
  6. 10 Jul, 2024 8 commits
  7. 09 Jul, 2024 1 commit
  8. 08 Jul, 2024 2 commits
  9. 03 Jul, 2024 1 commit
  10. 01 Jul, 2024 5 commits
    • Fix typos of comments about shape. (#837) · 9486635c
      66RING authored
    • Fix KeyError handling for non-existing key in state_dict.pop() (#898) · 0d810cfb
      JDKWangGuan authored
      Handle missing keys in state_dict.pop(): changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) so that a non-existing key no longer raises a KeyError.
      
      The following code reproduces the issue:
      ```
      from transformers import AutoTokenizer, GPT2Model, GPT2Config
      from flash_attn.models.gpt import GPTLMHeadModel, GPTModel
      
      # >>> transformers.__version__
      # '4.38.2'
      
      model_path = 'gpt2'
      output_model_path = 'gpt2_model'
      config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
      model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
      '''
      model fine-tuning here
      '''
      # dump the fine-tuned model
      model.save_pretrained(output_model_path)
      
      # load the fine-tuned model
      config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
      model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
      model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
      
      ```
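      The fix itself is just the two-argument form of dict.pop, which returns a default instead of raising when the key is absent. A small sketch of the pattern (the loop bound and the extra key are made up for illustration):
      ```
      # Checkpoint that lacks the legacy "h.{d}.attn.bias" buffers, e.g. one saved by HF transformers.
      state_dict = {"h.0.attn.c_attn.weight": "…", "h.1.attn.c_attn.weight": "…"}

      for d in range(2):
          # Old: state_dict.pop(f"h.{d}.attn.bias") raises KeyError when the key is missing.
          # New: passing a default makes the pop a no-op for missing keys.
          state_dict.pop(f"h.{d}.attn.bias", None)

      print(sorted(state_dict))  # the remaining keys are untouched
      ```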
    • fix typo (#974) · 6a2a16e9
      cao lei authored
    • Fixing argument checking when using `seqlenq_ngroups_swapped`. (#976) · 5bf20196
      Nicolas Patry authored
      When the user passes `out` as a parameter and the other arguments trigger the
      `seqlenq_ngroups_swapped` path, the CHECK_SHAPE on `out` is incorrect (since q's
      shape has already been modified).
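      The mismatch is easiest to see in a toy reproduction: the seqlenq_ngroups_swapped path reshapes q for single-token decoding with grouped KV heads, so a shape check written against the already reshaped q no longer matches the `out` buffer the caller passed in. A hedged Python sketch of the idea; the shapes and variable names are illustrative, not the extension's actual code:
      ```
      import torch

      batch, seqlen_q, num_heads, num_heads_k, head_dim = 4, 1, 32, 8, 128
      ngroups = num_heads // num_heads_k

      q = torch.randn(batch, seqlen_q, num_heads, head_dim)
      out = torch.empty(batch, seqlen_q, num_heads, head_dim)  # caller-provided output buffer

      orig_shape = q.shape                                  # shape the user actually passed
      q = q.reshape(batch, ngroups, num_heads_k, head_dim)  # the "swap" modifies q's shape

      # Buggy: validating `out` against the swapped q rejects a perfectly valid buffer.
      assert out.shape != q.shape
      # Fixed: validate `out` against the shape recorded before the swap.
      assert out.shape == orig_shape
      ```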
    • Liang authored
  11. 27 Jun, 2024 1 commit
  12. 26 May, 2024 7 commits