- 10 Jul, 2024 2 commits
- 09 Jul, 2024 1 commit
Phil Wang authored
* missing commas
* another fix
- 08 Jul, 2024 2 commits
Nicolas Patry authored
* Softcap v2 (fwd only).
* Some missing interface + remove overrides in tests.
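A minimal usage sketch of the forward-only softcapping, assuming the Python interface exposes it as a `softcap` keyword on `flash_attn_func`; shapes and the cap value below are illustrative, not taken from the commit:

```
# Hedged sketch, not the library's documented example: assumes flash_attn_func
# accepts a `softcap` keyword for forward-only logit softcapping.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Cap the pre-softmax attention scores (tanh-style softcapping at 30.0).
out = flash_attn_func(q, k, v, causal=True, softcap=30.0)
```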
Jianwei Dong authored
Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989)
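A hedged sketch of how the new flag might be used; it assumes `flash_attn_with_kvcache` returns an `(out, softmax_lse)` pair when `return_softmax_lse=True`, and all shapes below are illustrative:

```
# Illustrative only; assumes (out, lse) is returned when return_softmax_lse=True.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 8, 64
cache_len = 512

q = torch.randn(batch, 1, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)
cache_seqlens = torch.full((batch,), 500, dtype=torch.int32, device="cuda")

out, lse = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    causal=True,
    return_softmax_lse=True,  # flag added in this commit (#989)
)
# lse holds the logsumexp of the attention scores per query position,
# useful e.g. when merging attention results over sharded KV caches.
```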
- 01 Jul, 2024 1 commit
JDKWangGuan authored
Update handling for KeyError in state_dict.pop() for non-existing keys: changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) to prevent KeyError exceptions. The following code reproduces the issue:

```
from transformers import AutoTokenizer, GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel

# >>> transformers.__version__
# '4.38.2'
model_path = 'gpt2'
output_model_path = 'gpt2_model'

config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)

''' model fine-tuning here '''

# dump the fine-tuned model
model.save_pretrained(output_model_path)

# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
```
- 27 Jun, 2024 1 commit
Grigory Sizov authored
* Support unpadded LSE layout
* Cleanup
* Fix unpadded LSE on split-kv path
* Fix formatting and comments
* Fix inline vs forceinline

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
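For intuition only: in an unpadded (packed) layout, the LSE for a variable-length batch is stored contiguously over the total number of tokens rather than padded to the longest sequence, and per-sequence values are recovered with the cumulative sequence lengths. The pure-PyTorch sketch below assumes a `(nheads, total_tokens)` buffer; the exact shapes used by the kernels are not implied here.

```
# Plain-PyTorch illustration of an unpadded (packed) LSE layout; shapes assumed.
import torch

nheads = 8
seqlens = [5, 3, 7]                               # per-sequence query lengths
cu_seqlens = torch.tensor([0, 5, 8, 15])          # cumulative offsets, len = batch + 1
lse_unpadded = torch.randn(nheads, sum(seqlens))  # one contiguous buffer, no padding

# Slice out the LSE of sequence i using the cumulative offsets.
for i in range(len(seqlens)):
    lse_i = lse_unpadded[:, cu_seqlens[i]:cu_seqlens[i + 1]]  # (nheads, seqlens[i])
```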
- 26 May, 2024 3 commits
- 23 May, 2024 1 commit
lancerts authored
- 26 Apr, 2024 2 commits
- 08 Apr, 2024 1 commit
Tri Dao authored
- 05 Apr, 2024 1 commit
Ivan Komarov authored
All integer parameters are specialized by default, so the two parameters removed in this commit could lead to kernel re-compilation, even if they were completely unused.
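Hedged background sketch (the kernel and argument names below are hypothetical, not the ones touched in this commit): in Triton, scalar integer arguments are specialized on by default (for example on whether they equal 1 or are divisible by 16), so an argument the kernel body never reads can still force a new compilation when its value class changes.

```
# Hypothetical kernel illustrating the cost of an unused, specialized int argument.
import torch
import triton
import triton.language as tl

@triton.jit
def copy_kernel(x_ptr, out_ptr, n_elements,
                unused_stride,              # never read, yet still specialized on
                BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)

x = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
copy_kernel[grid](x, out, x.numel(), 1, BLOCK=256)   # one compiled variant
copy_kernel[grid](x, out, x.numel(), 7, BLOCK=256)   # different value class -> recompile
# Dropping the unused argument (as this commit does) avoids the extra variants;
# opting out via triton.jit's do_not_specialize would be another option.
```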
- 19 Mar, 2024 1 commit
Tri Dao authored
- 15 Mar, 2024 2 commits
Markus Krimmel authored
Grigory Sizov authored
* Enable paged attention in varlen forward
* Format + fix padding
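A heavily hedged sketch of what the paged varlen forward could look like from Python; the `block_table` argument name, the paged KV layout `(num_blocks, page_size, nheads, headdim)`, and the page size of 256 are all assumptions for illustration, not confirmed by the commit message:

```
# Assumption-laden sketch: paged KV cache used with the varlen forward path.
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
page_size, num_pages = 256, 4
seqlens_q, seqlens_k = [5, 3], [300, 200]

q = torch.randn(sum(seqlens_q), nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(num_pages, page_size, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)

cu_seqlens_q = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
cu_seqlens_k = torch.tensor([0, 300, 500], dtype=torch.int32, device="cuda")
# One row of page indices per sequence (sequence 0 spans pages 0-1, sequence 1 pages 2-3).
block_table = torch.tensor([[0, 1], [2, 3]], dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k_cache, v_cache,
    cu_seqlens_q=cu_seqlens_q, cu_seqlens_k=cu_seqlens_k,
    max_seqlen_q=max(seqlens_q), max_seqlen_k=max(seqlens_k),
    causal=True,
    block_table=block_table,
)
```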
- 02 Mar, 2024 1 commit
Tri Dao authored
- 21 Feb, 2024 2 commits
- 10 Feb, 2024 2 commits
- 31 Jan, 2024 2 commits
- 30 Jan, 2024 2 commits
- 27 Jan, 2024 1 commit
Avelina9X authored
* Updated docstrings of bert_padding.py: added docstrings for missing arguments in the unpad and pad methods.
* Update bert_padding.py: fixed spelling mistakes.
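A small round-trip sketch of the two helpers whose docstrings this touches; the exact number of values returned by `unpad_input` varies across versions, so the slice below is a hedge rather than the documented signature:

```
# Hedged sketch of the unpad/pad round trip documented in bert_padding.py.
import torch
from flash_attn.bert_padding import unpad_input, pad_input

batch, seqlen, hidden = 2, 8, 16
x = torch.randn(batch, seqlen, hidden)
# attention_mask: 1 for real tokens, 0 for padding
attention_mask = torch.tensor([[1] * 5 + [0] * 3,
                               [1] * 8]).bool()

# Drop padded positions -> a packed (total_tokens, hidden) tensor plus metadata.
out = unpad_input(x, attention_mask)
x_unpad, indices, cu_seqlens, max_seqlen = out[:4]

# Scatter the packed tokens back into the padded (batch, seqlen, hidden) layout.
x_repad = pad_input(x_unpad, indices, batch, seqlen)
```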
- 23 Jan, 2024 4 commits
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
Tri Dao authored
Tri Dao authored
Co-authored-by: ljss <450993438@qq.com>
Tri Dao authored
- 22 Jan, 2024 2 commits
- 21 Jan, 2024 1 commit
Curtis "Fjord" Hawthorne authored
- 13 Jan, 2024 1 commit
Tri Dao authored
- 10 Jan, 2024 1 commit
Tri Dao authored
- 05 Jan, 2024 3 commits