  1. 20 Aug, 2023 1 commit
  2. 19 Aug, 2023 1 commit
  3. 18 Aug, 2023 4 commits
  4. 17 Aug, 2023 4 commits
  5. 16 Aug, 2023 1 commit
  6. 15 Aug, 2023 1 commit
    •
      enable loading hf llama checkpoints for training (#446) · 0f7853c6
      Xuechen Li authored
      * prelim.
      * add hf conversion fn.
      * mlp.
      * change name.
      * fix bug.
      * inverse permute.
      * change comment.
      * revert style changes.
      * fix.
      * add doc.
      * revert.
      * enable load safe.
      * fix safe load.
      * fix import.
      * fix typing-related lints.
      * fix ckpt loading logic.
      * make single gpu work.
      * test with parallel.
      * ckpt format.
      * enable pretrained state dict.
      * remove unused imports.
      * remove unused.
      * mark idea related.
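      The "inverse permute" step above refers to undoing the row permutation that Hugging Face's LLaMA conversion applies to the query/key projection weights (to match its rotary-embedding convention). A minimal sketch of that round trip, assuming the standard HF permutation; the function names `hf_permute` / `inverse_permute` and the toy shapes are illustrative, not names from the actual patch:

      ```python
      import torch

      def hf_permute(w, n_heads):
          # The permutation HF applies to W_q / W_k when exporting a checkpoint:
          # interleave the two rotary halves of each head.
          dim1, dim2 = w.shape
          return (w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
                   .transpose(1, 2)
                   .reshape(dim1, dim2))

      def inverse_permute(w, n_heads):
          # Undo hf_permute: swap the same two axes back before flattening.
          dim1, dim2 = w.shape
          return (w.view(n_heads, 2, dim1 // n_heads // 2, dim2)
                   .transpose(1, 2)
                   .reshape(dim1, dim2))

      w = torch.randn(64, 64)  # toy projection weight: 4 heads of dim 16
      round_trip = inverse_permute(hf_permute(w, n_heads=4), n_heads=4)
      assert torch.equal(round_trip, w)
      ```

      Because the permutation only swaps two axes of a reshaped view, applying the same transpose against the swapped layout recovers the original weight exactly.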
  7. 14 Aug, 2023 3 commits
  8. 13 Aug, 2023 2 commits
  9. 10 Aug, 2023 1 commit
  10. 01 Aug, 2023 3 commits
  11. 29 Jul, 2023 1 commit
  12. 28 Jul, 2023 3 commits
  13. 27 Jul, 2023 1 commit
  14. 26 Jul, 2023 2 commits
  15. 23 Jul, 2023 6 commits
  16. 22 Jul, 2023 1 commit
  17. 21 Jul, 2023 1 commit
  18. 18 Jul, 2023 1 commit
  19. 17 Jul, 2023 2 commits
  20. 08 Jul, 2023 1 commit
    •
      rotary: update cos/sin cache when switching from inference mode · 70ab266a
      Volodymyr Kyrylov authored
      This resolves RuntimeErrors after running evaluation in inference mode:
      
      ```
        File "/home/proger/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
          return forward_call(*args, **kwargs)
        File "/home/proger/.local/lib/python3.10/site-packages/flash_attn/modules/mha.py", line 492, in forward
          qkv = self.rotary_emb(qkv)
        File "/home/proger/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
          return forward_call(*args, **kwargs)
        File "/home/proger/.local/lib/python3.10/site-packages/flash_attn/layers/rotary.py", line 229, in forward
          return apply_rotary_emb_qkv_(
        File "/home/proger/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
          return super().apply(*args, **kwargs)  # type: ignore[misc]
      RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
      ```
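      The failure mode can be reproduced outside flash-attn: a tensor cached while `torch.inference_mode()` is active becomes an "inference tensor" and cannot be saved for backward later. A minimal sketch of the fix's idea, checking `Tensor.is_inference()` and rebuilding the cache when needed; the cache structure and function below are hypothetical, not the actual `rotary.py` code:

      ```python
      import torch

      cache = {}

      def get_cos_cache(seqlen):
          # Rebuild the cache if it is missing, too short, or was created as an
          # inference tensor -- the last condition is what the commit adds.
          c = cache.get("cos")
          if c is None or c.shape[0] < seqlen or c.is_inference():
              t = torch.arange(seqlen, dtype=torch.float32)
              cache["cos"] = torch.cos(t)[:, None]
          return cache["cos"][:seqlen]

      with torch.inference_mode():
          _ = get_cos_cache(8)  # cache is built as an inference tensor here

      # Back in training mode: without the is_inference() check, reusing the
      # cached tensor in an autograd graph raises the RuntimeError above.
      x = torch.randn(8, 1, requires_grad=True)
      y = (x * get_cos_cache(8)).sum()
      y.backward()  # succeeds: the cache was rebuilt outside inference mode
      ```

      The key design choice is to detect the stale state via `is_inference()` rather than clone the tensor on every call, so the cache is only recomputed once after leaving inference mode.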