gaoqiong / flash-attention
Commit history for csrc/fused_dense_lib at 27f8f890dff58986391b606bc7c181c3b9f5148a
30 May, 2023 (1 commit)
27f8f890 · [FusedDense] Allocate lt_workspace on input device
Tri Dao authored May 30, 2023
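As a rough illustration of what this commit title describes, the sketch below allocates a cuBLASLt workspace buffer on the same device as the input tensor rather than on the current default device. It assumes a PyTorch C++ extension context; the function name and the size constant are placeholders, not the extension's actual code.

```cpp
// Illustrative sketch, not the repository's code: allocate the cuBLASLt
// workspace on the same device as the input tensor, so multi-GPU callers
// do not end up with a buffer on the wrong device.
#include <torch/extension.h>

// Hypothetical size constant for illustration (4 MB).
constexpr int64_t kLtWorkspaceBytes = 1 << 22;

at::Tensor allocate_lt_workspace(const at::Tensor& input) {
  // Using the input's device option (instead of the current default device)
  // is the point of the change named in the commit title.
  return at::empty({kLtWorkspaceBytes},
                   at::TensorOptions().dtype(at::kByte).device(input.device()));
}
```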
07 Apr, 2023 (1 commit)
dec4f2e9 · [FusedDense] Set workspace size to 32M for Hopper and 4M for others
Tri Dao authored Apr 06, 2023
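The sketch below illustrates the sizing rule named in the commit title: a 32 MB cuBLASLt workspace on Hopper (compute capability 9.x) and 4 MB on other GPUs. The helper name is hypothetical and the snippet uses only the CUDA runtime API; it is not the extension's actual implementation.

```cpp
// Illustrative sketch, not the repository's code: pick the cuBLASLt workspace
// size by GPU architecture -- 32 MB on Hopper (compute capability 9.x),
// 4 MB on other GPUs.
#include <cuda_runtime.h>
#include <cstddef>

size_t lt_workspace_size_for_device(int device) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device);
  return (prop.major >= 9) ? (size_t{32} << 20) : (size_t{4} << 20);
}
```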
15 Mar, 2023 (1 commit)
dc08ea1c · Support H100 for other CUDA extensions
Tri Dao authored Mar 15, 2023
18 Jan, 2023 (1 commit)
88173a1a · [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP
Tri Dao authored Jan 17, 2023
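As a hedged sketch of the behavior the commit title names, the snippet below shows an MLP forward pass in which the hidden activation can be switched between GELU and ReLU. The function name and signature are assumptions for illustration, not FusedMLP's real interface.

```cpp
// Illustrative sketch only: an MLP forward where the hidden activation is
// selectable between GELU and ReLU, mirroring the "support relu" part of the
// commit title. Name and signature are assumptions, not FusedMLP's API.
#include <torch/extension.h>

at::Tensor mlp_forward(const at::Tensor& x,
                       const at::Tensor& w1, const at::Tensor& b1,
                       const at::Tensor& w2, const at::Tensor& b2,
                       bool use_relu) {
  auto hidden = at::linear(x, w1, b1);
  hidden = use_relu ? at::relu(hidden) : at::gelu(hidden);
  return at::linear(hidden, w2, b2);
}
```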
24 Dec, 2022 (1 commit)
226a1b72 · Implement TensorParallel for FusedDense and FusedDenseGeluDense
Tri Dao authored Dec 23, 2022
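The sketch below illustrates the local step of a column-parallel dense layer, the basic building block of tensor parallelism: the weight is split along the output dimension across ranks, each rank computes only its slice of the output, and the slices are combined by collective communication elsewhere. Names and the splitting convention are assumptions, not the repository's implementation.

```cpp
// Illustrative sketch of tensor parallelism's local step, not the repository's
// implementation: the dense layer's weight is split along the output dimension
// across ranks; each rank computes its slice of the output, and the slices are
// combined by collective communication outside this function.
#include <torch/extension.h>

at::Tensor column_parallel_dense_local(const at::Tensor& x,
                                       const at::Tensor& full_weight,
                                       int64_t rank, int64_t world_size) {
  // Take this rank's contiguous block of output features.
  auto weight_shard = full_weight.chunk(world_size, /*dim=*/0)[rank];
  return at::linear(x, weight_shard);
}
```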
23 Dec, 2022 (1 commit)
e68ebbe8 · Simplify FusedDense
Tri Dao authored Dec 22, 2022
15 Nov, 2022 (1 commit)
43ab0b52 · Mention that some CUDA extensions have only been tested on A100s
Tri Dao authored Nov 15, 2022
14 Nov, 2022 (2 commits)
2e33fc8e · Add GPT and ViT models
Tri Dao authored Nov 13, 2022
fa6d1ce4 · Add fused_dense and dropout_add_layernorm CUDA extensions
Tri Dao authored Nov 13, 2022
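For reference, the sketch below spells out the unfused computation that a dropout_add_layernorm kernel would perform in a single fused pass: dropout on the input, addition of a residual, then LayerNorm. It is illustrative only and does not show the extension's actual API.

```cpp
// Illustrative reference only: the unfused computation that a fused
// dropout_add_layernorm kernel performs in one pass -- dropout on the input,
// residual add, then LayerNorm. Not the extension's actual API.
#include <torch/extension.h>

at::Tensor dropout_add_layernorm_reference(const at::Tensor& x,
                                           const at::Tensor& residual,
                                           const at::Tensor& gamma,
                                           const at::Tensor& beta,
                                           double dropout_p) {
  auto dropped = at::dropout(x, dropout_p, /*train=*/true);
  auto summed = dropped + residual;
  return at::layer_norm(summed, {summed.size(-1)}, gamma, beta);
}
```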