gaoqiong / flash-attention · Commits at ce26d3d73d07e9779c5ba6fb2ca3bc187a34c6cc
History for flash_attn/models/gpt.py
01 Jan, 2023 · 2 commits

[Bert] Fix embedding layer norm before embedding dropout · 714c1b4f
Tri Dao authored Jan 01, 2023
[GPT] Refactor function to shard state_dict for TensorParallel · ef1ba918
Tri Dao authored Jan 01, 2023
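The helper refactored here splits each layer's weights across tensor-parallel ranks when loading a checkpoint. A minimal sketch of the idea, with hypothetical parameter-name patterns; the real function also handles biases, embeddings, and uneven splits:

```python
import torch

def shard_state_dict_tp(state_dict, rank, world_size):
    """Shard a GPT state_dict across tensor-parallel ranks (illustrative sketch).

    Column-parallel weights (e.g. QKV and MLP fc1 projections) are split
    along the output dimension; row-parallel weights (out_proj, MLP fc2)
    along the input dimension. Name patterns below are assumptions.
    """
    sharded = {}
    for name, tensor in state_dict.items():
        if "Wqkv.weight" in name or "fc1.weight" in name:
            # Column parallel: each rank keeps a slice of the output dim.
            sharded[name] = tensor.chunk(world_size, dim=0)[rank]
        elif "out_proj.weight" in name or "fc2.weight" in name:
            # Row parallel: each rank keeps a slice of the input dim.
            sharded[name] = tensor.chunk(world_size, dim=1)[rank]
        else:
            # Everything else (layer norms, etc.) is replicated in this sketch.
            sharded[name] = tensor
    return sharded
```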
28 Dec, 2022 · 1 commit

Implement generation for GPT · 63670fd8
Tri Dao authored Dec 27, 2022
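In its simplest form, GPT generation is a feed-forward decoding loop. A minimal greedy-decoding sketch, assuming a model whose forward returns an object with a `.logits` tensor of shape (batch, seqlen, vocab); a real implementation would typically add a key/value cache and sampling options:

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens):
    """Greedy decoding without a KV cache, for illustration only."""
    model.eval()
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits            # (batch, seqlen, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=1)
    return input_ids
```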
27 Dec, 2022 · 2 commits

Support loading GPT2 weights from Huggingface · 9d797d88
Tri Dao authored Dec 27, 2022
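Loading Huggingface GPT2 weights is mostly a key-renaming exercise plus one transpose: HF GPT2 stores its projections as Conv1D modules whose weight layout is the transpose of `nn.Linear`. A sketch with a hypothetical key map (the target names on the right are illustrative, not necessarily the repo's):

```python
def remap_gpt2_state_dict(hf_state_dict):
    """Remap a Huggingface GPT2 checkpoint to Linear-style names (sketch)."""
    key_map = {
        "attn.c_attn": "mixer.Wqkv",     # assumed target names
        "attn.c_proj": "mixer.out_proj",
        "mlp.c_fc": "mlp.fc1",
        "mlp.c_proj": "mlp.fc2",
    }
    out = {}
    for name, tensor in hf_state_dict.items():
        new_name = name
        for old, new in key_map.items():
            new_name = new_name.replace(old, new)
        if new_name.endswith(".weight") and any(k in name for k in key_map):
            tensor = tensor.t()  # Conv1D (in, out) -> Linear (out, in)
        out[new_name] = tensor
    return out
```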
Implement Tensor Parallel for GPT model · b4018a50
Tri Dao authored Dec 25, 2022
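Megatron-style tensor parallelism splits each MLP (and attention) block into a column-parallel projection followed by a row-parallel one, so only one all-reduce per block is needed. A sketch of the MLP forward under that scheme, assuming `torch.distributed` is already initialized and the weights were sharded as above:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def tensor_parallel_mlp(x, fc1_weight_shard, fc2_weight_shard):
    """Megatron-style tensor-parallel MLP forward (sketch, biases omitted)."""
    h = F.gelu(F.linear(x, fc1_weight_shard))  # local slice of the hidden dim
    out = F.linear(h, fc2_weight_shard)        # partial sum over the hidden dim
    dist.all_reduce(out)                       # sum partials across ranks
    return out
```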
23 Dec, 2022 · 1 commit

Simplify FusedDense · e68ebbe8
Tri Dao authored Dec 22, 2022
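A FusedDense-style layer computes an ordinary linear layer but performs the bias add (and optionally the GELU) in the GEMM epilogue rather than as separate kernels. An unfused functional reference of the same computation:

```python
import torch
import torch.nn.functional as F

def dense_gelu_reference(x, weight, bias):
    # Unfused reference: the fused version does the bias add and GELU
    # inside the matmul epilogue instead of launching extra kernels.
    return F.gelu(F.linear(x, weight, bias))
```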
21 Dec, 2022 · 1 commit

Implement XPos (Sun et al.) · 496e4f52
Tri Dao authored Dec 21, 2022
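XPos (Sun et al., 2022) multiplies rotary-embedded queries by a per-position, per-dimension scale and keys by its inverse, so attention logits decay smoothly with relative distance. A sketch of the scale computation using the paper's constants (0.4 offset, scale_base 512); how the scale is folded into the rotary rotation is omitted:

```python
import torch

def xpos_scale(positions, head_dim, scale_base=512):
    """XPos scale factors (sketch).

    Queries are multiplied by scale**(n/scale_base) and keys by
    scale**(-n/scale_base) before the usual rotary rotation.
    """
    # Base scale per even rotary dimension i: (i + 0.4*d) / (1.4*d)
    base = (torch.arange(0, head_dim, 2, dtype=torch.float32)
            + 0.4 * head_dim) / (1.4 * head_dim)
    power = positions.float()[:, None] / scale_base   # (seqlen, 1)
    return base[None, :] ** power                     # (seqlen, head_dim/2)
```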
19 Dec, 2022 · 1 commit

Implement BERT · 5fb6df0e
Tri Dao authored Dec 18, 2022
23 Nov, 2022 · 1 commit

[ViT] Use dropout_add_ln for the 1st layer norm · 1feb9426
Tri Dao authored Nov 23, 2022
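The dropout_add_ln kernel fuses the dropout, residual-add, and LayerNorm pattern at the top of each Transformer block into a single pass. An unfused reference of what it computes:

```python
import torch
import torch.nn.functional as F

def dropout_add_ln_reference(x, residual, ln_weight, ln_bias, p, training=True):
    # Unfused reference: dropout on the sublayer output, add the residual,
    # then LayerNorm over the last dimension.
    out = F.dropout(x, p=p, training=training) + residual
    return F.layer_norm(out, (out.shape[-1],), ln_weight, ln_bias)
```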
14 Nov, 2022 · 1 commit

Add GPT and ViT models · 2e33fc8e
Tri Dao authored Nov 13, 2022