Commits · f95c2fc108a1a63e2fdd9aed87190ac3fd492af2 · gaoqiong / flash-attention

08 Jan, 2023 3 commits
- [Gen] Remove commented code · f95c2fc1
  Tri Dao authored Jan 07, 2023
- [Gen] Add timing option · b4859900
  Tri Dao authored Jan 07, 2023
- [Gen] Implement top-k and top-p sampling · e02fd588
  Tri Dao authored Jan 07, 2023
07 Jan, 2023 2 commits
- [Gen] Test generation with rotary embedding · 11be742a
  Tri Dao authored Jan 07, 2023
- [TP] Implement TensorParallel without sequence parallel · 93383bd5
  Tri Dao authored Jan 07, 2023
04 Jan, 2023 1 commit
- [Gen] Add option to run generation with FT attention kernel · a668890f
  Tri Dao authored Jan 03, 2023
28 Dec, 2022 2 commits
- Bump to v0.2.6 · a6ec1782
  Tri Dao authored Dec 27, 2022
- Implement generation for GPT · 63670fd8
  Tri Dao authored Dec 27, 2022
27 Dec, 2022 2 commits
- Tweak CrossEntropyLoss to take process_group in init · c6ecd40a
  Tri Dao authored Dec 27, 2022
- Implement Tensor Parallel for GPT model · b4018a50
  Tri Dao authored Dec 25, 2022
24 Dec, 2022 1 commit
- Implement TensorParallel for FusedDense and FusedDenseGeluDense · 226a1b72
  Tri Dao authored Dec 23, 2022
18 Nov, 2022 1 commit
- Add __init__.py files to subdirectories for installation · ece539ab
  Tri Dao authored Nov 17, 2022
23 Oct, 2022 1 commit
- Move benchmark utils, support AMP · fb88e5e4
  Tri Dao authored Oct 23, 2022