Commits · f1e01c27ba92f62339338113fef0b0cee9e81443 · gaoqiong / flash-attention

15 Jan, 2023 2 commits
- [Gen] Pass qkv_stride to ft_attention kernel for batched generation · f1e01c27
  Tri Dao authored Jan 15, 2023
- [Gen] Make generation work with Tensor Parallel · 7c219154
  Tri Dao authored Jan 15, 2023
08 Jan, 2023 3 commits
- [Gen] Add timing option · b4859900
  Tri Dao authored Jan 07, 2023
- [Gen] Adjust shape of kv_cache when using FT · 0938298e
  Tri Dao authored Jan 07, 2023
- [Gen] Implement top-k and top-p sampling · e02fd588
  Tri Dao authored Jan 07, 2023
07 Jan, 2023 2 commits
- [Gen] Test generation with rotary embedding · 11be742a
  Tri Dao authored Jan 07, 2023
- [TP] Implement TensorParallel without sequence parallel · 93383bd5
  Tri Dao authored Jan 07, 2023
04 Jan, 2023 1 commit
- [Gen] Add option to run generation with FT attention kernel · a668890f
  Tri Dao authored Jan 03, 2023
01 Jan, 2023 1 commit
- [GPT] Refactor function to shard state_dict for TensorParallel · ef1ba918
  Tri Dao authored Jan 01, 2023
28 Dec, 2022 1 commit
- Implement generation for GPT · 63670fd8
  Tri Dao authored Dec 27, 2022
27 Dec, 2022 3 commits
- Support loading GPT2 weights from Huggingface · 9d797d88
  Tri Dao authored Dec 27, 2022
- Tweak CrossEntropyLoss to take process_group in init · c6ecd40a
  Tri Dao authored Dec 27, 2022
- Implement Tensor Parallel for GPT model · b4018a50
  Tri Dao authored Dec 25, 2022
20 Dec, 2022 1 commit
- Implement last_layer_subset optimization for BERT · 13cdceb3
  Tri Dao authored Dec 19, 2022
19 Dec, 2022 1 commit
- Implement BERT · 5fb6df0e
  Tri Dao authored Dec 18, 2022