Commits · 93383bd55bfffb0fa2c4584c4849971152397035 · gaoqiong / flash-attention

07 Jan, 2023 1 commit
- [TP] Implement TensorParallel without sequence parallel · 93383bd5
  Tri Dao authored Jan 07, 2023
04 Jan, 2023 1 commit
- [Gen] Add option to run generation with FT attention kernel · a668890f
  Tri Dao authored Jan 03, 2023
01 Jan, 2023 1 commit
- [GPT] Refactor function to shard state_dict for TensorParallel · ef1ba918
  Tri Dao authored Jan 01, 2023
28 Dec, 2022 1 commit
- Implement generation for GPT · 63670fd8
  Tri Dao authored Dec 27, 2022
27 Dec, 2022 3 commits
- Support loading GPT2 weights from Huggingface · 9d797d88
  Tri Dao authored Dec 27, 2022
- Tweak CrossEntropyLoss to take process_group in init · c6ecd40a
  Tri Dao authored Dec 27, 2022
- Implement Tensor Parallel for GPT model · b4018a50
  Tri Dao authored Dec 25, 2022
20 Dec, 2022 1 commit
- Implement last_layer_subset optimization for BERT · 13cdceb3
  Tri Dao authored Dec 19, 2022
19 Dec, 2022 1 commit
- Implement BERT · 5fb6df0e
  Tri Dao authored Dec 18, 2022