Commits · 0fa7175f0936db7fbe303ca47b25fafca49ef032 · OpenDAS / Megatron-LM

19 Mar, 2021 1 commit
- Bfloat fused softmax + fused layer norm · 0fa7175f
  Mohammad Shoeybi authored Mar 19, 2021
  
  0fa7175f
08 Mar, 2021 1 commit
- Bfloat with fp32 grad acc · b4bc51b1
  Mohammad Shoeybi authored Mar 08, 2021
  
  b4bc51b1
25 Feb, 2021 1 commit
- Don't import deprecated model from realm_model which is broken. · c6f18ccf
  Jared Casper authored Feb 25, 2021
  
  c6f18ccf
23 Feb, 2021 1 commit
- Added code for building embeddings and savings · bcd605f8
  Mostofa Patwary authored Feb 23, 2021
  
  bcd605f8
15 Feb, 2021 1 commit
- addressed the comments given by Mohammad · 447c1171
  Mostofa Patwary authored Feb 15, 2021
  
  447c1171
13 Feb, 2021 1 commit
- More comments and some cleanup (e.g., better variable names) · 5489bda9
  Deepak Narayanan authored Feb 13, 2021
  
  5489bda9
09 Feb, 2021 1 commit
- Interleaved pipeline execution and code refactoring · dd889062
  Deepak Narayanan authored Dec 12, 2020
  
  - Split a model's computation into multiple virtual stages as needed, and schedule communication correctly between these virtual stages
  - Move schedule code into `schedules.py` and communication code into `p2p_communication.py`
  - Use hyphens instead of spaces in all time logging for consistency
  - Factor out code in megatron/training.py into helper functions
  - Refactor evaluate() function: make it use forward_backward_schedule functions
  
  dd889062
05 Feb, 2021 1 commit
- address review comments · 0cb36de2
  Vijay Korthikanti authored Feb 05, 2021
  
  0cb36de2
04 Feb, 2021 1 commit
- conditioning fused kernels · 4916bae6
  Vijay Korthikanti authored Feb 04, 2021
  
  4916bae6
01 Feb, 2021 1 commit
- cleared the commented codes · e919dd8e
  Mostofa Patwary authored Feb 01, 2021
  
  e919dd8e
29 Jan, 2021 2 commits
- WIP: main_retriver_merge · 0295bb89
  Mostofa Patwary authored Jan 29, 2021
  
  0295bb89
- WIP: main_retriver_merge · 17d897e0
  Mostofa Patwary authored Jan 29, 2021
  
  17d897e0
27 Jan, 2021 3 commits
- Move rearranging query_key_value and key_value values in old checkpoints to... · 76960d7c
  Jared Casper authored Jan 27, 2021
```
Move rearranging query_key_value and key_value values in old checkpoints to when the checkpoint is loaded instead of runtime.
```
  76960d7c
- Teach merge_mp_partitions how to write out a pipelined model. · 7cabbe67
  Jared Casper authored Jan 27, 2021
  
  7cabbe67
- vit model does not get imported automatically · ab507293
  mohammad authored Jan 26, 2021
  
  ab507293
26 Jan, 2021 2 commits
- added params norms · 57d1356e
  mohammad authored Jan 25, 2021
  
  57d1356e
- Addressing more review comments · e6c7b05e
  Vijay Korthikanti authored Jan 25, 2021
  
  e6c7b05e
25 Jan, 2021 2 commits
- Clarify module.initialize_word_embeddings. · 7be2648a
  Jared Casper authored Jan 25, 2021
  
  7be2648a
- Use set_tensor_model_parallel_attributes in bert_model as well. · c4c68dce
  Jared Casper authored Jan 25, 2021
  
  c4c68dce
22 Jan, 2021 3 commits
- Fixing merge_mp_partitions · 78066ab0
  Jared Casper authored Jan 20, 2021
  
  78066ab0
- Addressing review comments · a7169297
  Vijay Korthikanti authored Jan 22, 2021
  
  a7169297
- attention_mask_func cleanup · ebf8b89e
  Vijay Korthikanti authored Jan 22, 2021
  
  ebf8b89e
13 Jan, 2021 1 commit
- Addressing more review comments · 4ae54b55
  Vijay Korthikanti authored Jan 12, 2021
  
  4ae54b55
12 Jan, 2021 3 commits
- Address more review comments · d836d498
  Vijay Korthikanti authored Jan 12, 2021
  
  d836d498
- address review comments · 4b3519cb
  Vijay Korthikanti authored Jan 12, 2021
  
  4b3519cb
- Readme update + change gpt2 to gpt · 152aab30
  Mohammad Shoeybi authored Jan 11, 2021
  
  152aab30
09 Jan, 2021 2 commits
- minor fixes · 834d6dd5
  Vijay Korthikanti authored Jan 08, 2021
  
  834d6dd5
- decoder support in transformers · 4b506832
  Vijay Korthikanti authored Jan 08, 2021
  
  4b506832
08 Jan, 2021 1 commit
- vision transformer model and vision classification task · 456f1728
  Vijay Korthikanti authored Jan 08, 2021
  
  456f1728
05 Jan, 2021 1 commit
- Only create task heads on last pipeline stage. · f772fbc9
  Jared Casper authored Jan 05, 2021
  
  f772fbc9
27 Dec, 2020 1 commit
- moved module to model and removed fp16 · b84d7a90
  mohammad authored Dec 26, 2020
  
  b84d7a90
25 Dec, 2020 1 commit
- moved entire optimizer build and tested · 28062e14
  mohammad authored Dec 24, 2020
  
  28062e14
22 Dec, 2020 3 commits
- Address Deepak's comments · 83671bbf
  mshoeybi authored Dec 21, 2020
  
  83671bbf
- Add residual connection in fp32 machinery · 8bed1d63
  mohammad authored Dec 21, 2020
  
  8bed1d63
- Add the option for fp32 residual connection (fp32 residual connection... · 62632d39
  mohammad authored Dec 21, 2020
```
Add the option for fp32 residual connection (fp32 residual connection machinery still needs to be added)
```
  62632d39
19 Dec, 2020 1 commit
- Add pipelining to GLUE and RACE tasks · caa9dca5
  Jared Casper authored Nov 30, 2020
  
  caa9dca5
17 Nov, 2020 1 commit
- Update code used for finetuning to latest API. · b219ff00
  Jared Casper authored Nov 16, 2020
  
  b219ff00
12 Nov, 2020 3 commits
- Small bugfix in bert_model.py: make sure word_embeddings is initialized before... · 69a546be
  Deepak Narayanan authored Nov 12, 2020
```
Small bugfix in bert_model.py: make sure word_embeddings is initialized before instantiating lm_head
```
  69a546be
- Refactor code according to Jared's comments: move pipelining and... · 1979c242
  Deepak Narayanan authored Nov 12, 2020
```
Refactor code according to Jared's comments: move pipelining and non-pipelining training loops into separate methods

Also, use mpu.get_*_model_parallel_size() instead of args.*_model_parallel_size
```
  1979c242
- Only transpose hidden_states when necessary · 6abf39be
  Deepak Narayanan authored Nov 03, 2020
  
  6abf39be