Commits · 1d4e8760cbf6cdb00c5f6ba2d676f8fb5a87fa7a · OpenDAS / Megatron-LM

19 Dec, 2020 16 commits
- Fix text generation without recompute · 1d4e8760
  Jared Casper authored Dec 10, 2020
  
  1d4e8760
- Nicer error messages for deprecated arguments · 2623551d
  Jared Casper authored Dec 10, 2020
  
  2623551d
- Change lr-warmup-percent to lr-warmup-fraction · 9321d5c6
  Jared Casper authored Dec 10, 2020
  
  9321d5c6
- Work batch-size name changes into task code · 3afcba6e
  Jared Casper authored Dec 09, 2020
  
  3afcba6e
- Initial implementation of pipelined text generation · 5c45db4a
  Jared Casper authored Dec 09, 2020
  
  5c45db4a
- Add pipelining to GLUE and RACE tasks · caa9dca5
  Jared Casper authored Nov 30, 2020
  
  caa9dca5
- Better memory tracking across pipeline-parallel ranks · 3574b8e6
  Deepak Narayanan authored Dec 06, 2020
  
  3574b8e6
- Address Jared's comments · 00ac56ab
  mohammad authored Dec 09, 2020
  
  00ac56ab
- Sample based learning rate computation · 22ab91bb
  mohammad authored Dec 08, 2020
  
  22ab91bb
- Minor fixes for batch size rampup · 6a68502d
  mohammad authored Dec 08, 2020
  
  6a68502d
- Support for ramping up the batch size · de0b70a0
  mohammad authored Dec 08, 2020
  
  de0b70a0
- Minor refactoring · c30ba0f7
  mohammad authored Dec 08, 2020
  
  c30ba0f7
- Add constant num micro-batches calculator · feecd5d9
  mohammad authored Dec 07, 2020
  
  feecd5d9
- Add micro-batch size calculator · 6ea23928
  mohammad authored Dec 06, 2020
  
  6ea23928
- Rename --batch-size to --micro-batch-size and drop in-minibatch from... · 9019bbf4
  mohammad authored Dec 06, 2020
```
Rename --batch-size to --micro-batch-size and drop in-minibatch from --num-micro-batches-in-minibatch
```
  9019bbf4
- Make an eval iteration the same number of samples as a training iteration · a84a5fa0
  Jared Casper authored Dec 03, 2020
  
  a84a5fa0
03 Dec, 2020 1 commit
- found a bug in consumed tokens initialization · e2a4d426
  mohammad authored Dec 02, 2020
  
  e2a4d426
02 Dec, 2020 3 commits
- addressed Jareds comments · fa80af26
  mohammad authored Dec 02, 2020
  
  fa80af26
- addressed Jareds comments · 98989693
  mohammad authored Dec 02, 2020
  
  98989693
- addrressed jareds comments · cebd3b8b
  mohammad authored Dec 02, 2020
  
  cebd3b8b
30 Nov, 2020 2 commits
- refactored learning rate scheduler so addition of variable batch size is easier · ff12df6b
  mohammad authored Nov 29, 2020
  
  ff12df6b
- added refactored learning rate · 16193619
  mohammad authored Nov 29, 2020
  
  16193619
29 Nov, 2020 2 commits
- implemented blending datasets · 65290033
  mohammad authored Nov 28, 2020
  
  65290033
- added blendable dataset · d3bb1a06
  mohammad authored Nov 28, 2020
  
  d3bb1a06
28 Nov, 2020 1 commit
- added consumed tokens to checkpoints and some refactoring · f0a445fa
  mohammad authored Nov 27, 2020
  
  f0a445fa
26 Nov, 2020 1 commit
- simplified sampler · 4311b695
  mohammad authored Nov 25, 2020
  
  4311b695
18 Nov, 2020 1 commit
- Replace deprecated torch.norm with torch.linalg.norm. · 17035d6c
  Jared Casper authored Nov 17, 2020
  
  17035d6c
17 Nov, 2020 1 commit
- Update code used for finetuning to latest API. · b219ff00
  Jared Casper authored Nov 16, 2020
  
  b219ff00
12 Nov, 2020 12 commits
- Make sure dataloader state is the same after checkpoint is loaded · cd4822f1
  Deepak Narayanan authored Nov 12, 2020
  
  cd4822f1
- Move division of loss tensor by number of microbatches to training.py · c671de3e
  Deepak Narayanan authored Nov 12, 2020
  
  c671de3e
- Small bugfix in bert_model.py: make sure word_embeddings is initialized before... · 69a546be
  Deepak Narayanan authored Nov 12, 2020
```
Small bugfix in bert_model.py: make sure word_embeddings is initialized before instantiating lm_head
```
  69a546be
- Refactor code according to Jared's comments: move pipelining and... · 1979c242
  Deepak Narayanan authored Nov 12, 2020
```
Refactor code according to Jared's comments: move pipelining and non-pipelining training loops into separate methods

Also, use mpu.get_*_model_parallel_size() instead of args.*_model_parallel_size
```
  1979c242
- Allocate tensor in `communicate()` method directly on GPU (instead of... · 9ff6f473
  mshoeybi authored Nov 11, 2020
```
Allocate tensor in `communicate()` method directly on GPU (instead of allocating on CPU and then moving to GPU)
```
  9ff6f473
- Remove timing labels that don't make sense · fdcaeba0
  Deepak Narayanan authored Nov 10, 2020
  
  fdcaeba0
- Better communication timing · 7d367b1d
  Deepak Narayanan authored Nov 10, 2020
  
  7d367b1d
- Divide gradient by number of microbatches in minibatch · 3d7194c4
  Deepak Narayanan authored Nov 03, 2020
  
  3d7194c4
- Better 'forward' and 'backward' timing in megatron/training.py · a6756bf8
  Deepak Narayanan authored Nov 03, 2020
  
  a6756bf8
- Move training schedule to 1F1B for memory efficiency · 3e6898e6
  Deepak Narayanan authored Oct 24, 2020
  
  3e6898e6
- Only transpose hidden_states when necessary · 6abf39be
  Deepak Narayanan authored Nov 03, 2020
  
  6abf39be
- Refactor word_embeddings_weight() logic into separate method, and other Mohammad comments · 57c3b364
  Deepak Narayanan authored Nov 03, 2020
  
  57c3b364