"llm/llama.cpp/examples/vscode:/vscode.git/clone" did not exist on "768ab4df541275c05eec5ee5db2f89661302610d"
- 06 Feb, 2021 1 commit
  Deepak Narayanan authored
- 12 Jan, 2021 1 commit
  mohammad authored
- 05 Jan, 2021 2 commits
- 04 Jan, 2021 1 commit
  Deepak Narayanan authored
- 31 Dec, 2020 1 commit
  mohammad authored
- 27 Dec, 2020 1 commit
  mohammad authored
- 25 Dec, 2020 5 commits
- 23 Dec, 2020 1 commit
  Deepak Narayanan authored
    Checkpoint should be saved only after the evaluation pass is run, to make sure validation losses are identical after loading the checkpoint.
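A minimal sketch of the ordering this commit enforces, assuming a conventional training loop; every name below is an illustrative stand-in, not the repository's actual API:

```python
def evaluate(model):
    """Stand-in for the evaluation pass; returns a validation loss."""
    return 0.0

def save_checkpoint(step, model, val_loss):
    """Stand-in for checkpointing."""
    print(f"step {step}: saving checkpoint, val_loss={val_loss}")

def train(model, train_iters, interval):
    for step in range(1, train_iters + 1):
        # ... one training step ...
        if step % interval == 0:
            # Evaluate first, then save: the checkpoint is written only
            # after the evaluation pass, so the loss logged here matches
            # what evaluation reports after the checkpoint is reloaded.
            val_loss = evaluate(model)
            save_checkpoint(step, model, val_loss)

train(model=None, train_iters=10, interval=5)
```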
- 22 Dec, 2020 1 commit
  mohammad authored
    Add the option for an fp32 residual connection (the fp32 residual connection machinery still needs to be added).
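A hedged sketch of what such an option could look like once wired in; the option name follows the commit message, but since the machinery was explicitly not yet added, the forward-pass handling below is an assumption:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block; `fp32_residual_connection` is the option the
    commit adds. The wiring here is an assumed illustration."""

    def __init__(self, hidden, fp32_residual_connection=False):
        super().__init__()
        self.fp32_residual_connection = fp32_residual_connection
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x):
        out = self.proj(x)
        if self.fp32_residual_connection:
            # Keep the residual sum in fp32 so round-off does not
            # accumulate across layers in an otherwise fp16 model.
            return (x.float() + out.float()).to(x.dtype)
        return x + out

y = Block(8, fp32_residual_connection=True)(torch.randn(2, 8))
```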
- 19 Dec, 2020 16 commits
  mohammad authored
  mohammad authored
  mshoeybi authored
  mshoeybi authored
  mshoeybi authored
  mshoeybi authored
  Jared Casper authored
  Jared Casper authored
  Jared Casper authored
  Deepak Narayanan authored
  mohammad authored
  mohammad authored
  mohammad authored
  mohammad authored
  mohammad authored
    Rename --batch-size to --micro-batch-size and drop in-minibatch from --num-micro-batches-in-minibatch.
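The rename separates the per-pass unit (the micro-batch) from the overall batch consumed per optimizer step. A small worked relation under the usual definitions; variable names mirror the flags but are otherwise illustrative:

```python
micro_batch_size = 4     # --micro-batch-size: samples per forward/backward pass
num_micro_batches = 8    # micro-batches accumulated per optimizer step
data_parallel_size = 2   # number of data-parallel model replicas

# Effective (global) batch size consumed per optimizer step:
global_batch_size = micro_batch_size * num_micro_batches * data_parallel_size
assert global_batch_size == 64
```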
  Jared Casper authored
- 02 Dec, 2020 1 commit
  mohammad authored
- 30 Nov, 2020 1 commit
  mohammad authored
- 28 Nov, 2020 1 commit
  mohammad authored
- 26 Nov, 2020 1 commit
  mohammad authored
- 12 Nov, 2020 6 commits
  Deepak Narayanan authored
  Deepak Narayanan authored
  Deepak Narayanan authored
    Refactor code according to Jared's comments: move the pipelining and non-pipelining training loops into separate methods. Also, use mpu.get_*_model_parallel_size() instead of args.*_model_parallel_size.
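A hedged sketch of the dispatch shape this describes; `mpu` below is a minimal stand-in for the real module, and the method names are illustrative:

```python
class mpu:
    """Minimal stand-in for the model-parallel-utilities module."""
    @staticmethod
    def get_pipeline_model_parallel_size():
        return 2  # illustrative fixed value

def train_step_with_pipelining(batch):
    pass  # forward/backward interleaved across pipeline stages

def train_step_without_pipelining(batch):
    pass  # plain forward + backward on a single stage

def train_step(batch):
    # Read the parallel size from mpu rather than from args, and
    # dispatch to one of the two separate training-loop methods.
    if mpu.get_pipeline_model_parallel_size() > 1:
        return train_step_with_pipelining(batch)
    return train_step_without_pipelining(batch)

train_step(batch=None)
```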
  mshoeybi authored
    Allocate tensor in `communicate()` method directly on GPU (instead of allocating on CPU and then moving to GPU).
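In PyTorch terms, the change avoids a host allocation plus a host-to-device copy. A minimal sketch, with an illustrative helper name:

```python
import torch

def make_buffer(shape, dtype=torch.float16):
    """Hedged sketch of the allocation change in `communicate()`."""
    if not torch.cuda.is_available():
        return torch.empty(shape, dtype=dtype)  # CPU fallback for the sketch
    # Before: torch.empty(shape, dtype=dtype).cuda() -- allocates host
    # memory first, then copies the tensor to the device.
    # After: allocate directly on the current GPU, skipping the copy.
    return torch.empty(shape, dtype=dtype,
                       device=torch.cuda.current_device())
```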
  Deepak Narayanan authored
  Deepak Narayanan authored