- 04 Jan, 2022 (1 commit)
  - Lawrence McAfee authored
- 10 Dec, 2021 (1 commit)
  - Lawrence McAfee authored
- 08 Dec, 2021 (1 commit)
  - Lawrence McAfee authored
- 07 Dec, 2021 (2 commits)
  - Lawrence McAfee authored
  - Lawrence McAfee authored
- 09 Aug, 2021 (1 commit)
  - Lawrence McAfee authored
- 30 Jul, 2021 (2 commits)
  - Deepak Narayanan authored
  - Deepak Narayanan authored:
    - Accumulate the encoder hidden-state gradient to handle the skip connection
    - Correctly compute the number of layers in the encoder / decoder for the T5 model
    - Ensure weights are initialized the same way in the embeddings
    - Synchronize embedding gradients across encoder and decoder for the T5 model
    - Support checkpoint loading and saving
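One item in the commit above is synchronizing embedding gradients across the encoder and decoder stages of the T5 model. Below is a minimal sketch of what such a synchronization could look like, assuming `torch.distributed` is already initialized and that `embedding_group` is a hypothetical process group containing exactly the ranks that hold a copy of the tied embedding weight; this is an illustration, not the Megatron-LM implementation.

```python
import torch
import torch.distributed as dist


def sync_shared_embedding_grads(embedding: torch.nn.Embedding,
                                embedding_group: dist.ProcessGroup) -> None:
    """All-reduce the embedding weight gradient across the stages that own a copy.

    The encoder-side and decoder-side pipeline stages each compute a partial
    gradient for the tied embedding; summing the gradients before the optimizer
    step keeps the two copies consistent. (Sketch only -- the group setup and
    call site are assumptions, not the Megatron-LM code.)
    """
    grad = embedding.weight.grad
    if grad is not None:
        dist.all_reduce(grad, op=dist.ReduceOp.SUM, group=embedding_group)
```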
- 29 Jul, 2021 (1 commit)
  - Deepak Narayanan authored
- 21 Jul, 2021 (1 commit)
  - Deepak Narayanan authored
- 24 Mar, 2021 (3 commits)
  - Jared Casper authored
  - Jared Casper authored
  - Vijay Korthikanti authored
- 03 Mar, 2021 (2 commits)
  - Deepak Narayanan authored
  - Deepak Narayanan authored
- 26 Feb, 2021 (2 commits)
  - Deepak Narayanan authored: Fix deadlock when get_num_microbatches() < pipeline-parallel size (don't try to measure the pipeline stall)
  - Deepak Narayanan authored
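The deadlock fix above suggests that the stall measurement involves a collective operation which hangs when some pipeline stages receive no microbatches. A minimal sketch of such a guard follows, assuming a hypothetical `run_warmup_steps` callable and a barrier-based timing scheme; both are assumptions for illustration, not the Megatron-LM code.

```python
import time
import torch.distributed as dist


def timed_pipeline_stall(run_warmup_steps, num_microbatches: int,
                         pipeline_parallel_size: int):
    """Return the measured pipeline-stall time, or None when measuring would deadlock."""
    if num_microbatches < pipeline_parallel_size:
        # Some pipeline stages have no work, so they would never reach the
        # barrier below and the collective would hang -- skip the measurement.
        return None
    dist.barrier()                 # collective: every stage must participate
    start = time.perf_counter()
    run_warmup_steps()             # forward passes that fill the pipeline
    return time.perf_counter() - start
```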
- 13 Feb, 2021 (1 commit)
  - Deepak Narayanan authored
- 10 Feb, 2021 (2 commits)
  - Deepak Narayanan authored
  - Deepak Narayanan authored
- 09 Feb, 2021 (2 commits)
  - Deepak Narayanan authored
  - Deepak Narayanan authored:
    - Split a model's computation into multiple virtual stages as needed, and schedule communication correctly between these virtual stages
    - Move schedule code into `schedules.py` and communication code into `p2p_communication.py`
    - Use hyphens instead of spaces in all time logging for consistency
    - Factor out code in megatron/training.py into helper functions
    - Refactor the evaluate() function to use the forward_backward_schedule functions
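The first item in the commit above introduces virtual stages, where each pipeline rank owns several non-contiguous chunks of layers. A minimal sketch of the layer assignment follows, assuming the model is cut into `pipeline_size * virtual_stages` equal chunks handed out round-robin; the function name and layout are illustrative assumptions, not the repository's API.

```python
def layers_for_virtual_stage(num_layers: int, pipeline_size: int,
                             virtual_stages: int, rank: int, virtual_stage: int):
    """Return the layer indices owned by `rank` in the given virtual stage."""
    assert num_layers % (pipeline_size * virtual_stages) == 0
    layers_per_chunk = num_layers // (pipeline_size * virtual_stages)
    # Chunks are dealt out round-robin: rank r owns chunk v * pipeline_size + r
    # for each virtual stage v, so its layers are non-contiguous.
    chunk = virtual_stage * pipeline_size + rank
    start = chunk * layers_per_chunk
    return list(range(start, start + layers_per_chunk))


# Example: 16 layers, 4 pipeline stages, 2 virtual stages per stage.
# Rank 0 owns layers [0, 1] in virtual stage 0 and [8, 9] in virtual stage 1.
print(layers_for_virtual_stage(16, 4, 2, rank=0, virtual_stage=0))  # [0, 1]
print(layers_for_virtual_stage(16, 4, 2, rank=0, virtual_stage=1))  # [8, 9]
```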