1. 16 Feb, 2022 1 commit
    • gradient accumulation fusion · 83b1e42f
      Sangkug Lym authored
      remove redundant linear layer class definition
      
      add fuse_gradient_accumulation attribute to weights for simple targeting
      
      reflect feedback and clean up the code
      
      arg change
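      The commit above fuses gradient accumulation into the weight-gradient computation and marks participating weights with a fuse_gradient_accumulation attribute. The following is a minimal, hypothetical PyTorch sketch of that pattern, not Megatron-LM's actual implementation; the main_grad buffer name and the autograd function are illustrative assumptions.

      import torch

      class LinearWithFusedGradAccum(torch.autograd.Function):
          """Linear op whose backward accumulates the weight gradient directly
          into a preallocated main_grad buffer instead of autograd's .grad."""

          @staticmethod
          def forward(ctx, inp, weight):
              ctx.save_for_backward(inp, weight)
              return inp @ weight.t()

          @staticmethod
          def backward(ctx, grad_out):
              inp, weight = ctx.saved_tensors
              grad_input = grad_out @ weight
              if getattr(weight, "fuse_gradient_accumulation", False):
                  # Accumulate in place into the persistent buffer; returning
                  # None keeps autograd from allocating a separate weight grad.
                  weight.main_grad.add_(grad_out.t() @ inp)
                  return grad_input, None
              return grad_input, grad_out.t() @ inp

      # Usage: tag the weight and give it a persistent accumulation buffer.
      w = torch.nn.Parameter(torch.randn(8, 4))
      w.fuse_gradient_accumulation = True       # attribute named in the commit
      w.main_grad = torch.zeros(8, 4)           # hypothetical buffer name
      x = torch.randn(2, 4)
      LinearWithFusedGradAccum.apply(x, w).sum().backward()
      # w.main_grad now holds the accumulated weight gradient; w.grad stays None.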
  2. 12 Jan, 2022 1 commit
  3. 11 Jan, 2022 1 commit
  4. 17 Dec, 2021 1 commit
  5. 19 Aug, 2021 1 commit
  6. 30 Jul, 2021 1 commit
    • Support for pipeline parallelism in T5 model · 46c74b4c
      Deepak Narayanan authored
      - Accumulate encoder hidden state gradient to handle skip connection
      - Correctly compute the number of layers in encoder / decoder for T5 model
      - Ensure the weights are initialized the same way in embeddings
      - Synchronize embedding gradients across encoder and decoder for T5 model
      - Support for checkpoint loading and saving
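      Several of the bullets above concern keeping the tied embedding consistent across pipeline stages. The following is a minimal, hypothetical sketch of that kind of synchronization, not the repository's actual code: it assumes a process group has already been built over the ranks that hold a copy of the shared embedding (encoder first stage, decoder first stage, and the stage holding the output projection), and the names here are illustrative.

      import torch
      import torch.distributed as dist

      def sync_shared_embedding_grad(embedding_weight, embedding_group):
          """All-reduce the shared embedding's gradient across the ranks that
          own a copy of it, so the tied weights receive identical updates."""
          if embedding_weight.grad is not None:
              dist.all_reduce(embedding_weight.grad, group=embedding_group)

      # Usage sketch (after backward, before the optimizer step):
      #   sync_shared_embedding_grad(model.shared_embedding.weight, embed_group)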
  7. 09 Feb, 2021 2 commits
  8. 25 Jan, 2021 1 commit
  9. 04 Jan, 2021 1 commit
  10. 25 Dec, 2020 2 commits
  11. 22 Dec, 2020 1 commit
  12. 19 Dec, 2020 1 commit
  13. 12 Nov, 2020 2 commits
  14. 02 Sep, 2020 1 commit
  15. 06 Aug, 2020 1 commit
  16. 03 Aug, 2020 1 commit
  17. 26 Jul, 2020 2 commits
  18. 16 Apr, 2020 1 commit
  19. 29 Oct, 2019 1 commit
  20. 08 Oct, 2019 1 commit
  21. 30 Jul, 2019 1 commit