Commits · ee0848351d3a523aebfbbc9e449cf416416327c9 · OpenDAS / Megatron-LM

07 Apr, 2023 1 commit
- add needed helper functions · 3ce6a1c2
  Abhinav Khattar authored Apr 07, 2023
```
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
```
  3ce6a1c2
03 Apr, 2023 3 commits
- Fix spacing · 4fcb2f45
  MaximumEntropy authored Apr 03, 2023
```
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
```
  4fcb2f45
- Undo parallel state changes · 96f4c5d2
  MaximumEntropy authored Apr 03, 2023
```
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
```
  96f4c5d2
- Initial commit for untied embeddings · 82c7ba57
  MaximumEntropy authored Dec 07, 2022
```
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
```
  82c7ba57
23 Mar, 2023 1 commit
- Move pipeline parallel functionality into core with associated changes. · 3c92fa93
  Jared Casper authored Mar 23, 2023
  
  3c92fa93
06 Oct, 2022 2 commits
- Adding some basic unit tests · b69e2195
  shanmugamr authored Oct 06, 2022
  
  b69e2195
- Setter for pipeline parallel split rank, remove print · 6defe188
  Eric Harper authored Oct 05, 2022
  
  6defe188
27 Sep, 2022 1 commit
- Small fixes. · 77753d0a
  Jared Casper authored Sep 27, 2022
  
  77753d0a
26 Sep, 2022 1 commit
- Check for pipeline_parallel > 2 when using interleaving. · e7e99721
  Jared Casper authored Sep 26, 2022
  
  e7e99721
24 Sep, 2022 3 commits
- Improve docstrings, destroy global memory buffer. · e00a1ca1
  Jared Casper authored Sep 23, 2022
  
  e00a1ca1
- Move get_num_layers into transformer.py. · a94d0a6e
  Jared Casper authored Sep 23, 2022
  
  a94d0a6e
- Alias core.parallel_state as mpu and use it throughout code. RIP mpu. · 5942af97
  Jared Casper authored Sep 23, 2022
  
  5942af97
23 Sep, 2022 3 commits

- Move layers from mpu to core.tensor_parallel. · c2ea914f
  Jared Casper authored Sep 23, 2022
  
  c2ea914f
- Start Megatron-Core with vocab parallel cross entropy · 2e6a46e4
  Jared Casper authored Sep 22, 2022
```
Also merged in some changes from apex
```
  2e6a46e4

Jared Casper authored Sep 23, 2022

The LICENSE file says everything is 3-clause BSD, which is what we want,
but at some point the Apache license was added to the top of some files
and that proliferated. This commit removes the Apache license from any
files that we own the copyright to.

Also updates the copyright year and removes the unnecessary coding=utf-8
line.

fabad461

26 May, 2022 1 commit
- bug fix for get_data_parallel_src_rank · 9ad1944d
  Vijay Korthikanti authored May 26, 2022
  
  9ad1944d
08 Feb, 2022 1 commit
- fixed t5 'get_num_layers()' · 2fadaa50
  Lawrence McAfee authored Feb 08, 2022
  
  2fadaa50
04 Feb, 2022 1 commit
- renamed argument; 'embed' -> 'embedding' · c04c4977
  Lawrence McAfee authored Feb 04, 2022
  
  c04c4977
01 Feb, 2022 1 commit
- comments, cleanup. · b93bef00
  Lawrence McAfee authored Feb 01, 2022
  
  b93bef00
31 Jan, 2022 1 commit
- working for t5 [ encoder embedding only ] · 3af6725d
  Lawrence McAfee authored Jan 31, 2022
  
  3af6725d
25 Jan, 2022 1 commit
- working with interleaving · 804ed2e6
  Lawrence McAfee authored Jan 24, 2022
  
  804ed2e6
24 Jan, 2022 3 commits
- added args.transformer_pipeline_model_parallel_size · a06af061
  Lawrence McAfee authored Jan 24, 2022
  
  a06af061
- fixed args.virtual_pipeline_model_parallel_size · c2b7d0b3
  Lawrence McAfee authored Jan 24, 2022
  
  c2b7d0b3
- working when no interleaving · 33dc8e9c
  Lawrence McAfee authored Jan 24, 2022
  
  33dc8e9c
12 Jan, 2022 1 commit
- Phase1 merge: vit optimizations + dataset enhancements + scaled_softmax kernel · 7a77abd9
  Vijay Korthikanti authored Jan 12, 2022
  
  7a77abd9
17 Dec, 2021 2 commits
- minor fixes · f2bf5a56
  Vijay Korthikanti authored Dec 17, 2021
  
  f2bf5a56
- pipeline_fixes · 17843605
  Vijay Korthikanti authored Dec 17, 2021
  
  17843605
05 Nov, 2021 1 commit
- t5_pipeline_fix · ea128da5
  Vijay Korthikanti authored Nov 05, 2021
  
  ea128da5
16 Aug, 2021 1 commit

Destroy more groups in `destroy_model_parallel` · eddf7593

eqy authored Aug 16, 2021

Some tests expect a clean model parallel slate and complain if a previous test left something behind; this change clears more variables that the tests complain about.

eddf7593
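
The cleanup described in this commit can be pictured with a minimal sketch. It assumes the parallel state lives in module-level globals that are reset to `None` on teardown. `_PIPELINE_GLOBAL_RANKS` is named elsewhere in this log; the other variable names, the `_sketch` functions, and the placeholder objects are hypothetical and do not reproduce the repository's code.

```python
# Hypothetical module-level state. Real code would hold torch.distributed
# process groups here; plain placeholder objects keep the sketch self-contained.
_MODEL_PARALLEL_GROUP = None
_DATA_PARALLEL_GROUP = None
_PIPELINE_GLOBAL_RANKS = None


def initialize_model_parallel_sketch():
    """Populate the globals, standing in for real group creation."""
    global _MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP, _PIPELINE_GLOBAL_RANKS
    _MODEL_PARALLEL_GROUP = object()
    _DATA_PARALLEL_GROUP = object()
    _PIPELINE_GLOBAL_RANKS = [0, 1]


def destroy_model_parallel_sketch():
    """Reset every global so the next test starts from a clean slate."""
    global _MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP, _PIPELINE_GLOBAL_RANKS
    _MODEL_PARALLEL_GROUP = None
    _DATA_PARALLEL_GROUP = None
    _PIPELINE_GLOBAL_RANKS = None


def state_is_clean():
    """Return True when no parallel state is left behind."""
    leftovers = (_MODEL_PARALLEL_GROUP, _DATA_PARALLEL_GROUP, _PIPELINE_GLOBAL_RANKS)
    return all(item is None for item in leftovers)
```

If a teardown function forgets one of these globals, a later test that checks for a clean state fails, which is the situation the commit message describes.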

30 Jul, 2021 1 commit

Support for pipeline parallelism in T5 model · 46c74b4c

Deepak Narayanan authored Jun 22, 2021

- Accumulate encoder hidden state gradient to handle skip connection
- Correctly compute the number of layers in encoder / decoder for T5 model
- Ensure weights are initialized the same way in embeddings
- Synchronize embedding gradients across encoder and decoder for T5 model
- Support for checkpoint loading and saving

46c74b4c

13 Feb, 2021 1 commit
- More comments and some cleanup (e.g., better variable names) · 5489bda9
  Deepak Narayanan authored Feb 13, 2021
  
  5489bda9
09 Feb, 2021 1 commit

Interleaved pipeline execution and code refactoring · dd889062

Deepak Narayanan authored Dec 12, 2020

- Split a model's computation into multiple virtual stages as needed, and schedule communication correctly between these virtual stages
- Move schedule code into `schedules.py` and communication code into `p2p_communication.py`
- Use hyphens instead of spaces in all time logging for consistency
- Factor out code in megatron/training.py into helper functions
- Refactor evaluate() function: make it use forward_backward_schedule functions

dd889062
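
The first item of this commit, splitting a model into virtual stages, can be sketched as a layer-to-rank mapping. This is a minimal illustration, assuming each pipeline rank holds several model chunks and that layers are dealt out round-robin across ranks. The function name and its parameters are hypothetical and are not taken from the commit.

```python
def interleaved_layer_assignment(num_layers, pipeline_size, virtual_size):
    """Map each pipeline rank to the layer indices of its model chunks.

    Assumption: the layer count divides evenly into
    pipeline_size * virtual_size chunks of equal length.
    """
    total_chunks = pipeline_size * virtual_size
    if num_layers % total_chunks != 0:
        raise ValueError("num_layers must be divisible by pipeline_size * virtual_size")

    layers_per_chunk = num_layers // total_chunks
    layers_per_round = num_layers // virtual_size  # layers covered by one pass over all ranks

    assignment = {}
    for rank in range(pipeline_size):
        chunks = []
        for chunk in range(virtual_size):
            # Chunk `chunk` on this rank starts after `chunk` full passes over
            # the ranks, plus this rank's position within the current pass.
            start = chunk * layers_per_round + rank * layers_per_chunk
            chunks.append(list(range(start, start + layers_per_chunk)))
        assignment[rank] = chunks
    return assignment


print(interleaved_layer_assignment(8, 2, 2))
# {0: [[0, 1], [4, 5]], 1: [[2, 3], [6, 7]]}
```

With 8 layers, 2 pipeline ranks, and 2 virtual stages per rank, rank 0 holds layers 0-1 and 4-5 while rank 1 holds layers 2-3 and 6-7, so activations pass through every rank twice per forward pass.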

04 Jan, 2021 1 commit
- Use batched send and recv instead of torch.distributed.ring_exchange() · d899988e
  Deepak Narayanan authored Jan 04, 2021
  
  d899988e
19 Dec, 2020 2 commits
- Add comment describing _PIPELINE_GLOBAL_RANKS · 51315905
  Jared Casper authored Dec 10, 2020
  
  51315905
- Initial implementation of pipelined text generation · 5c45db4a
  Jared Casper authored Dec 09, 2020
  
  5c45db4a
12 Nov, 2020 2 commits

Intra-layer MP -> Tensor MP, Inter-layer MP -> Pipeline MP · 52a5f2f2
Deepak Narayanan authored Oct 20, 2020

52a5f2f2

Pipeline parallelism implementation with periodic full-pipeline syncs · 7abd3e90

Deepak Narayanan authored Aug 29, 2020

Also includes the following changes for inter-layer model-parallel implementation:
- Refactoring of model implementations
- Training loop changes to support inter-layer communication using `ring_exchange`
- New groups for inter-layer communication
- Checkpoint changes
- Command line arguments

7abd3e90

06 Aug, 2020 1 commit
- Implementing lazy parallel initialization · 5c04ceb3
  Boris Fomitchev authored Aug 05, 2020
```
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
```
  5c04ceb3
30 Jul, 2020 1 commit
- Changes for NeMo/lightning compatibility · 417c7f6a
  Boris Fomitchev authored Jul 30, 2020
```
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
```
  417c7f6a
26 Jul, 2020 1 commit
- Add additional assertion on Indexer to test correctness, and limit verbosity in other classes · eaa5d877
  Neel Kant authored Jul 25, 2020
  
  eaa5d877