Commits · d520d24fdf2eb7c2e76aa5ca3020cbfd07c42910 · OpenDAS / Megatron-LM

"vscode:/vscode.git/clone" did not exist on "ea6275dfbc263216c79fcde4abc56363400bde23"

09 Dec, 2024 1 commit
- 升级megatron_v0.10[B · 3bec6514
  xingjinliang authored Dec 09, 2024
  
  3bec6514
27 Mar, 2024 1 commit
- v1 · 051f58f1
  liangjing authored Mar 27, 2024
  
  051f58f1
23 Mar, 2023 1 commit
- Move pipeline parallel functionality into core with associated changes. · 3c92fa93
  Jared Casper authored Mar 23, 2023
  
  3c92fa93
24 Sep, 2022 1 commit
- Alias core.parallel_state as mpu and use it throughout code. RIP mpu. · 5942af97
  Jared Casper authored Sep 23, 2022
  
  5942af97
23 Sep, 2022 1 commit

Jared Casper authored Sep 23, 2022

The LICENSE file says everything is 3-clause BSD, which is what we want,
but at some point the Apache license was added to the top of some files
and that proliferated. This commit removes the Apache license from any
files that we own the copyright to.

Also updates the copyright year and removes the unnessary coding=utf-8
line.

fabad461

10 Aug, 2022 1 commit
- Timing levels · 77efcccb
  Mohammad Shoeybi authored Aug 10, 2022
  
  77efcccb
30 Jul, 2021 1 commit

Support for pipeline parallelism in T5 model · 46c74b4c

Deepak Narayanan authored Jun 22, 2021

- Accumulate encoder hidden state gradient to handle skip connection
- Correctly compute the number of layers in encoder / decoder for T5 model
- Ensure e weights are initialized the same way in embeddings
- Synchronize embedding gradients across encoder and decoder for T5 model
- Support for checkpoint loading and saving

46c74b4c

16 Apr, 2021 1 commit
- Integrate code from t5_main into existing code. · 48a5e0da
  Jared Casper authored Apr 15, 2021
  
  48a5e0da
24 Mar, 2021 1 commit
- pipeline code simplification · 3b91262e
  Vijay Korthikanti authored Mar 02, 2021
  
  3b91262e
09 Feb, 2021 1 commit

Interleaved pipeline execution and code refactoring · dd889062

Deepak Narayanan authored Dec 12, 2020

- Split a model's computation into multiple virtual stages as needed,
and schedule communication correctly between these virtual stages
- Move schedule code into `schedules.py` and communication code into
`p2p_communication.py`
- Use hyphens instead of spaces in all time logging for consistency
- Factor out code in megatron/training.py into helper functions
- Refactor evaluate() function: make it use forward_backward_schedule
functions

dd889062

25 Jan, 2021 1 commit
- Adding option to remove the binary head for BERT · dcff1acd
  Mohammad Shoeybi authored Jan 25, 2021
  
  dcff1acd
19 Dec, 2020 1 commit
- Fix TensorBoard writes · b81cad66
  mohammad authored Dec 12, 2020
  
  b81cad66
12 Nov, 2020 5 commits
- Move division of loss tensor by number of microbatches to training.py · c671de3e
  Deepak Narayanan authored Nov 12, 2020
  
  c671de3e
- Refactor code according to Jared's comments: move pipelining and... · 1979c242
  Deepak Narayanan authored Nov 12, 2020
```
Refactor code according to Jared's comments: move pipelining and non-pipelining training loops into separate methods

Also, use mpu.get_*_model_parallel_size() instead of args.*_model_parallel_size
```
  1979c242
- Divide gradient by number of microbatches in minibatch · 3d7194c4
  Deepak Narayanan authored Nov 03, 2020
  
  3d7194c4
- Intra-layer MP -> Tensor MP, Inter-layer MP -> Pipeline MP · 52a5f2f2
  Deepak Narayanan authored Oct 20, 2020
  
  52a5f2f2
- Pipeline parallelism implementation with periodic full-pipeline syncs · 7abd3e90
  Deepak Narayanan authored Aug 29, 2020
```
Also includes following changes for inter-layer model-parallel implementation:
- Refactoring of model implementations
- Training loop changes to support inter-layer communication using `ring_exchange`
- New groups for inter-layer communication
- Checkpoint changes
- Command line arguments
```
  7abd3e90
07 Jul, 2020 1 commit
- Addressed Jared's comments · 8d7f508a
  Neel Kant authored Jul 06, 2020
  
  8d7f508a
24 Jun, 2020 1 commit
- More ict_merge changes and interactive testing · 3354081f
  Neel Kant authored Jun 23, 2020
  
  3354081f
05 Jun, 2020 2 commits
- addressed jareds comments · 197c132e
  mohammad authored Jun 05, 2020
  
  197c132e
- added fp16 lm cross entropy to bert · 78022005
  mohammad authored Jun 05, 2020
  
  78022005
03 May, 2020 1 commit
- Move get_train_val... to dataset_utils · 16a64c41
  Neel Kant authored May 03, 2020
  
  16a64c41
16 Apr, 2020 1 commit
- changed licence 2019 to 2020 · 463d1257
  Mohammad authored Apr 16, 2020
  
  463d1257
13 Apr, 2020 1 commit
- data loading for BERT and GPT cleaned up · 2e38461b
  Mohammad authored Apr 12, 2020
  
  2e38461b
31 Mar, 2020 1 commit
- arguments in the model refactored · deffcb6a
  Mohammad authored Mar 31, 2020
  
  deffcb6a
30 Mar, 2020 2 commits
- removed building tokenizer from bert dataset · dedb2ef7
  Mohammad authored Mar 30, 2020
  
  dedb2ef7
- both bert and gpt2 tested and working · 1788c910
  Mohammad authored Mar 29, 2020
  
  1788c910
29 Mar, 2020 3 commits
- refactored pretrain-bert · 3366a5b0
  Mohammad authored Mar 28, 2020
  
  3366a5b0
- refactored training · 27e14f82
  Mohammad authored Mar 28, 2020
  
  27e14f82
- utils partially refactored and tested · 3f58649b
  Mohammad authored Mar 28, 2020
  
  3f58649b
28 Mar, 2020 2 commits
- working on utils · 5050203f
  Mohammad authored Mar 28, 2020
  
  5050203f
- arguments.py refactored · 94e2ca57
  Mohammad authored Mar 27, 2020
  
  94e2ca57
27 Mar, 2020 1 commit
- working on bert · 599e959a
  Mohammad authored Mar 26, 2020
  
  599e959a
17 Mar, 2020 2 commits
- moved albert to bert · 09e05c6f
  Mohammad Shoeybi authored Mar 16, 2020
  
  09e05c6f
- moved pretrain albert to pretrain bert · 3e4e1ab2
  Mohammad Shoeybi authored Mar 16, 2020
  
  3e4e1ab2
14 Jan, 2020 1 commit
- added query-key layer scaling and softmax fp32 option · 691747b1
  Mohammad Shoeybi authored Jan 13, 2020
  
  691747b1
21 Nov, 2019 1 commit
- fixed padding issue · f6a6811f
  Mohammad Shoeybi authored Nov 20, 2019
  
  f6a6811f
19 Nov, 2019 1 commit
- Plumbing for new albert dataset, including mods to arguments for data loaders. · 1a1cfcff
  Jared Casper authored Nov 18, 2019
  
  1a1cfcff
31 Oct, 2019 1 commit
- refacotred for code reuse · 34be7dd3
  Mohammad Shoeybi authored Oct 30, 2019
  
  34be7dd3
29 Oct, 2019 1 commit
- Major refactoring, combining gpt2 and bert · 73af1290
  Mohammad Shoeybi authored Oct 29, 2019
  
  73af1290