Commits · 83b1e42f3012da2f3674b118e83d9d33d9aba633 · OpenDAS / Megatron-LM

16 Feb, 2022 1 commit

Sangkug Lym authored Feb 13, 2022

remove redundant linear layer class definition

add fuse_gradient_accumulation attribute to weights for simple targeting

reflect feedback and clean up the code

arg change

83b1e42f

15 Feb, 2022 3 commits
- address review comments · 488f8c02
  Vijay Korthikanti authored Feb 11, 2022
  
  488f8c02
- minor fixes · 6f3bf9c0
  Vijay Korthikanti authored Feb 01, 2022
  
  6f3bf9c0
- vision third phase merge: pretraining methods + mit,swin backbones · 48c2a144
  Vijay Korthikanti authored Jan 31, 2022
  
  48c2a144
04 Feb, 2022 1 commit
- renamed argument; 'embed' -> 'embedding' · c04c4977
  Lawrence McAfee authored Feb 04, 2022
  
  c04c4977
01 Feb, 2022 1 commit
- comments, cleanup. · b93bef00
  Lawrence McAfee authored Feb 01, 2022
  
  b93bef00
28 Jan, 2022 1 commit
- changing class name AnnealingLR to OptimizerParamScheduler · 04ecc834
  Vijay Korthikanti authored Jan 28, 2022
  
  04ecc834
27 Jan, 2022 1 commit
- address review comments · 53931b8b
  Vijay Korthikanti authored Jan 27, 2022
  
  53931b8b
26 Jan, 2022 1 commit
- address review comments · 8acbbe25
  Vijay Korthikanti authored Jan 26, 2022
  
  8acbbe25
25 Jan, 2022 1 commit
- working with interleaving · 804ed2e6
  Lawrence McAfee authored Jan 24, 2022
  
  804ed2e6
24 Jan, 2022 3 commits
- added args.transformer_pipeline_model_parallel_size · a06af061
  Lawrence McAfee authored Jan 24, 2022
  
  a06af061
- fixed args.virtual_pipeline_model_parallel_size · c2b7d0b3
  Lawrence McAfee authored Jan 24, 2022
  
  c2b7d0b3
- working when no interleaving · 33dc8e9c
  Lawrence McAfee authored Jan 24, 2022
  
  33dc8e9c
22 Jan, 2022 1 commit
- second phase of vision code merge · d8c85650
  Vijay Korthikanti authored Jan 21, 2022
  
  d8c85650
19 Jan, 2022 1 commit
- flag for data parallel random initialization · a7ee77ea
  Vijay Korthikanti authored Jan 19, 2022
  
  a7ee77ea
13 Jan, 2022 1 commit
- removed argument flag; also, converted torch.FloatTensor -> torch.empty in previous commit · 18846a0a
  Lawrence McAfee authored Jan 13, 2022
  
  18846a0a
12 Jan, 2022 1 commit
- Phase1 merge: vit optimizations + dataset enhancements + scaled_softmax kernel · 7a77abd9
  Vijay Korthikanti authored Jan 12, 2022
  
  7a77abd9
04 Jan, 2022 1 commit
- added flag to control deallocation of pipeline outputs · de1ae5b2
  Lawrence McAfee authored Jan 04, 2022
  
  de1ae5b2
24 Nov, 2021 1 commit
- Add support for signal-based dynamic checkpointing · dbe6c728
  Szymon Migacz authored Nov 23, 2021
  
  dbe6c728
23 Nov, 2021 1 commit
- removed assert that prevented virtual pipelining from being used with distributed checkpointing · 1b28a51b
  Lawrence McAfee authored Nov 23, 2021
  
  1b28a51b
22 Nov, 2021 1 commit
- assert minimum pytorch version for distributed checkpoint activations · be0da070
  Lawrence McAfee authored Nov 22, 2021
  
  be0da070
11 Nov, 2021 1 commit
- persistent fused layer norm · a2fdcdf0
  Sangkug Lym authored Oct 08, 2021
```
fix the guard to fall back to the baseline fused layer norm kernel

Persistent LN: move the guard for supported hidden sizes to layer norm module
```
  a2fdcdf0
10 Oct, 2021 1 commit
- tested and working · 8c119d80
  mshoeybi authored Oct 10, 2021
  
  8c119d80
02 Sep, 2021 3 commits
- reflect feedback · 3f652469
  slym authored Sep 02, 2021
  
  3f652469
- minor changes · 16c90445
  slym authored Sep 02, 2021
  
  16c90445
- t # This is a combination of 2 commits. · cf7efd4f
  Sangkug Lym authored Aug 30, 2021
```
allreduce overlap with wgrad gemm

change custom delay to dummy add
```
  cf7efd4f
23 Aug, 2021 1 commit
- tested · cb5e611d
  mshoeybi authored Aug 22, 2021
  
  cb5e611d
21 Aug, 2021 2 commits
- some cleanup · c61dc22f
  mshoeybi authored Aug 21, 2021
  
  c61dc22f
- added for pp · b8940b96
  mshoeybi authored Aug 21, 2021
  
  b8940b96
19 Aug, 2021 5 commits
- update readme and argument definition · 99f47676
  slym authored Aug 19, 2021
  
  99f47676
- Checkpoint a set number of individual Transformer layers · c1e0689d
  slym authored Aug 10, 2021
```
consider the case of pipeline-model parallelism

clean up arguments

argument naming cleanup

update readme and examples
```
  c1e0689d
- only support pp=1 · 7b585440
  mshoeybi authored Aug 19, 2021
  
  7b585440
- pushed a fix for torch ddp · 6a0ef5b1
  mshoeybi authored Aug 18, 2021
  
  6a0ef5b1
- made contiguous buffer in local ddp default · e8fb052f
  mshoeybi authored Aug 18, 2021
  
  e8fb052f
17 Aug, 2021 1 commit
- updated argument name · 52b2296b
  Lawrence McAfee authored Aug 17, 2021
  
  52b2296b
16 Aug, 2021 2 commits
- added evaluation logic; finalized flag levels · 9dec5374
  Lawrence McAfee authored Aug 16, 2021
  
  9dec5374
- added flag/logic for emptying unused memory · 3bd2e973
  Lawrence McAfee authored Aug 16, 2021
  
  3bd2e973
11 Aug, 2021 1 commit
- added asserts/checks for local ddp and params_have_main_grad · f597f02e
  Lawrence McAfee authored Aug 11, 2021
  
  f597f02e
30 Jul, 2021 1 commit
- Support for pipeline parallelism in T5 model · 46c74b4c
  Deepak Narayanan authored Jun 22, 2021
```
- Accumulate encoder hidden state gradient to handle skip connection
- Correctly compute the number of layers in encoder / decoder for T5 model
- Ensure the weights are initialized the same way in embeddings
- Synchronize embedding gradients across encoder and decoder for T5 model
- Support for checkpoint loading and saving
```
  46c74b4c
13 Jul, 2021 1 commit
- fixed help message; removed redundant destination variable · bc5a8e20
  Lawrence McAfee authored Jul 13, 2021
  
  bc5a8e20