Commits · b93bef00d4fdcd8adc9a276e9834c7714aa559c2 · OpenDAS / Megatron-LM

01 Feb, 2022 2 commits
- comments, cleanup. · b93bef00
  Lawrence McAfee authored Feb 01, 2022
  
  b93bef00
- found root source of t5 issue (fast layer norm) · bea16fa3
  Lawrence McAfee authored Feb 01, 2022
  
  bea16fa3
31 Jan, 2022 1 commit
- added 'no-op' layer, to replace transformer layer when num_layers == 0. · 1fa6990c
  Lawrence McAfee authored Jan 31, 2022
  
  1fa6990c
29 Jan, 2022 1 commit
- narrowed issue to pipeline rank 0, virtual pipeline rank >= 1 · 5bc9f889
  Lawrence McAfee authored Jan 28, 2022
  
  5bc9f889
26 Jan, 2022 1 commit
- further clarified viewless tensor comment in transformer.py · d16e2a24
  Lawrence McAfee authored Jan 26, 2022
  
  d16e2a24
25 Jan, 2022 1 commit
- limit 'make_viewless_tensor()' to case of micro_batch_size == 1; added comment · 24369dd6
  Lawrence McAfee authored Jan 25, 2022
  
  24369dd6
24 Jan, 2022 1 commit
- fixed args.virtual_pipeline_model_parallel_size · c2b7d0b3
  Lawrence McAfee authored Jan 24, 2022
  
  c2b7d0b3
11 Jan, 2022 2 commits
- added comments · a1fe4805
  Lawrence McAfee authored Jan 11, 2022
  
  a1fe4805
- partially cleaned · 806422e5
  Lawrence McAfee authored Jan 11, 2022
  
  806422e5
10 Jan, 2022 1 commit
- loss matches; memory savings for multi-node (tested n3, n16) · 270d6412
  Lawrence McAfee authored Jan 10, 2022
  
  270d6412
08 Jan, 2022 1 commit
- more iterating on 'viewless tensor' methods · ed0c8714
  Lawrence McAfee authored Jan 07, 2022
  
  ed0c8714
07 Jan, 2022 1 commit
- debugging make_standalone_tensor(), safely_set_tensor_data_attr() · 5422d23a
  Lawrence McAfee authored Jan 07, 2022
  
  5422d23a
17 Dec, 2021 2 commits
- minor fixes · f2bf5a56
  Vijay Korthikanti authored Dec 17, 2021
  
  f2bf5a56
- pipeline_fixes · 17843605
  Vijay Korthikanti authored Dec 17, 2021
  
  17843605
22 Nov, 2021 1 commit
- removed distribute_checkpointed_activations_helper() · 5993f04b
  Lawrence McAfee authored Nov 22, 2021
  
  5993f04b
11 Nov, 2021 1 commit

Sangkug Lym authored Oct 08, 2021

fix the guard to fall back to the baseline fused layer norm kernel

Persistent ln: move the guard for supported hidden sizes to layer norm module

a2fdcdf0

29 Oct, 2021 1 commit
- made model stateless with respect to inference · efc750b6
  mshoeybi authored Oct 29, 2021
  
  efc750b6
12 Oct, 2021 1 commit
- fixed a bug · b10ca211
  mshoeybi authored Oct 12, 2021
  
  b10ca211
08 Oct, 2021 1 commit
- pipelining works · a7539b0f
  mshoeybi authored Oct 07, 2021
  
  a7539b0f
06 Oct, 2021 1 commit
- jit function argument type specification · 838af7d9
  Sangkug Lym authored Oct 05, 2021
  
  838af7d9
01 Oct, 2021 1 commit
- simple pipelining works · 8f160844
  mshoeybi authored Oct 01, 2021
  
  8f160844
29 Sep, 2021 1 commit
- added multi-batch inference · 390ddef8
  mshoeybi authored Sep 29, 2021
  
  390ddef8
27 Sep, 2021 1 commit
- refactoring, tested and working · 25f9c3f0
  mshoeybi authored Sep 26, 2021
  
  25f9c3f0
20 Sep, 2021 1 commit
- Inference context optimization · 8b9fe87b
  Mohammad Shoeybi authored Sep 20, 2021
  
  8b9fe87b
27 Aug, 2021 1 commit
- Use nvfuser at pytorch >= 1.10 · dba2506d
  Sangkug Lym authored Aug 27, 2021
  
  dba2506d
21 Aug, 2021 4 commits
- some cleanup · c61dc22f
  mshoeybi authored Aug 21, 2021
  
  c61dc22f
- added for pp · b8940b96
  mshoeybi authored Aug 21, 2021
  
  b8940b96
- resolved conflicts · 7f2cc3a4
  mshoeybi authored Aug 20, 2021
  
  7f2cc3a4
- resolved conflicts · 30b92cf5
  mshoeybi authored Aug 20, 2021
  
  30b92cf5
19 Aug, 2021 2 commits
- Checkpoint a set number of individual Transformer layers · c1e0689d
  slym authored Aug 10, 2021
```
consider the case of pipeline-model parallelism

clean up arguments

argument naming cleanup

update readme and examples
```
  c1e0689d
- removed contiguous buffer for checkpointed activation · e923ec52
  mshoeybi authored Aug 19, 2021
  
  e923ec52
30 Jul, 2021 1 commit

Support for pipeline parallelism in T5 model · 46c74b4c

Deepak Narayanan authored Jun 22, 2021

- Accumulate encoder hidden state gradient to handle skip connection
- Correctly compute the number of layers in encoder / decoder for T5 model
- Ensure weights are initialized the same way in embeddings
- Synchronize embedding gradients across encoder and decoder for T5 model
- Support for checkpoint loading and saving

46c74b4c

02 Jul, 2021 1 commit
- fix typo · e515f026
  hwijeen authored Jul 02, 2021
  
  e515f026
02 Apr, 2021 1 commit
- Addressed MR comments, mostly adding comments to code. · e270f68a
  Jared Casper authored Apr 02, 2021
  
  e270f68a
24 Mar, 2021 1 commit
- pipeline code simplification · 3b91262e
  Vijay Korthikanti authored Mar 02, 2021
  
  3b91262e
19 Mar, 2021 1 commit
- Bfloat fused softmax + fused layer norm · 0fa7175f
  Mohammad Shoeybi authored Mar 19, 2021
  
  0fa7175f
08 Mar, 2021 1 commit
- Bfloat with fp32 grad acc · b4bc51b1
  Mohammad Shoeybi authored Mar 08, 2021
  
  b4bc51b1
13 Feb, 2021 1 commit
- More comments and some cleanup (e.g., better variable names) · 5489bda9
  Deepak Narayanan authored Feb 13, 2021
  
  5489bda9
09 Feb, 2021 1 commit

Interleaved pipeline execution and code refactoring · dd889062

Deepak Narayanan authored Dec 12, 2020

- Split a model's computation into multiple virtual stages as needed, and schedule communication correctly between these virtual stages
- Move schedule code into `schedules.py` and communication code into `p2p_communication.py`
- Use hyphens instead of spaces in all time logging for consistency
- Factor out code in megatron/training.py into helper functions
- Refactor evaluate() function: make it use forward_backward_schedule functions

dd889062

29 Jan, 2021 1 commit
- WIP: main_retriver_merge · 17d897e0
  Mostofa Patwary authored Jan 29, 2021
  
  17d897e0