Commits · 5b7736ca57d9d35dc4ffd11078df489960bbbfb1 · OpenDAS / Megatron-LM

03 Apr, 2023 1 commit
- addressing more comments · 5b7736ca
  Mostofa Patwary authored Apr 03, 2023
  
31 Mar, 2023 1 commit
- merging rope to main · 1e0e555c
  Mostofa Patwary authored Mar 31, 2023
  
22 Feb, 2023 1 commit
- Retro · 17a6044d
  Lawrence McAfee authored Feb 21, 2023
  
24 Sep, 2022 1 commit
- Alias core.parallel_state as mpu and use it throughout code. RIP mpu. · 5942af97
  Jared Casper authored Sep 23, 2022
  
23 Sep, 2022 2 commits
- Move layers from mpu to core.tensor_parallel. · c2ea914f
  Jared Casper authored Sep 23, 2022
  
- fabad461
  Jared Casper authored Sep 23, 2022
```
The LICENSE file says everything is 3-clause BSD, which is what we want,
but at some point the Apache license was added to the top of some files
and that proliferated. This commit removes the Apache license from any
files that we own the copyright to.

Also updates the copyright year and removes the unnecessary coding=utf-8
line.
```
21 Jul, 2022 1 commit
- Remove deprecated destination argument to state_dict functions and make all... · 928a200c
  Jared Casper authored Jul 21, 2022
```
Remove deprecated destination argument to state_dict functions and make all arguments keyword to avoid warnings.
```
24 May, 2022 2 commits
- preallocating global buffer to avoid memory fragmentation · 9dc3c42a
  Vijay Korthikanti authored May 24, 2022
  
- fix for sequence parallelism in bert pooling · 8474e6e5
  Vijay Korthikanti authored May 24, 2022
  
20 May, 2022 2 commits
- avoiding sequence parallelism on the pooler · 3f91f09b
  Vijay Korthikanti authored May 20, 2022
  
- bert regression fixes · 6aaafee6
  Vijay Korthikanti authored May 20, 2022
  
17 May, 2022 1 commit
- address review comments · 356eb36a
  Vijay Korthikanti authored May 16, 2022
  
28 Apr, 2022 1 commit
- address review comments · cfd2e216
  Vijay Korthikanti authored Apr 28, 2022
  
30 Mar, 2022 1 commit
- sequence parallelism for embedding dropout and last linear layer + memory optimizations · eec218d8
  Vijay Korthikanti authored Mar 30, 2022
  
15 Mar, 2022 2 commits
- fixed. · 65e6bc32
  Lawrence McAfee authored Mar 15, 2022
  
- debugging. · 4c598f9d
  Lawrence McAfee authored Mar 15, 2022
  
07 Mar, 2022 2 commits
- fixes to main merge · 269f28f7
  Vijay Korthikanti authored Mar 07, 2022
  
- refactor to help merge with main · 0d77c0e9
  Vijay Korthikanti authored Mar 07, 2022
  
02 Mar, 2022 1 commit
- layernorm grad sync + name changes · c0f10643
  Vijay Korthikanti authored Mar 02, 2022
  
19 Feb, 2022 1 commit
- tensor model parallelism memory optimization · 5d4689c4
  Vijay Korthikanti authored Feb 18, 2022
  
16 Feb, 2022 1 commit
- gradient accumulation fusion · 83b1e42f
  Sangkug Lym authored Feb 13, 2022
```
remove redundant linear layer class definition

add fuse_gradient_accumulation attribute to weights for simple targeting

reflect feedback and clean up the code

arg change
```
17 Dec, 2021 1 commit
- pipeline_fixes · 17843605
  Vijay Korthikanti authored Dec 17, 2021
  
01 Oct, 2021 1 commit
- Fix inference after T5 pipeline merge · f2c35bb0
  Jared Casper authored Oct 01, 2021
```
Adds some backward compatibility code so old inference code still works.
```
29 Sep, 2021 1 commit
- added multi-batch inference · 390ddef8
  mshoeybi authored Sep 29, 2021
  
20 Sep, 2021 1 commit
- Inference context optimization · 8b9fe87b
  Mohammad Shoeybi authored Sep 20, 2021
  
02 Sep, 2021 1 commit
- Adding checkpoint_util and associated loader and saver. · 03d09af0
  Jared Casper authored Jun 30, 2021
  
18 Aug, 2021 1 commit
- Re-add assert for no pipeline parallelism for T5 · 1f4966dc
  Jared Casper authored Aug 18, 2021
  
30 Jul, 2021 3 commits
- Fix grad norm computation · 5c8238c3
  Deepak Narayanan authored Jul 30, 2021
  
- Add assertion for now preventing usage of pipeline parallelism with T5 model · da1c96e9
  Deepak Narayanan authored Jul 30, 2021
  
- Support for pipeline parallelism in T5 model · 46c74b4c
  Deepak Narayanan authored Jun 22, 2021
```
- Accumulate encoder hidden state gradient to handle skip connection
- Correctly compute the number of layers in encoder / decoder for T5 model
- Ensure weights are initialized the same way in embeddings
- Synchronize embedding gradients across encoder and decoder for T5 model
- Support for checkpoint loading and saving
```
18 May, 2021 1 commit
- cleaning the code · 2eaf6c79
  Mostofa Patwary authored May 18, 2021
  
12 May, 2021 1 commit
- DPR finetune and evaluation · 6d03d7af
  Mostofa Patwary authored May 11, 2021
  
11 May, 2021 1 commit
- DPR evaluation debugging · 220637f9
  Mostofa Patwary authored May 11, 2021
  
02 Apr, 2021 1 commit
- Addressed MR comments, mostly adding comments to code. · e270f68a
  Jared Casper authored Apr 02, 2021
  
24 Mar, 2021 1 commit
- pipeline code simplification · 3b91262e
  Vijay Korthikanti authored Mar 02, 2021
  
29 Jan, 2021 1 commit
- WIP: main_retriver_merge · 17d897e0
  Mostofa Patwary authored Jan 29, 2021
  
22 Jan, 2021 1 commit
- attention_mask_func cleanup · ebf8b89e
  Vijay Korthikanti authored Jan 22, 2021
  
13 Jan, 2021 1 commit
- Addressing more review comments · 4ae54b55
  Vijay Korthikanti authored Jan 12, 2021
  
12 Jan, 2021 2 commits
- Address more review comments · d836d498
  Vijay Korthikanti authored Jan 12, 2021
  
- address review comments · 4b3519cb
  Vijay Korthikanti authored Jan 12, 2021
  