- 07 Apr, 2023 1 commit
-
-
Abhinav Khattar authored
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
-
- 03 Apr, 2023 3 commits
-
-
MaximumEntropy authored
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
-
MaximumEntropy authored
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
-
MaximumEntropy authored
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
-
- 23 Mar, 2023 1 commit
-
-
Jared Casper authored
-
- 06 Oct, 2022 2 commits
-
-
shanmugamr authored
-
Eric Harper authored
-
- 27 Sep, 2022 1 commit
-
-
Jared Casper authored
-
- 26 Sep, 2022 1 commit
-
-
Jared Casper authored
-
- 24 Sep, 2022 3 commits
-
-
Jared Casper authored
-
Jared Casper authored
-
Jared Casper authored
-
- 23 Sep, 2022 3 commits
-
-
Jared Casper authored
-
Jared Casper authored
Also merged in some changes from apex.
-
Jared Casper authored
The LICENSE file says everything is 3-clause BSD, which is what we want, but at some point the Apache license was added to the top of some files and that proliferated. This commit removes the Apache license from any files that we own the copyright to. It also updates the copyright year and removes the unnecessary coding=utf-8 line.
-
- 26 May, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 08 Feb, 2022 1 commit
-
-
Lawrence McAfee authored
-
- 04 Feb, 2022 1 commit
-
-
Lawrence McAfee authored
-
- 01 Feb, 2022 1 commit
-
-
Lawrence McAfee authored
-
- 31 Jan, 2022 1 commit
-
-
Lawrence McAfee authored
-
- 25 Jan, 2022 1 commit
-
-
Lawrence McAfee authored
-
- 24 Jan, 2022 3 commits
-
-
Lawrence McAfee authored
-
Lawrence McAfee authored
-
Lawrence McAfee authored
-
- 12 Jan, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 17 Dec, 2021 2 commits
-
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-
- 05 Nov, 2021 1 commit
-
-
Vijay Korthikanti authored
-
- 16 Aug, 2021 1 commit
-
-
eqy authored
Some tests expect a clean model parallel slate and complain if a previous test left something behind; this change clears more variables that the tests complain about.
-
- 30 Jul, 2021 1 commit
-
-
Deepak Narayanan authored
  - Accumulate encoder hidden state gradient to handle skip connection
  - Correctly compute the number of layers in encoder / decoder for T5 model
  - Ensure e weights are initialized the same way in embeddings
  - Synchronize embedding gradients across encoder and decoder for T5 model
  - Support for checkpoint loading and saving
-
- 13 Feb, 2021 1 commit
-
-
Deepak Narayanan authored
-
- 09 Feb, 2021 1 commit
-
-
Deepak Narayanan authored
  - Split a model's computation into multiple virtual stages as needed, and schedule communication correctly between these virtual stages
  - Move schedule code into `schedules.py` and communication code into `p2p_communication.py`
  - Use hyphens instead of spaces in all time logging for consistency
  - Factor out code in megatron/training.py into helper functions
  - Refactor evaluate() function: make it use forward_backward_schedule functions
-
- 04 Jan, 2021 1 commit
-
-
Deepak Narayanan authored
-
- 19 Dec, 2020 2 commits
-
-
Jared Casper authored
-
Jared Casper authored
-
- 12 Nov, 2020 2 commits
-
-
Deepak Narayanan authored
-
Deepak Narayanan authored
Also includes the following changes for the inter-layer model-parallel implementation:
  - Refactoring of model implementations
  - Training loop changes to support inter-layer communication using `ring_exchange`
  - New groups for inter-layer communication
  - Checkpoint changes
  - Command line arguments
-
- 06 Aug, 2020 1 commit
-
-
Boris Fomitchev authored
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
-
- 30 Jul, 2020 1 commit
-
-
Boris Fomitchev authored
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
-
- 26 Jul, 2020 1 commit
-
-
Neel Kant authored
-