- 17 Aug, 2021 3 commits
-
-
- 16 Aug, 2021 3 commits
-
-
Jared Casper authored
changed torch distributed init method from tcp to env
See merge request ADLR/megatron-lm!297
-
Mohammad Shoeybi authored
-
eqy authored
Some tests expect a clean model parallel slate and complain if a previous test left something behind; this change clears more variables that the tests complain about.
-
- 13 Aug, 2021 1 commit
-
-
Jared Casper authored
removed saving of input/output tensors for forward-only passes of pipeline schedules
See merge request ADLR/megatron-lm!300
-
- 12 Aug, 2021 4 commits
-
-
Jared Casper authored
Lmcafee/copygrad fix v2
See merge request ADLR/megatron-lm!299
-
Jared Casper authored
fused kernels simd fix for short sequences
See merge request ADLR/megatron-lm!302
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-
- 11 Aug, 2021 1 commit
-
-
Lawrence McAfee authored
-
- 10 Aug, 2021 2 commits
-
-
Jared Casper authored
scripts for sc21
See merge request ADLR/megatron-lm!298
-
Mohammad Shoeybi authored
-
- 09 Aug, 2021 2 commits
-
-
Lawrence McAfee authored
-
Lawrence McAfee authored
Merge branch 'main' of ssh://gitlab-master.nvidia.com:12051/ADLR/megatron-lm into lmcafee/copygrad-fix-v2
-
- 29 Jul, 2021 2 commits
-
-
Jared Casper authored
Use timers kwargs correctly to prevent bug with new p2p_communication API
See merge request ADLR/megatron-lm!295
-
Deepak Narayanan authored
-
- 27 Jul, 2021 2 commits
-
-
Deepak Narayanan authored
See merge request ADLR/megatron-lm!293
-
Mostofa Patwary authored
-
- 21 Jul, 2021 2 commits
-
-
Jared Casper authored
Use helper method in megatron/schedules.py as intended
See merge request ADLR/megatron-lm!292
-
Deepak Narayanan authored
-
- 20 Jul, 2021 1 commit
-
-
Taebum Kim authored
-
- 16 Jul, 2021 2 commits
-
-
Lawrence McAfee authored
-
Haibin Lin authored
-
- 14 Jul, 2021 2 commits
-
-
Jared Casper authored
added memory stats (allocated/reserved) to tensorboard logging
See merge request ADLR/megatron-lm!285
-
Jared Casper authored
fixed zero_grad for fp32_from_float16_groups
See merge request ADLR/megatron-lm!283
-
- 13 Jul, 2021 1 commit
-
-
Lawrence McAfee authored
-
- 12 Jul, 2021 8 commits
-
-
Lawrence McAfee authored
-
Lawrence McAfee authored
-
Jared Casper authored
Update README to have a small note about interleaved schedule
See merge request ADLR/megatron-lm!286
-
Mohammad Shoeybi authored
updated data processing readme
See merge request ADLR/megatron-lm!287
-
Mostofa Patwary authored
-
Deepak Narayanan authored
-
Deepak Narayanan authored
-
Lawrence McAfee authored
-
- 09 Jul, 2021 1 commit
-
-
Lawrence McAfee authored
-
- 11 Jun, 2021 3 commits
-
-
Jared Casper authored
Pull in some GitHub PRs
See merge request ADLR/megatron-lm!282