- 27 Mar, 2020 2 commits
-
-
Olatunji Ruwase authored
* Push to remote
* Correctly handle multi output models by doing loss scaling in backward()
  Unit tests for multi output models
* Fix formatting issues
* Formatting issues fix
* Fix formatting
* Update DeepSpeedExamples submodule
  Enable Megatron model tests
-
Calogero Zarbo authored
* added zero_allow_untested_optimizer flag helpers
* add zero_allow_untested_optimizer config constants
* zero_allow_untested_optimizer logic with assertion
* Added unit test and CustomOptimizer helper class
-
- 25 Mar, 2020 1 commit
-
-
Shaden Smith authored
-
- 10 Mar, 2020 2 commits
-
-
Samyam Rajbhandari authored
* Enhancement: Ability to load a checkpoint without loading the optimizer states. Unit test covering saving and loading checkpoints with fused, unfused, and ZeRO optimizers. The unit test takes about 165s.
-
Olatunji Ruwase authored
* add test cases for onecycle policy with fp16/zero
* Make lr schedulers support fp16 optimizers
* Fix formatting
* More specific naming
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 27 Feb, 2020 1 commit
-
-
Jeff Rasley authored
-
- 26 Feb, 2020 1 commit
-
-
Jeff Rasley authored
* add auto-detect to torch dist init
* update tests to infer distributed init status
* prevent crash if dist_init_required is True but already initialized
* only init if safe to do so (forgot to add this file in prev commit)
-
- 22 Feb, 2020 1 commit
-
-
Olatunji Ruwase authored
* Support legacy optimizer fusion as config option
* Configure for legacy optimizer fusion
* Update configuration jsons for new apex
-
- 20 Feb, 2020 1 commit
-
-
Jeff Rasley authored
Also a fix for #94
-
- 15 Feb, 2020 1 commit
-
-
Jeff Rasley authored
bug fixes for adamw/lamb and corresponding tests
-
- 14 Feb, 2020 1 commit
-
-
Shaden Smith authored
* Porting BingBertSquad test
* Updating default paths.
* Enable model tests.
* Updating DeepSpeedExamples submodule
* Adding BingBertSquad's log uploads.
* Messed up the submodule again :-)
-
- 12 Feb, 2020 1 commit
-
-
eltonzheng authored
-
- 10 Feb, 2020 1 commit
-
-
Shaden Smith authored
-
- 07 Feb, 2020 1 commit
-
-
Samyam Rajbhandari authored
* simplifying the batch config, using a single assert to test for validity and allowing for specifying only the micro batch size
* Simplifying Batch Config, Adding ability to specify batch using just micro_batch, and adding a bunch of unit tests
* ran formatting
* Typo fixes and added the config file
* reformatting
* path fixes
* removing print statements
-
- 06 Feb, 2020 2 commits
-
-
Olatunji Ruwase authored
Unit tests for add_XXX_arguments
-
Olatunji Ruwase authored
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 05 Feb, 2020 1 commit
-
-
Shaden Smith authored
* Enables NCCL backend in @distributed_test
* Adds pytest-forked to avoid CUDA re-initialization issue.
* paste typo
* transcription typo
-
- 04 Feb, 2020 4 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
* add allreduce test
* comment out set rank to cuda for now
* switched back to gloo
-
Shaden Smith authored
* Adds distributed_test decorator and some unit tests.
* Setting NCCL backend.
* Parametrizes test.
* rank -> local_rank
* Temporarily disable CUDA initialization.
-
Shaden Smith authored
-
- 03 Feb, 2020 3 commits
-
-
Shaden Smith authored
-
Shaden Smith authored
-
Elton Zheng authored
-
- 01 Feb, 2020 1 commit
-
-
Elton Zheng authored
-