- 26 Mar, 2020 1 commit
-
-
Shaden Smith authored
-
- 22 Mar, 2020 1 commit
-
-
Calogero Zarbo authored
-
- 12 Mar, 2020 1 commit
-
-
Jeff Rasley authored
* add support for torch 1.3+ builds inside a docker build environment
* remove apex imports
-
- 10 Mar, 2020 1 commit
-
-
Samyam Rajbhandari authored
* Enhancement: ability to load a checkpoint without loading the optimizer states. Adds a unit test that saves and loads checkpoints with the fused, unfused, and ZeRO optimizers. The unit test takes about 165s.
-
- 27 Feb, 2020 2 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
* add mpirun support for openmpi 4.0
* add master addr support from args
* switch mpi detection to use mpi4py
* set constant for default distributed port
* Make sure deepspeed_mpi exists in args
-
- 26 Feb, 2020 1 commit
-
-
Jeff Rasley authored
* add auto-detect to torch dist init
* update tests to infer distributed init status
* prevent crash if dist_init_required is True but already initialized
* only init if safe to do so (forgot to add this file in prev commit)
-
- 22 Feb, 2020 1 commit
-
-
Olatunji Ruwase authored
* Support legacy optimizer fusion as config option
* Configure for legacy optimizer fusion
* Update configuration jsons for new apex
-
- 20 Feb, 2020 1 commit
-
-
Jeff Rasley authored
Also a fix for #94
-
- 03 Feb, 2020 1 commit
-
-
Olatunji Ruwase authored
-