- 07 Apr, 2020 1 commit
marload authored
- 06 Apr, 2020 1 commit
Shaden Smith authored
- 03 Apr, 2020 1 commit
kouml authored
- 28 Mar, 2020 1 commit
Shaden Smith authored
- 27 Mar, 2020 2 commits
Olatunji Ruwase authored
* Push to remote
* Correctly handle multi output models by doing loss scaling in backward()
  Unit tests for multi output models
* Fix formatting issues
* Formatting issues fix
* Fix formatting
* Update DeepSpeedExamples submodule
  Enable Megatron model tests
Calogero Zarbo authored
* added zero_allow_untested_optimizer flag helpers
* add zero_allow_untested_optimizer config constants
* zero_allow_untested_optimizer logic with assertion
* Added unit test and CustomOptimizer helper class
- 26 Mar, 2020 1 commit
Shaden Smith authored
- 25 Mar, 2020 1 commit
Shaden Smith authored
- 23 Mar, 2020 1 commit
Olatunji Ruwase authored
- 22 Mar, 2020 2 commits
Calogero Zarbo authored
kouml authored
* remove session_params in deepspeed_constants.py
* add constants info at README.md
- 18 Mar, 2020 4 commits
Shaden Smith authored
* Better config filename
* Clean up configuration ToC
Shaden Smith authored
* fix docs permalink
* fix docs permalink
Shaden Smith authored
Shaden Smith authored
* Add coming soon to posts
* Add what's new section to main page
- 17 Mar, 2020 5 commits
Shaden Smith authored
Shaden Smith authored
Shaden Smith authored
GitHub created a CNAME for us automatically. Cool.
Shaden Smith authored
Shaden Smith authored
- 12 Mar, 2020 1 commit
Jeff Rasley authored
* add support for torch 1.3+ builds inside a docker build environment
* remove apex imports
- 11 Mar, 2020 2 commits
Jeff Rasley authored
Jeff Rasley authored
* allow installing a specific apex commit
- 10 Mar, 2020 4 commits
Samyam Rajbhandari authored
* Enhancement: Ability to load a checkpoint without loading the optimizer states.
  Unit test for saving and loading checkpoints with fused, unfused and ZeRO optimizers. The unit test takes about 165 s.
Olatunji Ruwase authored
* add test cases for onecycle policy with fp16/zero
* Make lr schedulers support fp16 optimizers
* Fix formatting
* More specific naming
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Shaden Smith authored
Cola authored
- 09 Mar, 2020 1 commit
Incomplete authored
* Add --no_sudo to run without sudo
* Add --pip_mirror to set the pip mirror
* Default to running pip without sudo
* Typo
* Add --pip_sudo to Dockerfile and azure-pipelines.yml
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 07 Mar, 2020 1 commit
Olatunji Ruwase authored
- 03 Mar, 2020 1 commit
Jeff Rasley authored
* add support for deepspeed env file to pass custom env values
* simplify deepspeed config example
- 27 Feb, 2020 4 commits
Jeff Rasley authored
Jeff Rasley authored
* add text about mpirun
Jeff Rasley authored
* add mpirun support for openmpi 4.0
* add master addr support from args
* switch mpi detection to use mpi4py
* set constant for default distributed port
* Make sure deepspeed_mpi exists in args
Jeff Rasley authored
- 26 Feb, 2020 1 commit
Jeff Rasley authored
* add auto-detect to torch dist init
* update tests to infer distributed init status
* prevent crash if dist_init_required is True but already initialized
* only init if safe to do so (forgot to add this file in prev commit)
- 25 Feb, 2020 2 commits
zenlytix authored
* Update scripts to handle cases where you have other VMs in your sub
* Support subs with other VMs and fix for PDSH permission error
* Minor fix to support subs with other VMs
* Added shutdown with or without delete VM option
  In Azure, deallocate is like a machine shutdown (and prevents billing). You can restart a deallocated VM. To fully drop the VM, delete is used. This command with the "-d" option will fully delete the VM. Without any argument it just deallocates / shuts down the VM.
zenlytix authored
* Update scripts to handle cases where you have other VMs in your sub
* Support subs with other VMs and fix for PDSH permission error
* Minor fix to support subs with other VMs
- 24 Feb, 2020 3 commits
Jeff Rasley authored
Shaden Smith authored
Shaden Smith authored
* Removes DeepSpeedDataSource
* dropping unused imports
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>