- 29 Apr, 2020 1 commit
-
-
Samyam Rajbhandari authored
1) CSR parameter names should end with .weight. 2) When using the basic optimizer directly, DeepSpeed should handle zero_grad. Letting the basic optimizer do the zero_grad left residual gradients in the embedding layer for unknown reasons.
-
- 27 Apr, 2020 1 commit
-
-
Shaden Smith authored
-
- 25 Apr, 2020 1 commit
-
-
Jeff Rasley authored
Remove explicit torch version requirement so that we can more easily support other versions
-
- 24 Apr, 2020 1 commit
-
-
Olatunji Ruwase authored
-
- 22 Apr, 2020 2 commits
-
-
Shaden Smith authored
-
Shaden Smith authored
-
- 21 Apr, 2020 1 commit
-
-
Olatunji Ruwase authored
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 20 Apr, 2020 1 commit
-
-
marload authored
-
- 16 Apr, 2020 1 commit
-
-
Jeff Rasley authored
-
- 12 Apr, 2020 1 commit
-
-
Samyam Rajbhandari authored
-
- 10 Apr, 2020 1 commit
-
-
Shaden Smith authored
-
- 09 Apr, 2020 1 commit
-
-
Jeff Rasley authored
-
- 07 Apr, 2020 1 commit
-
-
marload authored
-
- 06 Apr, 2020 1 commit
-
-
Shaden Smith authored
-
- 03 Apr, 2020 1 commit
-
-
kouml authored
-
- 28 Mar, 2020 1 commit
-
-
Shaden Smith authored
-
- 27 Mar, 2020 2 commits
-
-
Olatunji Ruwase authored
* Push to remote
* Correctly handle multi output models by doing loss scaling in backward()
  Unit tests for multi output models
* Fix formatting issues
* Formatting issues fix
* Fix formatting
* Update DeepSpeedExamples submodule
  Enable Megatron model tests
-
Calogero Zarbo authored
* added zero_allow_untested_optimizer flag helpers
* add zero_allow_untested_optimizer config constants
* zero_allow_untested_optimizer logic with assertion
* Added unit test and CustomOptimizer helper class
-
- 26 Mar, 2020 1 commit
-
-
Shaden Smith authored
-
- 25 Mar, 2020 1 commit
-
-
Shaden Smith authored
-
- 23 Mar, 2020 1 commit
-
-
Olatunji Ruwase authored
-
- 22 Mar, 2020 2 commits
-
-
Calogero Zarbo authored
-
kouml authored
* remove session_params in deepspeed_constants.py
* add constants info at README.md
-
- 18 Mar, 2020 4 commits
-
-
Shaden Smith authored
* Better config filename
* Clean up configuration ToC
-
Shaden Smith authored
* fix docs permalink
* fix docs permalink
-
Shaden Smith authored
-
Shaden Smith authored
* Add coming soon to posts
* Add what's new section to main page
-
- 17 Mar, 2020 5 commits
-
-
Shaden Smith authored
-
Shaden Smith authored
-
Shaden Smith authored
GitHub created a CNAME for us automatically. Cool.
-
Shaden Smith authored
-
Shaden Smith authored
-
- 12 Mar, 2020 1 commit
-
-
Jeff Rasley authored
* add support for torch 1.3+ builds inside a docker build environment
* remove apex imports
-
- 11 Mar, 2020 2 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
* allow installing a specific apex commit
-
- 10 Mar, 2020 4 commits
-
-
Samyam Rajbhandari authored
* Enhancement: ability to load a checkpoint without loading the optimizer states. Adds a unit test that saves and loads checkpoints with the fused, unfused, and ZeRO optimizers. The unit test takes about 165s.
-
Olatunji Ruwase authored
* add test cases for onecycle policy with fp16/zero
* Make lr schedulers support fp16 optimizers
* Fix formatting
* More specific naming
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Shaden Smith authored
-
Cola authored
-
- 09 Mar, 2020 1 commit
-
-
Incomplete authored
* Add --no_sudo to run without sudo
* Add --pip_mirror to set the pip mirror
* Default to running pip without sudo
* Typo
* Add --pip_sudo to Dockerfile and azure-pipelines.yml
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-