- 16 Mar, 2021 2 commits
-
-
Olatunji Ruwase authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Jeff Rasley authored
-
- 15 Mar, 2021 1 commit
-
-
Samyam Rajbhandari authored
* Fix mis-aligned-grad: when a parameter is not divisible by world size, the partitioned gradients are mis-aligned due to incorrect padding handling. This PR fixes that.
* Formatting fix
* Adding static_scale test back for Z3, and also changing hidden size to be not divisible by world_size
* also removing alignment from flat fp16 buffers
* Testing for hidden dim alignment
* inference hook fix
* Update stage3.py
* formatting
* [bug-fix] move params to gpu if offload params is turned off
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 11 Mar, 2021 1 commit
-
-
Jeff Rasley authored
-
- 08 Mar, 2021 1 commit
-
-
Samyam Rajbhandari authored
* Squash stage3 v1 (#146)
Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
* Fix correctness bug (#147)
* formatting fix (#150)
* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)
* fp16 Z3 API update and bugfix
* revert debug change
* ZeRO-3 detach and race condition bugfixes (#149)
* trying out ZeRO-3 race condition fix
* CUDA sync instead of stream
* reduction stream sync
* remove commented code
* Fix optimizer state_dict KeyError (#148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)
* Simplifying the logic for getting averaged gradients (#153)
* skip for now
* Z3 Docs redux (#154)
* removing some TODOs and commented code (#155)
* New Z3 defaults (#156)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* formatting
* megatron external params
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
-
- 12 Feb, 2021 1 commit
-
-
Olatunji Ruwase authored
* Activation checkpoint support for non tensor input/output
* Format fixes
* Address PR comments; Add ordering edge case tests
-
- 11 Feb, 2021 1 commit
-
-
Cheng Li authored
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* work on flops profiler tutorial
* update flops profiler tutorial
* add flops profiler tutorial and fix names
* fix trailing ws
* fix names
* remove multistep profiling and update docs
* fix cases where functionals and submodules coexist in a parent module, update readme
* fix typo
* always invoke post hook function
* fix module flops sum and update tests
* update tutorial
-
- 29 Jan, 2021 1 commit
-
-
Jeff Rasley authored
-
- 27 Jan, 2021 1 commit
-
-
Jeff Rasley authored
-
- 20 Jan, 2021 1 commit
-
-
Shaden Smith authored
-
- 15 Jan, 2021 1 commit
-
-
Olatunji Ruwase authored
-
- 14 Jan, 2021 1 commit
-
-
Jeff Rasley authored
-
- 13 Jan, 2021 1 commit
-
-
Cheng Li authored
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Jan, 2021 1 commit
-
-
Shaden Smith authored
Special thanks to @g-karthik for tracking this issue down.
-
- 08 Jan, 2021 2 commits
-
-
Olatunji Ruwase authored
* Add Linear warmup+decay lr schedule; update lr schedule unit tests
* LR scheduler unit tests for LR Range Test and 1Cycle
* Disable yapf to preserve parameterization
* Disable test_pipe.py for CI debugging
* Disable test_lr_scheduler for CI debugging
* Disable test_lr_scheduler for CI debugging
* Enable all unit tests for CI debugging
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Jeff Rasley authored
-
- 06 Jan, 2021 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 05 Jan, 2021 1 commit
-
-
gcooper-isi authored
Allow DeepSpeed models to be initialized with optimizer=None
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 04 Jan, 2021 1 commit
-
-
Olatunji Ruwase authored
-
- 23 Dec, 2020 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
-
- 18 Dec, 2020 1 commit
-
-
Jeff Rasley authored
-
- 17 Dec, 2020 1 commit
-
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Dec, 2020 1 commit
-
-
Jeff Rasley authored
-
- 01 Dec, 2020 1 commit
-
-
Reza Yazdani authored
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 25 Nov, 2020 1 commit
-
-
Jeff Rasley authored
-
- 21 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
-
- 20 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Use zero-tensors for missing gradients to avoid size mismatch
* Unit test for unbalanced gradients in ZeRO
* Formatting fixes
-
- 19 Nov, 2020 1 commit
-
-
Jeff Rasley authored
* zero-1 memory fix
* auto-tune max elems per comm to reduce padding/comm intervals
* clean-up and added previously missing reduction options
* fix testing backing to work with torch1.7
-
- 18 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Fix layout bug in ZeRO Stage 1 checkpoint logic; add elastic checkpoint option for ZeRO stage 1, default to True
* Format fixes
-
- 12 Nov, 2020 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 10 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Progressive layer dropping docs (#499)
* test
* Adding tutorial and news page for pld
* updating the tutorial and posts of PLD
* update the finetune tutorial
* Update PLD tutorial (#512)
* Update installation instructions
* Format fix
* ZeRO tutorial
* Format fixes
* ZeRO-Offload
* ZeRO and ZeRO-Offload tutorials
* Update navigation page
* Format fixes
* Add yuxhe feedback
* Fix blog post link
* Fix OneBit-Adam link; tweak scheduler example
* Fix date link
* Add DeepSpeed_Adam
* Add PLD tutorial to navigation
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* updating the pld docs
* DeepSpeed implementation of PLD (#508)
* DeepSpeed implementation of PLD
* Format fixes
* Formatting fixes
* Fix broken url
* Address PR feedback
* Bump DSE
Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
-
- 30 Oct, 2020 1 commit
-
-
Reza Yazdani authored
* add adamW to CPU-ADAM implementation
* supporting cpu-adam optimizer for zero-offload on deepspeed side
* bump DSE to match cpu-adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 07 Oct, 2020 2 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
-
- 29 Sep, 2020 1 commit
-
-
Olatunji Ruwase authored
* Disable default installation of CPU Adam
* Handle cpufeature import/use errors separately
-
- 25 Sep, 2020 1 commit
-
-
Shaden Smith authored
-
- 22 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
Co-authored-by: Conglong Li <conglong.li@gmail.com>
-
- 21 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
-
- 18 Sep, 2020 2 commits
-
-
Shaden Smith authored
-
Jeff Rasley authored
This reverts commit 01b6e27e.
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
-