- 12 Jan, 2021 1 commit
-
-
Shaden Smith authored
Special thanks to @g-karthik for tracking this issue down.
-
- 08 Jan, 2021 2 commits
-
-
Olatunji Ruwase authored
* Add Linear warmup+decay lr schedule Update lr schedule unit tests
* LR scheduler unit tests for LR Range Test and 1Cycle
* Disable yapf to preserve parameterization
* Disable test_pipe.py for CI debugging
* Disable test_lr_scheduler for CI debugging
* Disable test_lr_scheduler for CI debugging
* Enable all unit tests for CI debugging
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Jeff Rasley authored
-
- 06 Jan, 2021 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 05 Jan, 2021 1 commit
-
-
gcooper-isi authored
Allow DeepSpeed models to be initialized with optimizer=None
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 04 Jan, 2021 1 commit
-
-
Olatunji Ruwase authored
-
- 23 Dec, 2020 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
-
- 18 Dec, 2020 1 commit
-
-
Jeff Rasley authored
-
- 17 Dec, 2020 1 commit
-
-
Reza Yazdani authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Dec, 2020 1 commit
-
-
Jeff Rasley authored
-
- 01 Dec, 2020 1 commit
-
-
Reza Yazdani authored
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 25 Nov, 2020 1 commit
-
-
Jeff Rasley authored
-
- 21 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
-
- 20 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Use zero-tensors for missing gradients to avoid size mismatch
* Unit test for unbalanced gradients in ZeRO
* Formatting fixes
-
- 19 Nov, 2020 1 commit
-
-
Jeff Rasley authored
* zero-1 memory fix
* auto-tune max elems per comm to reduce padding/comm intervals
* clean-up and added previously missing reduction options
* fix testing backing to work with torch1.7
-
- 18 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Fix layout bug in ZeRO Stage 1 checkpoint logic Add elastic checkpoint option for ZeRO stage 1, default to True
* Format fixes
-
- 12 Nov, 2020 1 commit
-
-
Jeff Rasley authored
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 10 Nov, 2020 1 commit
-
-
Olatunji Ruwase authored
* Progressive layer dropping docs (#499)
* test
* Adding tutorial and news page for pld
* updating the tutorial and posts of PLD
* update the finetune tutorial
* Update PLD tutorial (#512)
* Update installation instructions
* Format fix
* ZeRO tutorial
* Format fixes
* ZeRO-Offload
* ZeRO and ZeRO-Offload tutorials
* Update navigation page
* Format fixes
* Add yuxhe feedback
* Fix blog post link
* Fix OneBit-Adam link Tweak scheduler example
* Fix date link
* Add DeepSpeed_Adam
* Add PLD tutorial to navigation
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* updating the pld docs
* DeepSpeed implementation of PLD (#508)
* DeepSpeed implementation of PLD
* Format fixes
* Formatting fixes
* Fix broken url
* Address PR feedback
* Bump DSE
Co-authored-by: Minjia Zhang <33713995+minjiaz@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Minjia Zhang <minjiaz@microsoft.com>
-
- 30 Oct, 2020 1 commit
-
-
Reza Yazdani authored
* add adamW to CPU-ADAM implementation
* supporting cpu-adam optimizer for zero-offload on deepspeed side
* bump DSE to match cpu-adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 07 Oct, 2020 2 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
-
- 29 Sep, 2020 1 commit
-
-
Olatunji Ruwase authored
* Disable default installation of CPU Adam
* Handle cpufeature import/use errors separately
-
- 25 Sep, 2020 1 commit
-
-
Shaden Smith authored
-
- 22 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
Co-authored-by: Conglong Li <conglong.li@gmail.com>
-
- 21 Sep, 2020 1 commit
-
-
RezaYazdaniAminabadi authored
-
- 18 Sep, 2020 3 commits
-
-
Shaden Smith authored
-
Jeff Rasley authored
This reverts commit 01b6e27e.
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
-
Shaden Smith authored
* Activation checkpointing bugfix and unit tests.
* Activation checkpointing bugfix and unit tests.
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 16 Sep, 2020 1 commit
-
-
Olatunji Ruwase authored
Update lr schedule unit tests
-
- 15 Sep, 2020 1 commit
-
-
Jeff Rasley authored
* add pytest skips around tests that require certain ops to be installed
-
- 11 Sep, 2020 2 commits
-
-
Jeff Rasley authored
This reverts commit e549be60.
-
RezaYazdaniAminabadi authored
* supporting different intermediate sizes other than 4*hidden_dim
* run precommit
* uncomment the unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 10 Sep, 2020 3 commits
-
-
Jeff Rasley authored
-
Shaden Smith authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Jeff Rasley authored
* ZeRO-Offload (squash) (#381)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 03 Sep, 2020 1 commit
-
-
Jeff Rasley authored
-
- 02 Sep, 2020 2 commits
-
-
Jeff Rasley authored
-
Jeff Rasley authored
* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
-
- 10 Aug, 2020 1 commit
-
-
Jeff Rasley authored
* add fix and tests for get_lr from lr_scheduler before training starts
-
- 15 Jul, 2020 1 commit
-
-
Jeff Rasley authored
* empty grad fix
* add unit tests for empty grad
-