- 21 Sep, 2020 1 commit
RezaYazdaniAminabadi authored
- 18 Sep, 2020 3 commits
Shaden Smith authored
Jeff Rasley authored
This reverts commit 01b6e27e.
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Shaden Smith authored
* Activation checkpointing bugfix and unit tests.
* Activation checkpointing bugfix and unit tests.
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 16 Sep, 2020 1 commit
Olatunji Ruwase authored
Update lr schedule unit tests
- 15 Sep, 2020 1 commit
Jeff Rasley authored
* add pytest skips around tests that require certain ops to be installed
- 11 Sep, 2020 2 commits
Jeff Rasley authored
This reverts commit e549be60.
RezaYazdaniAminabadi authored
* supporting different intermediate sizes other than 4*hidden_dim
* run precommit
* uncomment the unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 10 Sep, 2020 3 commits
Jeff Rasley authored
Shaden Smith authored
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Jeff Rasley authored
* ZeRO-Offload (squash) (#381)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Jie <37380896+jren73@users.noreply.github.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
- 09 Sep, 2020 1 commit
Ammar Ahmad Awan authored
* 1-bit adam (#353)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: tanghl1994 <htang14@ur.rochester.edu>
Co-authored-by: Hank <tanghl1994@gmail.com>
Co-authored-by: root <root@node2x12b.cs.rochester.edu>
Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>
- 03 Sep, 2020 1 commit
Jeff Rasley authored
- 02 Sep, 2020 2 commits
Jeff Rasley authored
Jeff Rasley authored
* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
- 01 Sep, 2020 5 commits
Shaden Smith authored
Samyam Rajbhandari authored
Renaming config files to gas3
Samyam Rajbhandari authored
Samyam Rajbhandari authored
Samyam Rajbhandari authored
* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
* Gradient Accumulation support for Stage 2. Model tests added to test the feature
* formatting
* Update deepspeed_light.py removing comment
* Update ds_config_func_bs8_zero1.json reverting this file back. It's not needed for this PR
* defining baseline prefix
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 10 Aug, 2020 1 commit
Jeff Rasley authored
* add fix and tests for get_lr from lr_scheduler before training starts
- 15 Jul, 2020 2 commits
Jeff Rasley authored
* empty grad fix
* add unit tests for empty grad
Olatunji Ruwase authored
- 14 Jul, 2020 1 commit
Olatunji Ruwase authored
* Support saving and loading ZeRO checkpoints on different data parallelism degree.
* Fix formatting
* Support checkpoint with varying GPU count in ZeRO stage 1
* Fix formatting
* Formatting fixes
* Update model tests
* Remove pprint
* Minor fix
* Fix formatting
* Update model tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 11 Jul, 2020 1 commit
Jeff Rasley authored
* add amp support for deepspeed (non-ZeRO)
* tests for amp mode
- 06 Jul, 2020 1 commit
Olatunji Ruwase authored
* Load non-DeepSpeed checkpoints into ZeRO optimizer
* Handle parameters smaller than DP
* Formatting fixes
* Handle empty partitions
* Fix perf bug
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 23 Jun, 2020 1 commit
Olatunji Ruwase authored
* Load non-DeepSpeed checkpoints into ZeRO optimizer
* Handle parameters smaller than DP
* Formatting fixes
- 30 May, 2020 1 commit
Jeff Rasley authored
- 29 May, 2020 1 commit
Jeff Rasley authored
* Transformer kernels release
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
- 27 May, 2020 1 commit
Jeff Rasley authored
* updates to support fp32 grad clipping and disable max_grad_norm
- 20 May, 2020 1 commit
Jeff Rasley authored
- 19 May, 2020 1 commit
Jeff Rasley authored
Updates for ZeRO stage 2 + ZeRO stage 1 w. RS
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
- 18 May, 2020 1 commit
Arash Ashari authored
* adding BingSquad e2e test
* updating the draft test; bring final step under try section
* finalizing test for base deepspeed and deepspeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
- 11 May, 2020 1 commit
Olatunji Ruwase authored
* Support dynamic loss scale args in fp16 optimizers
* Update names
- 06 May, 2020 1 commit
Shaden Smith authored
- 30 Apr, 2020 1 commit
Jeff Rasley authored
* update apex version to feb 5th commit
* use gradient clipping instead of max grad norm in tests
* add warning when user provides max_grad_norm
* update examples commit
- 24 Apr, 2020 1 commit
Olatunji Ruwase authored
- 27 Mar, 2020 2 commits
Olatunji Ruwase authored
* Push to remote
* Correctly handle multi output models by doing loss scaling in backward() Unit tests for multi output models
* Fix formatting issues
* Formatting issues fix
* Fix formatting
* Update DeepSpeedExamples submodule Enable Megatron model tests
Calogero Zarbo authored
* added zero_allow_untested_optimizer flag helpers
* add zero_allow_untested_optimizer config constants
* zero_allow_untested_optimizer logic with assertion
* Added unit test and CustomOptimizer helper class
- 25 Mar, 2020 1 commit
Shaden Smith authored