- 09 Sep, 2020 2 commits
  - Jeff Rasley authored
  - Arash Ashari authored
- 06 Sep, 2020 2 commits
  - Arash Ashari authored
    * adding BingSquad e2e test
    * updating the draft test; bring final step under try section
    * finalizing test for base deepspeed and deepspeed with ZeRO
    * applying the comment (thanks Jeff); fixed formatting
    * update Sparse Attention Tutorial
    * fixed a few issues and applied comments for better organization and readability
    * updated sparse attention tutorial, making the "how to use" section incremental; applying more comments
    Co-authored-by: arashashari <arashashari@ArashMSLaptop.redmond.corp.microsoft.com>
  - Olatunji Ruwase authored
- 05 Sep, 2020 2 commits
  - Shaden Smith authored
  - Arash Ashari authored
- 04 Sep, 2020 1 commit
  - Shaden Smith authored
- 03 Sep, 2020 2 commits
  - Arash Ashari authored
    * adding link to Sparse Attention in Navigation page
  - Jeff Rasley authored
- 02 Sep, 2020 4 commits
  - Jeff Rasley authored
  - Jeff Rasley authored
    Remove llvm/cmake install for now, as it causes pyyaml issues
  - Jeff Rasley authored
  - Jeff Rasley authored
    * Sparse attn + ops/runtime refactor + v0.3.0
    Co-authored-by: Arash Ashari <arashari@microsoft.com>
- 01 Sep, 2020 6 commits
  - Shaden Smith authored
  - Jeff Rasley authored
  - Samyam Rajbhandari authored
    Renaming config files to gas3
  - Samyam Rajbhandari authored
  - Samyam Rajbhandari authored
  - Samyam Rajbhandari authored
    * Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation
    * Gradient accumulation support for Stage 2. Model tests added to test the feature
    * formatting
    * Update deepspeed_light.py, removing comment
    * Update ds_config_func_bs8_zero1.json, reverting this file back. It's not needed for this PR
    * defining baseline prefix
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 31 Aug, 2020 1 commit
  - Samyam Rajbhandari authored
    * Update deepspeed_checkpointing.py
    * formatting
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 28 Aug, 2020 1 commit
  - Jeff Rasley authored
- 27 Aug, 2020 1 commit
  - Jeff Rasley authored
    * Create CODEOWNERS
- 18 Aug, 2020 1 commit
  - Jeff Rasley authored
    * turn off multi-node launch if only 1 node
- 14 Aug, 2020 1 commit
  - Jeff Rasley authored
- 13 Aug, 2020 2 commits
  - Jeff Rasley authored
  - Jeff Rasley authored
    * update fan-out flag for pdsh
- 12 Aug, 2020 2 commits
  - Jeff Rasley authored
  - Shaden Smith authored
- 10 Aug, 2020 1 commit
  - Jeff Rasley authored
    * add fix and tests for get_lr from lr_scheduler before training starts
- 08 Aug, 2020 1 commit
  - Shaden Smith authored
- 07 Aug, 2020 2 commits
  - Jeff Rasley authored
    Add webinar on-demand links and update README
  - Shaden Smith authored
    The parentheses alter the evaluation of the assert statement, so it always evaluates to True.
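The 07 Aug, 2020 fix concerns a common Python pitfall: `assert (condition, message)` asserts a two-element tuple, and a non-empty tuple is always truthy. A minimal sketch, using hypothetical `broken_check` and `fixed_check` functions rather than the actual DeepSpeed code:

```python
def broken_check(value):
    # Same effect as `assert (value > 0, "value must be positive")`:
    # the parentheses build a tuple, and a non-empty tuple is always truthy.
    condition_and_message = (value > 0, "value must be positive")
    assert condition_and_message  # never fails, even when value <= 0
    return "passed"


def fixed_check(value):
    # Without parentheses, the condition and the message are separate operands.
    assert value > 0, "value must be positive"
    return "passed"


if __name__ == "__main__":
    print(broken_check(-1))  # prints "passed": the check was silently skipped
    try:
        fixed_check(-1)
    except AssertionError as err:
        print(f"AssertionError: {err}")  # prints "AssertionError: value must be positive"
```

CPython also emits a SyntaxWarning ("assertion is always true, perhaps remove parentheses?") when it compiles the literal tuple form, which is one way such bugs can be spotted before they ship.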
- 01 Aug, 2020 1 commit
  - Emmanuel Kahembwe authored
    The `mpu` object is bound to the class instance. The if statement uses `self.mpu`, but the following lines call just `mpu`, which raises a NameError.
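The NameError described in the 01 Aug, 2020 commit can be reproduced in isolation. In this sketch, `Engine`, `FakeMpu`, and the `get_data_parallel_world_size` method are hypothetical stand-ins, not DeepSpeed's actual classes:

```python
class FakeMpu:
    """Hypothetical model-parallel unit exposing one query method."""

    def get_data_parallel_world_size(self):
        return 4


class Engine:
    """Hypothetical class that stores the mpu object on the instance."""

    def __init__(self, mpu=None):
        self.mpu = mpu  # bound to the instance, not to a module-level name

    def world_size_buggy(self):
        if self.mpu is not None:
            # Bug: the bare name `mpu` is not defined in this scope, so NameError.
            return mpu.get_data_parallel_world_size()
        return 1

    def world_size_fixed(self):
        if self.mpu is not None:
            # Fix: go through the same instance attribute the condition checks.
            return self.mpu.get_data_parallel_world_size()
        return 1


if __name__ == "__main__":
    engine = Engine(mpu=FakeMpu())
    print(engine.world_size_fixed())  # prints 4
    try:
        engine.world_size_buggy()
    except NameError as err:
        print(f"NameError: {err}")
```

The buggy branch only fails when an mpu is actually supplied, so tests that never pass one would not catch it.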
- 28 Jul, 2020 2 commits
  - Jeff Rasley authored
    * fix nv_peer_mem version in Dockerfile
    * fix security issue: remove pillow dependency (it is only needed for the cifar example, which has its own requirements.txt)
  - Emmanuel Kahembwe authored
- 27 Jul, 2020 1 commit
  - Jeff Rasley authored
- 25 Jul, 2020 1 commit
  - Shaden Smith authored
- 24 Jul, 2020 2 commits
  - Jeff Rasley authored
  - Jeff Rasley authored
- 23 Jul, 2020 1 commit
  - Jeff Rasley authored
    * updates to amp to support grad clip and grad accumulation
    * zero grad using optimizer if in amp mode