Commits · 6cb332f1b1ea786b5d83ebf9c66cb1f85d1245f6 · OpenDAS / deepspeed

29 Apr, 2020 1 commit

Samyam Rajbhandari authored Apr 28, 2020

1) CSR parameter names should end with .weight.
2) When the basic optimizer is used directly, DeepSpeed should handle zero_grad. Letting the basic optimizer do the zero_grad left residual gradients in the embedding layer, for reasons not yet understood.

6cb332f1

27 Apr, 2020 1 commit
- Moved environment variable docs. (#203) · a0cd61e8
  Shaden Smith authored Apr 27, 2020
  
  a0cd61e8
25 Apr, 2020 1 commit

Remove explicit torch version requirement · 7cf65d0e

Jeff Rasley authored Apr 24, 2020

Remove explicit torch version requirement so that we can more easily support other versions

7cf65d0e

24 Apr, 2020 1 commit
- Fix index out of range error when parameter count is not multiple of ranks (#202) · 512a0d4d
  Olatunji Ruwase authored Apr 24, 2020
  
  512a0d4d
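
The commit above concerns partitioning parameters across ranks when the count does not divide evenly. The log does not show the actual fix, so the following is only a minimal sketch of one common way to avoid out-of-range indices: use ceiling-sized partitions and clamp every boundary to the element count. The function name `partition_bounds` and its arguments are hypothetical and are not taken from DeepSpeed.

```python
def partition_bounds(num_elements, world_size):
    """Return one (start, end) pair per rank.

    Hypothetical helper, not DeepSpeed code: partitions are ceiling-sized
    and every boundary is clamped to num_elements, so trailing ranks may
    receive a short or empty range instead of indexing past the end.
    """
    chunk = -(-num_elements // world_size)  # ceiling division
    bounds = []
    for rank in range(world_size):
        start = min(rank * chunk, num_elements)
        end = min(start + chunk, num_elements)
        bounds.append((start, end))
    return bounds


# 10 elements over 4 ranks: the last rank gets a short partition.
params = list(range(10))
for rank, (start, end) in enumerate(partition_bounds(len(params), 4)):
    print(rank, params[start:end])
```

Clamping keeps every slice valid even when there are fewer elements than ranks; padding the flat buffer to a multiple of the rank count would be an alternative design.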
22 Apr, 2020 2 commits
- Fixes missing newline in code example (#201) · c014a55b
  Shaden Smith authored Apr 22, 2020
  
  c014a55b
- README and RTD improvements. (#198) · dd166ee6
  Shaden Smith authored Apr 21, 2020
  
  dd166ee6
21 Apr, 2020 1 commit
- Fix perf bug (#194) · bf4797c2
  Olatunji Ruwase authored Apr 20, 2020
```
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
```
  bf4797c2
20 Apr, 2020 1 commit
- Early Return Pattern "if return else return" -> "if return return" (#197) · b7f5cb78
  marload authored Apr 21, 2020
  
  b7f5cb78
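
The refactoring named in the commit title, "if return else return" to "if return return", can be illustrated with a small sketch. The functions below are hypothetical examples and are not taken from the DeepSpeed code base; both versions behave identically, and the second simply drops the redundant `else`.

```python
def sign_label_nested(x):
    # Before: the else branch is redundant because the if branch returns.
    if x < 0:
        return "negative"
    else:
        return "non-negative"


def sign_label_early(x):
    # After: early return, with the fall-through case at function level.
    if x < 0:
        return "negative"
    return "non-negative"


for value in (-1, 0, 1):
    print(value, sign_label_nested(value), sign_label_early(value))
```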
16 Apr, 2020 1 commit
- Delete tmp · 675d73e0
  Jeff Rasley authored Apr 15, 2020
  
  675d73e0
12 Apr, 2020 1 commit
- AllReduce bucket fix. (#186) · 90017d3a
  Samyam Rajbhandari authored Apr 11, 2020
  
  90017d3a
10 Apr, 2020 1 commit
- Add BERT pretraining tutorial to navigation bar. (#190) · 4cbfcc75
  Shaden Smith authored Apr 10, 2020
  
  4cbfcc75
09 Apr, 2020 1 commit
- updating commit for deepspeed examples (#188) · 3ef0030a
  Jeff Rasley authored Apr 09, 2020
  
  3ef0030a
07 Apr, 2020 1 commit
- refactoring: Deduplication (#185) · 63da7f56
  marload authored Apr 08, 2020
  
  63da7f56
06 Apr, 2020 1 commit
- Ported BERT pre-training tutorial (#184) · 5fb22a05
  Shaden Smith authored Apr 06, 2020
  
  5fb22a05
03 Apr, 2020 1 commit
- add representation to optimizer (#181) · 3637b86b
  kouml authored Apr 04, 2020
  
  3637b86b
28 Mar, 2020 1 commit
- catching up DeepSpeedExamples (#177) · c1a3ec67
  Shaden Smith authored Mar 27, 2020
  
  c1a3ec67
27 Mar, 2020 2 commits

Support multi-output models (#170) · 53c73fe3

Olatunji Ruwase authored Mar 27, 2020

* Push to remote

* Correctly handle multi-output models by doing loss scaling in backward()
Unit tests for multi-output models

* Fix formatting issues

* Formatting issues fix

* Fix formatting

* Update DeepSpeedExamples submodule
Enable Megatron model tests

53c73fe3

Add "zero_allow_untested_optimizer" option in conf file (#173) · 43f27332

Calogero Zarbo authored Mar 27, 2020

* added zero_allow_untested_optimizer flag helpers

* add zero_allow_untested_optimizer config constants

* zero_allow_untested_optimizer logic with assertion

* Added unit test and CustomOptimizer helper class

43f27332
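
As a rough illustration of the option named in the commit above, the sketch below builds a configuration dictionary and serializes it to JSON. Only the key `zero_allow_untested_optimizer` comes from the commit title; the other keys and all values are assumptions chosen for illustration and may not match the configuration schema of any particular DeepSpeed version.

```python
import json

# Illustrative configuration only; values are placeholders.
ds_config = {
    "train_batch_size": 8,                  # assumed key and value
    "zero_optimization": True,              # assumed surrounding setting
    "zero_allow_untested_optimizer": True,  # option named in the commit
}

# Serialize to the JSON text that would go into the configuration file.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```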

26 Mar, 2020 1 commit
- Fix ThroughputTimer with hybrid parallelism. (#171) · 20557f70
  Shaden Smith authored Mar 26, 2020
  
  20557f70
25 Mar, 2020 1 commit
- Adding static loss scaling for ZeRO. (#166) · a76572dc
  Shaden Smith authored Mar 25, 2020
  
  a76572dc
23 Mar, 2020 1 commit
- Export all python environment variables, not just PYTHONPATH (#165) · 012d91df
  Olatunji Ruwase authored Mar 22, 2020
  
  012d91df
22 Mar, 2020 2 commits
- removed restrictions for custom optimizer (#161) · ac9cc7fe
  Calogero Zarbo authored Mar 23, 2020
  
  ac9cc7fe
- removed session_params from deepspeed_constants.py (#162) · 62d3272e
  kouml authored Mar 23, 2020
```
* remove session_params in deepspeed_constants.py

* add constants info at README.md
```
  62d3272e
18 Mar, 2020 4 commits
- JSON configuration cleanup. (#151) · 1496247a
  Shaden Smith authored Mar 18, 2020
```
* Better config filename

* Clean up configuration ToC
```
  1496247a
- Fix permalinks (#149) · 29855c27
  Shaden Smith authored Mar 18, 2020
```
* fix docs permalink

* fix docs permalink
```
  29855c27
- Web edits (#147) · b84a1fa4
  Shaden Smith authored Mar 18, 2020
  
  b84a1fa4
- Web edits (#146) · 4d735946
  Shaden Smith authored Mar 17, 2020
```
* Add coming soon to posts

* Add what's new section to main page
```
  4d735946
17 Mar, 2020 5 commits
- Restoring CNAME (#145) · 85cc16ae
  Shaden Smith authored Mar 17, 2020
  
  85cc16ae
- drafting Jekyll webpage (#143) · 5042dc00
  Shaden Smith authored Mar 17, 2020
  
  5042dc00
- removing duplicated CNAME (#142) · d6bc44bf
  Shaden Smith authored Mar 17, 2020
```
GitHub created a CNAME for us automatically. Cool.
```
  d6bc44bf
- Add CNAME file. (#141) · 8b12cfb4
  Shaden Smith authored Mar 17, 2020
  
  8b12cfb4
- Create CNAME · 0b8d765a
  Shaden Smith authored Mar 17, 2020
  
  0b8d765a
12 Mar, 2020 1 commit

PyTorch 1.3+ build support (#135) · 3d3f8d36

Jeff Rasley authored Mar 12, 2020

* add support for torch 1.3+ builds inside a docker build environment
* remove apex imports

3d3f8d36

11 Mar, 2020 2 commits
- add skip reqs flag (#133) · e0f5cc68
  Jeff Rasley authored Mar 11, 2020
  
  e0f5cc68
- Install specific apex hash (#132) · 259f894a
  Jeff Rasley authored Mar 11, 2020
```
* allow installing a specific apex commit
```
  259f894a
10 Mar, 2020 4 commits
- Enhancement: Ability to load checkpoint without loading the optimizer… (#128) · 936117b5
  Samyam Rajbhandari authored Mar 10, 2020
```
* Enhancement: Ability to load checkpoint without loading the optimizer states. Adds a unit test for saving and loading checkpoints with the fused, unfused, and ZeRO optimizers. The unit test takes about 165s.
```
  936117b5
- Make lr schedulers support fp16 optimizers (#124) · 1c0b326e
  Olatunji Ruwase authored Mar 10, 2020
```
* add tests cases for onecycle policy with fp16/zero

* Make lr schedulers support fp16 optimizers

* Fix formatting

* More specific naming
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
  1c0b326e
- Use torch.cuda.device_count() (#126) · 27d83851
  Shaden Smith authored Mar 10, 2020
  
  27d83851
- Add missing link to list of schedulers (#125) · 8ad8a262
  Cola authored Mar 10, 2020
  
  8ad8a262
09 Mar, 2020 1 commit

Add two CLI options to help with the installation inside of conda (#113) · 5f6294bd

Incomplete authored Mar 09, 2020



* Add --no_sudo to run without sudo

* Add --pip_mirror to set the pip mirror

* Default to running pip without sudo

* Typo

* Add --pip_sudo to Dockerfile and azure-pipelines.yml
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

5f6294bd