Commits · ebbd5f643d3006c601183e6f5a111611663754c5 · OpenDAS / fairscale

10 Sep, 2020 1 commit
- [docs] use PyTorch Sphinx theme (#75) · ebbd5f64
  msbaines authored Sep 09, 2020
  
  ebbd5f64
09 Sep, 2020 7 commits
- [docs] include proper citations (#74) · f4531ab7
  msbaines authored Sep 09, 2020
  
  f4531ab7
- [fix] fix typo in requirements.txt (#73) · 5b75dd30
  msbaines authored Sep 09, 2020
  
  5b75dd30
- [docs] add docs for APIs (#72) · fad970aa
  msbaines authored Sep 09, 2020
  
  fad970aa
- [feat] OSS flatten state dict (#65) · 4f597233
  Benjamin Lefaudeux authored Sep 09, 2020
```
Changes the structure of the returned state dict with respect to the param_groups to make it closer to what a vanilla optimizer would return (the groups are un-sharded). They are sharded again when loading.
```
  4f597233
- [docs] specify sphinx version (#71) · 6fe88a91
  msbaines authored Sep 08, 2020
  
  6fe88a91
- [docs] add requirements.txt (#70) · ddcb2aa7
  msbaines authored Sep 08, 2020
```
Needed for working correctly with readthedocs.org
```
  ddcb2aa7
- [docs] initial commit of documentation (#69) · f94367f7
  msbaines authored Sep 08, 2020
  
  f94367f7
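
The un-shard/re-shard idea described in the OSS flatten state dict commit (#65) above can be illustrated with a minimal sketch. This is not the fairscale implementation: the function names and the `partition` key are hypothetical, and plain dicts stand in for real optimizer param groups.

```python
from typing import Any, Dict, List, Tuple

ParamGroup = Dict[str, Any]


def flatten_param_groups(shards: List[List[ParamGroup]]) -> Dict[str, Any]:
    """Merge per-rank param_groups into one flat list, remembering the split points."""
    flat: List[ParamGroup] = []
    partition: List[Tuple[int, int]] = []  # (start, end) range owned by each rank
    for groups in shards:
        start = len(flat)
        flat.extend(groups)
        partition.append((start, len(flat)))
    # "partition" is a hypothetical key used here only so loading can re-shard.
    return {"param_groups": flat, "partition": partition}


def reshard_param_groups(state_dict: Dict[str, Any], rank: int) -> List[ParamGroup]:
    """Recover the param_groups that belong to a given rank when loading."""
    start, end = state_dict["partition"][rank]
    return state_dict["param_groups"][start:end]


# Two ranks, holding one and two param groups respectively.
shards = [
    [{"lr": 0.1, "params": [0, 1]}],
    [{"lr": 0.1, "params": [2]}, {"lr": 0.01, "params": [3]}],
]
flat = flatten_param_groups(shards)
```

The flat `param_groups` list looks like what a non-sharded optimizer would return, while the recorded ranges let each rank pick its own groups back out on load.
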
08 Sep, 2020 1 commit

[feat] OSS: Sync all attributes (#67) · 5a268b25

Benjamin Lefaudeux authored Sep 08, 2020

Make sure that all attributes (not just LR) are kept in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a schedule, which is useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Without this sync, such adjustments are silently not propagated to the wrapped optimizer; this commit fixes that.

5a268b25
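
A minimal sketch of the kind of generic sync this commit describes, assuming param groups are plain dicts. The helper name `sync_param_groups` is hypothetical and this is not the actual OSS code; it only shows why copying every key except `params` covers schedulers that touch attributes other than `lr`.

```python
from typing import Any, Dict, List


def sync_param_groups(
    exposed_groups: List[Dict[str, Any]],
    wrapped_groups: List[Dict[str, Any]],
) -> None:
    """Copy every attribute except "params" from the exposed groups to the wrapped ones."""
    for exposed, wrapped in zip(exposed_groups, wrapped_groups):
        for key, value in exposed.items():
            # "params" differs on purpose: the wrapped optimizer only owns a shard.
            if key != "params":
                wrapped[key] = value


# The exposed groups see all parameters; the wrapped optimizer only sees its shard.
exposed = [{"params": ["w0", "w1"], "lr": 0.1, "momentum": 0.9}]
wrapped = [{"params": ["w0"], "lr": 0.1, "momentum": 0.9}]

# A scheduler changes more than just the learning rate.
exposed[0]["lr"] = 0.01
exposed[0]["momentum"] = 0.5

sync_param_groups(exposed, wrapped)
```
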

04 Sep, 2020 1 commit
- [chore] initial conda build config (#64) · 3a203179
  msbaines authored Sep 04, 2020
```
Built via:

$ conda build --python 3.8 .
```
  3a203179
03 Sep, 2020 3 commits

[feat] Add a memory usage regression test to the OSS benchmark (#62) · ee38e1e0

Benjamin Lefaudeux authored Sep 03, 2020

* Aligning the optimizer state dict with what PyTorch expects

* Adding a check on the dict keys, ensure that `state` and `param_groups` are there

* after installing the specific isort, black and all, one liner to please the linter..

* Adding some measurement of the memory consumption while training + checkpointing

* mandatory lintfix commit

* brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row

* move reset stats call, hotfix

* move the optimizer to rmsprop, more stateful and still used in CV

* trying to figure out a SIGSEGV in CircleCI

ee38e1e0

Add grad scaler (#48) · b6a5e634

Jun Ru Anderson authored Sep 03, 2020



Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require gradient scaling in order to converge.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

b6a5e634
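
For readers unfamiliar with gradient scaling, the following is a simplified, framework-free sketch of dynamic loss scaling. It is not PyTorch's or fairscale's GradScaler: the class `ToyGradScaler` is hypothetical, gradients are plain floats, and the separate "step" and "update" phases of a real scaler are merged into one method for brevity.

```python
import math
from typing import Callable, List


class ToyGradScaler:
    """Illustrative dynamic loss scaler (hypothetical, pure Python)."""

    def __init__(
        self,
        init_scale: float = 2.0 ** 16,
        growth_factor: float = 2.0,
        backoff_factor: float = 0.5,
        growth_interval: int = 2000,
    ) -> None:
        self.scale_value = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale(self, loss: float) -> float:
        """Multiply the loss so that small gradients do not underflow in fp16."""
        return loss * self.scale_value

    def step(self, scaled_grads: List[float], apply_update: Callable[[List[float]], None]) -> bool:
        """Unscale and apply the gradients, or skip the step if any is inf/NaN."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow: skip this update and shrink the scale.
            self.scale_value *= self.backoff_factor
            self._good_steps = 0
            return False
        apply_update([g / self.scale_value for g in scaled_grads])
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of clean steps: try a larger scale again.
            self.scale_value *= self.growth_factor
            self._good_steps = 0
        return True


scaler = ToyGradScaler(init_scale=4.0, growth_interval=2)
applied: List[List[float]] = []
ok_first = scaler.step([4.0, 8.0], applied.append)       # finite: update applied
ok_second = scaler.step([float("inf")], applied.append)  # overflow: update skipped
```
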

[fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea

Benjamin Lefaudeux authored Sep 03, 2020

* Aligning the optimizer state dict with what PyTorch expects

* Adding a check on the dict keys, ensure that `state` and `param_groups` are there

* after installing the specific isort, black and all, one liner to please the linter..

1d1d15ea

28 Aug, 2020 4 commits

[chore] create v0.0.2 (#59) · 4488e17c
msbaines authored Aug 28, 2020

4488e17c

[test] specify chunks for pipe/transformer benchmark (#52) · d1d74413

Jun Ru Anderson authored Aug 28, 2020



* specify chunks for pipe/transformer benchmark

Set chunks to be equal to len(balance) for pipe/transformer benchmark. Will update words per second and memory usage checks in next commit (must test on CircleCI to find appropriate values)

* change benchmark words per second and memory usage

Did six runs for words-per-second, with results: 9144.40, 9163.91, 9993.01, 9082.82, 9155.09, 9000.67
Peak allocated bytes per device (which does not change between runs) were 193206272, 645632, 562688, 92688384 for devices 0, 1, 2 and 3, respectively

* increase batch size

Batch size was small enough that the GPU's computing power was not the bottleneck, which slowed training and, in particular, made runs with more chunks slower. Increasing the batch size has therefore increased training speed.

* update benchmark numbers

Ran six times, with wps 36917.44, 36797.65, 37006.03, 36872.84, 37129.31, 37003.31 and peak allocated bytes 4061909504, 4050944, 10427392, 2031824896 for devices 0, 1, 2 and 3, respectively.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

d1d74413

[fix] optim/oss: work correctly with LRScheduler (#58) · ab32cb7d

msbaines authored Aug 28, 2020

* [fix] optim/oss: work correctly with LRScheduler

Sync lr before every step and before consolidate.

ab32cb7d

[fix] fix eval for oss_ddp (#55) · 8c8eb8e8
Min Xu authored Aug 28, 2020
```
- added train(mode) method to be aware of eval mode
```
8c8eb8e8

27 Aug, 2020 4 commits
- [fix] optim/oss: fix state cast (#56) · fb49b515
  msbaines authored Aug 27, 2020
```
Work around a PyTorch bug that casts state (pytorch/pytorch#43706).

Copied from https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/optim/fp16_optimizer.py#L251-L268
```
  fb49b515
- [refactor] optim/oss: save memory and time by avoiding duplicate copy of parameters (#57) · e4a0804c
  msbaines authored Aug 27, 2020
  
  e4a0804c
- [fix] optim/oss: PyTorch already handles putting state on proper device (#54) · 220ee323
  msbaines authored Aug 27, 2020
  
  220ee323
- [fix] optim/oss: support optimizers with additional step kwargs (#53) · 09028a0d
  msbaines authored Aug 26, 2020
```
* [fix] optim/oss: support optimizers with additional step kwargs

Some of the optimizers in apex support additional kwargs to step
such as scale.
```
  09028a0d
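
The "additional step kwargs" fix (#53) above amounts to forwarding extra keyword arguments through a wrapper's `step()`. A minimal sketch follows, with hypothetical class names; `ScaledStepOptimizer` is a stand-in for an optimizer whose `step()` accepts a `scale` argument and is not a real apex class.

```python
from typing import Any, Callable, Optional


class ScaledStepOptimizer:
    """Hypothetical stand-in for an optimizer whose step() takes an extra `scale` kwarg."""

    def __init__(self) -> None:
        self.last_scale: Optional[float] = None

    def step(self, closure: Optional[Callable[[], float]] = None, scale: float = 1.0) -> Optional[float]:
        self.last_scale = scale
        return closure() if closure is not None else None


class WrappedOptimizer:
    """Hypothetical wrapper that passes any extra step() kwargs to the inner optimizer."""

    def __init__(self, optim: Any) -> None:
        self.optim = optim

    def step(self, closure: Optional[Callable[[], float]] = None, **kwargs: Any) -> Optional[float]:
        # Forward everything the caller supplied instead of assuming step(closure) only.
        return self.optim.step(closure=closure, **kwargs)


inner = ScaledStepOptimizer()
outer = WrappedOptimizer(inner)
result = outer.step(scale=128.0)
```
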
22 Aug, 2020 1 commit

[feat] optimizer state scaling (#44) · 5251a69a

Jun Ru Anderson authored Aug 21, 2020



Implement scaling of optimizer state when using pure-fp16 training to avoid underflow. Update benchmark to use pure-fp16. Modify state_dict methods to store and load the optimizer state scale.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

5251a69a
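
To see why scaling optimizer state helps in pure-fp16 training, the sketch below uses the standard library's half-precision `struct` format to round a value to fp16. The scale factor `2**16` and the sample value are arbitrary choices for illustration, not values taken from the commit.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


STATE_SCALE = 2.0 ** 16  # illustrative scale factor (assumption)
tiny = 1e-9              # e.g. a very small optimizer state value

# Stored directly, the value is below fp16's smallest subnormal and is lost.
unscaled = to_fp16(tiny)

# Stored scaled, the value lands in fp16's representable range.
stored = to_fp16(tiny * STATE_SCALE)

# Dividing by the scale when reading the state back recovers the value approximately.
recovered = stored / STATE_SCALE
```

The scale has to be saved alongside the state so that it can be divided out again after a reload, which is consistent with the commit's note that the state_dict methods store and load the optimizer state scale.
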

21 Aug, 2020 2 commits

[feat] Simple macro OSS benchmark (#47) · 46c3776b

Benjamin Lefaudeux authored Aug 21, 2020



* initial commit, dummy training loop, pure pytorch but not DDP

* probably slightly broken, but rough DDP benchmark run

* adding the torchvision requirement for testing

* brainfart

* reduce the loss, do something slightly distributed

* Some cleanup, distributing the training on two GPUs

* some cleanup + adding a vanilla run, still not good to go

* less silly defaults, gtg for a start I think

* smaller batch to fit the smaller gpus used in the circleci rigs

* Adding some options for the benchmark, and regression testing

* [test] set torch seed for Adam tests (#49)

Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

* linting, I really need to automate this isort insanity
Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

46c3776b

[test] set torch seed for Adam tests (#49) · 0e8c2a96

Jun Ru Anderson authored Aug 21, 2020



Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

0e8c2a96

20 Aug, 2020 1 commit
- [fix] OSS restore state to proper device (#46) · c2d6f4b6
  Benjamin Lefaudeux authored Aug 20, 2020
```
* move the restored param groups to the original device

* adding a corresponding test
```
  c2d6f4b6
19 Aug, 2020 1 commit

[fix] fix tests and state_dict; refactor tests (#45) · 9d6c7b6a

Jun Ru Anderson authored Aug 19, 2020



Refactor tests to remove duplicated code. Fix the state_dict test to instantiate the second optimizer with the correct precision. Fix Adam.load_state_dict to make optimizer state the right type.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

9d6c7b6a

18 Aug, 2020 1 commit

[feat] allow fp16 optimizer state with Adam (#41) · 8ee5a8ff

Jun Ru Anderson authored Aug 18, 2020



Allow training with optimizer state in fp16. Use an enum to select from full-precision, mixed precision, memory efficient mixed precision and pure fp16. Improve clarity of testing code
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

8ee5a8ff

14 Aug, 2020 5 commits

[feat] add mixed precision Adam (#40) · e2d8f573

Jun Ru Anderson authored Aug 14, 2020



Add support for mixed-precision (half precision params, full precision gradients) and memory-efficient (half precision params and half precision gradients) training with Adam
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

e2d8f573

[fix] Properly restore a sharded optim state (#39) · 585f177b

Benjamin Lefaudeux authored Aug 14, 2020



* hotfix a half-cooked optimizer state restoration, the global shared state also needs to be restored

* [cleanup] get 100% coverage on oss.py (#38)
authored-by: Mandeep Singh Baines <msb@fb.com>

* better unit testing, check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>

585f177b

[cleanup] get 100% coverage on oss.py (#38) · 3427a039

msbaines authored Aug 13, 2020
```
authored-by: Mandeep Singh Baines <msb@fb.com>
```
3427a039

[chore] enforce code coverage (#37) · fffd3c76

msbaines authored Aug 13, 2020
```
* Set baseline coverage to 94%
```
fffd3c76

[test] using PyTorch v1.6 for Lint checks (#36) · b35a3d3f

msbaines authored Aug 13, 2020

b35a3d3f

13 Aug, 2020 5 commits
- [chore] enable codecov (#35) · 2f638e5a
  msbaines authored Aug 13, 2020
  
  2f638e5a
- [refactor] remove type_shim.h (#33) · bc822902
  Jun Ru Anderson authored Aug 13, 2020
  
  bc822902
- [chore] run tests on PyTorch 1.6.0 and gpu tests on 1.6.0 and 1.5.1 (#34) · 571f5efa
  msbaines authored Aug 13, 2020
  
  571f5efa
- [feat] remove support for non-multitensor Adam · 81a2cf04
  Jun Ru Anderson authored Aug 13, 2020
```
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
```
  81a2cf04
- Aligning OSS state dict with... · 57079b08
  Benjamin Lefaudeux authored Aug 12, 2020
```
Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)
```
  57079b08
08 Aug, 2020 1 commit
- [fix] fix test_oss.py when the host has 2 GPUs (#26) · d9e6ceaa
  Min Xu authored Aug 07, 2020
```
Co-authored-by: Min Xu <m1n@fb.com>
```
  d9e6ceaa
06 Aug, 2020 2 commits
- [feat] add ddp that works with oss with reduce() not all_reduce() (#19) · 525e709b
  Min Xu authored Aug 06, 2020
```
Co-authored-by: Min Xu <m1n@fb.com>
```
  525e709b
- add pytest coverage (#24) · 4cd2590c
  Min Xu authored Aug 06, 2020
```
Co-authored-by: Min Xu <m1n@fb.com>
```
  4cd2590c