Commits · 63f7796a72f9afbcba64f9bf0df11753ecc4558d · OpenDAS / fairscale

17 Sep, 2020 2 commits

Tom Birch authored Sep 17, 2020

Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines)
* Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
* Added support for lazy construction of modules (see lazy_construction for an example)
* Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
* Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess

63f7796a
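
The lazy construction mentioned in this commit can be pictured with a small, library-free sketch. `LazyStage` and `build_local_stage` are hypothetical names used only for illustration, not fairscale's API. The sketch assumes one stage per process, where each process receives cheap constructor descriptions and instantiates only the stage it owns.

```python
from typing import Any, Callable, List


class LazyStage:
    """Hypothetical wrapper: stores how to build a stage, builds it on demand."""

    def __init__(self, factory: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self._factory = factory
        self._args = args
        self._kwargs = kwargs

    def build(self) -> Any:
        # The (potentially large) object is only created when build() is called.
        return self._factory(*self._args, **self._kwargs)


def build_local_stage(stages: List[LazyStage], rank: int) -> Any:
    """Instantiate only the stage owned by this process (assumed: one stage per rank)."""
    return stages[rank].build()


built = []  # records which stages were actually constructed


def make_layer(name: str, width: int) -> dict:
    built.append(name)
    return {"name": name, "weights": [0.0] * width}


pipeline = [LazyStage(make_layer, "stage0", 4), LazyStage(make_layer, "stage1", 8)]
local = build_local_stage(pipeline, rank=1)  # only "stage1" is constructed
```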

[feat] Sharded DDP - small refactor and new features (#97) · 49a198c9

Benjamin Lefaudeux authored Sep 17, 2020

- rename oss_ddp to ShardedDataParallel
- some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- some small perf bumps

49a198c9

15 Sep, 2020 2 commits
- [feat] Gracefully handle local/global state dict queries (#89) · d16e9f61
  Benjamin Lefaudeux authored Sep 15, 2020
```
Return either the local or global state when queried, depending on a prior consolidation
```
  d16e9f61
- [feat] OSS: optional closure argument for the optimizer (#86) · 3d7f524a
  Benjamin Lefaudeux authored Sep 15, 2020
```
Make OSS compatible with optimizers which do not support the closure argument
```
  3d7f524a
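
One way a wrapper can stay compatible with optimizers whose `step()` takes no closure is to forward the argument only when the caller supplies one. The following is a minimal sketch with stand-in classes; `NoClosureOptimizer`, `ClosureOptimizer`, and `wrapped_step` are hypothetical and not taken from the OSS code.

```python
from typing import Callable, Optional


class NoClosureOptimizer:
    """Stand-in for an optimizer whose step() accepts no closure."""

    def step(self) -> None:
        return None


class ClosureOptimizer:
    """Stand-in for an optimizer whose step() accepts an optional closure."""

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        return closure() if closure is not None else None


def wrapped_step(optimizer, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
    # Only pass the closure through when one was given, so optimizers
    # without that parameter are never called with it.
    if closure is None:
        return optimizer.step()
    return optimizer.step(closure=closure)
```
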
09 Sep, 2020 1 commit

[feat] OSS flatten state dict (#65) · 4f597233

Benjamin Lefaudeux authored Sep 09, 2020

Changes the structure of the returned state dict so that the param_groups are un-sharded, which makes it closer to what a vanilla optimizer would return. The param_groups are sharded again when loading.

4f597233
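
The un-shard and re-shard round trip described in this commit can be sketched without PyTorch. The helper names and the per-rank list layout below are assumptions made for illustration, not the structure OSS actually stores.

```python
from typing import Dict, List, Tuple

ParamGroup = Dict[str, object]


def flatten_groups(sharded: List[List[ParamGroup]]) -> Tuple[List[ParamGroup], List[int]]:
    """Concatenate per-rank param_groups and remember how many each rank owned."""
    flat = [group for rank_groups in sharded for group in rank_groups]
    sizes = [len(rank_groups) for rank_groups in sharded]
    return flat, sizes


def reshard_groups(flat: List[ParamGroup], sizes: List[int]) -> List[List[ParamGroup]]:
    """Inverse of flatten_groups: split the flat list back into per-rank lists."""
    sharded, start = [], 0
    for size in sizes:
        sharded.append(flat[start:start + size])
        start += size
    return sharded


per_rank = [
    [{"lr": 0.1, "params": [0, 1]}],
    [{"lr": 0.1, "params": [2]}, {"lr": 0.01, "params": [3]}],
]
flat, sizes = flatten_groups(per_rank)   # what a saved state dict would expose
restored = reshard_groups(flat, sizes)   # what each rank gets back on load
```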

08 Sep, 2020 1 commit

[feat] OSS: Sync all attributes (#67) · 5a268b25

Benjamin Lefaudeux authored Sep 08, 2020

Make sure that all attributes (not just LR) stay in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a schedule, which can be useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Without this sync, such adjustments are silently not propagated, which is the worst-case failure mode. This change fixes that.

5a268b25
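
A generic sync of this kind can be sketched with plain dictionaries. `sync_param_groups` is a hypothetical helper, and the sketch assumes the two group lists are aligned one-to-one and that only the `"params"` entry differs per shard.

```python
from typing import Dict, List


def sync_param_groups(
    local_groups: List[Dict[str, object]],
    global_groups: List[Dict[str, object]],
) -> None:
    """Copy every hyperparameter except "params" from global to local groups."""
    for local, global_ in zip(local_groups, global_groups):
        for key, value in global_.items():
            if key != "params":  # parameter ownership differs per shard
                local[key] = value


# A scheduler changed lr *and* momentum on the wrapper-level groups.
wrapper_groups = [{"params": [0, 1, 2], "lr": 0.05, "momentum": 0.8}]
wrapped_groups = [{"params": [1], "lr": 0.1, "momentum": 0.9}]
sync_param_groups(wrapped_groups, wrapper_groups)
```

Because the loop iterates over whatever keys are present, a newly scheduled attribute is picked up without naming it explicitly.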

03 Sep, 2020 2 commits

Add grad scaler (#48) · b6a5e634

Jun Ru Anderson authored Sep 03, 2020



Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark. It is not needed in this case, but it is a good example of how to apply gradient scaling to larger models that need it in order to converge.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

b6a5e634

[fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea

Benjamin Lefaudeux authored Sep 03, 2020

* Aligning the optimizer state dict with what PyTorch expects

* Adding a check on the dict keys to ensure that `state` and `param_groups` are there

* After installing the specific isort, black, etc., a one-line change to please the linter

1d1d15ea
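
A key check like the one described in this commit might look as follows. This is a sketch under assumptions: `check_state_dict` and the choice of `ValueError` are hypothetical and not taken from the commit.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("state", "param_groups")


def check_state_dict(state_dict: Dict[str, Any]) -> None:
    """Raise if a key required by the PyTorch optimizer layout is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in state_dict]
    if missing:
        raise ValueError(f"state dict is missing keys: {missing}")


check_state_dict({"state": {}, "param_groups": []})  # passes silently
```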

28 Aug, 2020 2 commits
- [fix] optim/oss: work correctly with LRScheduler (#58) · ab32cb7d
  msbaines authored Aug 28, 2020
```
* [fix] optim/oss: work correctly with LRScheduler

Sync lr before every step and before consolidate.
```
  ab32cb7d
- [fix] fix eval for oss_ddp (#55) · 8c8eb8e8
  Min Xu authored Aug 28, 2020
```
- added train(mode) method to be aware of eval mode
```
  8c8eb8e8
27 Aug, 2020 3 commits
- [refactor] optim/oss: save memory and time by avoiding duplicate copy of parameters (#57) · e4a0804c
  msbaines authored Aug 27, 2020
  
  e4a0804c
- [fix] optim/oss: PyTorch already handles putting state on proper device (#54) · 220ee323
  msbaines authored Aug 27, 2020
  
  220ee323
- [fix] optim/oss: support optimizers with additional step kwargs (#53) · 09028a0d
  msbaines authored Aug 26, 2020
```
* [fix] optim/oss: support optimizers with additional step kwargs

Some of the optimizers in apex support additional kwargs to step
such as scale.
```
  09028a0d
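
Supporting extra `step` keyword arguments usually comes down to forwarding them untouched. The sketch below uses stand-in classes; `ScaledOptimizer` and `Wrapper` are hypothetical, and `scale` is simply the example kwarg named in the commit message.

```python
from typing import Any, Optional


class ScaledOptimizer:
    """Stand-in for an optimizer whose step() takes an extra `scale` kwarg."""

    def __init__(self) -> None:
        self.last_scale: Optional[float] = None

    def step(self, closure: Any = None, scale: float = 1.0) -> None:
        self.last_scale = scale


class Wrapper:
    """Hypothetical wrapper that forwards unknown keyword arguments to step()."""

    def __init__(self, optim: Any) -> None:
        self.optim = optim

    def step(self, closure: Any = None, **kwargs: Any) -> Any:
        # Extra kwargs are passed through untouched to the wrapped optimizer.
        return self.optim.step(closure=closure, **kwargs)


inner = ScaledOptimizer()
Wrapper(inner).step(scale=128.0)
```
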
22 Aug, 2020 1 commit

[feat] optimizer state scaling (#44) · 5251a69a

Jun Ru Anderson authored Aug 21, 2020



Implement scaling of optimizer state when using pure-fp16 training to avoid underflow. Update benchmark to use pure-fp16. Modify state_dict methods to store and load the optimizer state scale.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

5251a69a
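
The underflow that state scaling guards against can be reproduced with the standard library alone, since `struct` supports IEEE 754 half precision through the `e` format. The scale factor and the sample value below are assumptions chosen for illustration, not the values used in the commit.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


SCALE = 2.0 ** 16  # assumed scale factor, chosen for illustration

tiny = 1e-8                      # e.g. a small optimizer state entry
unscaled = to_fp16(tiny)         # too small for fp16, rounds to 0.0
stored = to_fp16(tiny * SCALE)   # representable once scaled up
recovered = stored / SCALE       # unscale in higher precision when reading
```

Storing the scale alongside the state, as the commit does in its state_dict methods, is what lets a reader undo the scaling later.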

21 Aug, 2020 1 commit

[test] set torch seed for Adam tests (#49) · 0e8c2a96

Jun Ru Anderson authored Aug 21, 2020



Set the torch seed for tests. Mark the mixed-precision and memory-efficient mixed-precision state_dict tests as xfail, because their states are cast to FP16 and back to FP32 during load_state_dict.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

0e8c2a96

20 Aug, 2020 1 commit
- [fix] OSS restore state to proper device (#46) · c2d6f4b6
  Benjamin Lefaudeux authored Aug 20, 2020
```
* move the restored param groups to the original device

* adding a corresponding test
```
  c2d6f4b6
19 Aug, 2020 1 commit

[fix] fix tests and state_dict; refactor tests (#45) · 9d6c7b6a

Jun Ru Anderson authored Aug 19, 2020



Refactor tests to remove duplicated code. Fix the state_dict test to instantiate the second optimizer with the correct precision. Fix Adam.load_state_dict to make optimizer state the right type.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

9d6c7b6a

18 Aug, 2020 1 commit

[feat] allow fp16 optimizer state with Adam (#41) · 8ee5a8ff

Jun Ru Anderson authored Aug 18, 2020



Allow training with optimizer state in fp16. Use an enum to select from full precision, mixed precision, memory-efficient mixed precision and pure fp16. Improve clarity of testing code.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

8ee5a8ff
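
Selecting a precision mode through an enum could look roughly like this. The member names are hypothetical labels for the four modes listed in the commit message, and the mapping to an optimizer-state dtype is an assumption: the commit only states that pure fp16 keeps optimizer state in fp16.

```python
from enum import Enum, auto


class Precision(Enum):
    """Hypothetical names for the four modes listed in the commit message."""

    FULL_PRECISION = auto()
    MIXED_PRECISION = auto()
    MEMORY_EFFICIENT_MIXED_PRECISION = auto()
    PURE_FP16 = auto()


def state_dtype(precision: Precision) -> str:
    """Assumption: only pure fp16 keeps optimizer state in half precision."""
    return "float16" if precision is Precision.PURE_FP16 else "float32"
```

An enum keeps the set of modes closed, so an unsupported combination cannot be expressed by accident the way it could with independent boolean flags.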

14 Aug, 2020 2 commits

[feat] add mixed precision Adam (#40) · e2d8f573

Jun Ru Anderson authored Aug 14, 2020



Add support for mixed-precision (half precision params, full precision gradients) and memory-efficient (half precision params and half precision gradients) training with Adam
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

e2d8f573

[fix] Properly restore a sharded optim state (#39) · 585f177b

Benjamin Lefaudeux authored Aug 14, 2020



* Hotfix for a half-finished optimizer state restoration: the global shared state also needs to be restored

* [cleanup] get 100% coverage on oss.py (#38)
authored-by: Mandeep Singh Baines <msb@fb.com>

* better unit testing, check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>

585f177b

13 Aug, 2020 1 commit

Aligning OSS state dict with... · 57079b08

Benjamin Lefaudeux authored Aug 12, 2020

Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)

57079b08

08 Aug, 2020 1 commit
- [fix] fix test_oss.py when host has 2 GPUs (#26) · d9e6ceaa
  Min Xu authored Aug 07, 2020
```
Co-authored-by: Min Xu <m1n@fb.com>
```
  d9e6ceaa
06 Aug, 2020 1 commit
- [feat] add ddp that works with oss with reduce() not all_reduce() (#19) · 525e709b
  Min Xu authored Aug 06, 2020
```
Co-authored-by: Min Xu <m1n@fb.com>
```
  525e709b
31 Jul, 2020 3 commits
- [feat] Implement OSS save and load of the sharded state from a single replica (#16) · 8e363567
  Benjamin Lefaudeux authored Jul 31, 2020
  
  8e363567
- [feat] add FusedAdam (#10) · bfba68d8
  Jun Ru Anderson authored Jul 30, 2020
```
Add FusedAdam, update benchmark and add tests.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
```
  bfba68d8
- [feat] Model parallel (#3) · 30f5009a
  Tom Birch authored Jul 22, 2020
  
  30f5009a
08 Jul, 2020 1 commit
- Initial commit · 0cd65242
  Mandeep Singh Baines authored Jul 07, 2020
  
  0cd65242