1. 06 Oct, 2020 1 commit
  2. 05 Oct, 2020 1 commit
  3. 02 Oct, 2020 1 commit
  4. 01 Oct, 2020 3 commits
  5. 29 Sep, 2020 1 commit
  6. 24 Sep, 2020 3 commits
  7. 22 Sep, 2020 3 commits
  8. 17 Sep, 2020 6 commits
  9. 16 Sep, 2020 2 commits
  10. 15 Sep, 2020 2 commits
  11. 14 Sep, 2020 1 commit
  12. 12 Sep, 2020 1 commit
  13. 11 Sep, 2020 1 commit
  14. 10 Sep, 2020 3 commits
  15. 09 Sep, 2020 7 commits
  16. 08 Sep, 2020 1 commit
    • [feat] OSS: Sync all attributes (#67) · 5a268b25
      Benjamin Lefaudeux authored
      Ensure that all attributes (not just the learning rate) stay in sync between OSS.param_groups and the param_groups of the wrapped optimizer. Some frameworks allow any attribute to be adjusted on a schedule, which can be useful depending on the optimizer, so all keys need to be supported generically (not just "lr"). Not syncing these attributes was the worst case, since the adjustments were silently dropped; this change fixes that.
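      A minimal sketch of the syncing idea described above, assuming a hypothetical helper name; it mirrors every hyperparameter key from the wrapper's param_groups onto the wrapped optimizer rather than just "lr" (this is not the actual fairscale implementation):

      ```python
      def sync_param_groups(wrapper_groups, wrapped_optimizer):
          """Mirror every hyperparameter key (lr, momentum, weight_decay, ...)
          from the wrapper's param_groups onto the wrapped optimizer."""
          for wrapper_group, local_group in zip(wrapper_groups, wrapped_optimizer.param_groups):
              for key, value in wrapper_group.items():
                  if key != "params":  # the parameter tensors themselves are not hyperparameters
                      local_group[key] = value

      # Hypothetical usage: a scheduler mutates oss.param_groups, and the change is
      # propagated to the inner optimizer before the next step.
      # sync_param_groups(oss.param_groups, oss.optim)  # the `optim` attribute name is an assumption
      ```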
  17. 04 Sep, 2020 1 commit
  18. 03 Sep, 2020 2 commits
    • [feat] Add a memory usage regression test to the OSS benchmark (#62) · ee38e1e0
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * After installing the pinned isort and black versions, a one-liner to please the linter
      
      * Adding measurement of memory consumption while training + checkpointing (see the sketch after this list)
      
      * mandatory lintfix commit
      
      * Reset the memory usage counter at the beginning of training, in case two runs are executed in a row
      
      * move reset stats call, hotfix
      
      * Switch the optimizer to RMSprop, which carries more state and is still used in CV
      
      * Trying to track down a SIGSEGV in CircleCI
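      As referenced in the list above, a hedged sketch of how a peak-memory regression check could be wired around a training run; the function and threshold names are illustrative and not the benchmark's actual code:

      ```python
      import torch

      def check_peak_memory(train_one_epoch, device, max_allowed_bytes):
          # Reset the peak counter first, in case another run executed just before this one
          torch.cuda.reset_peak_memory_stats(device)

          train_one_epoch()

          peak = torch.cuda.max_memory_allocated(device)
          assert peak <= max_allowed_bytes, (
              f"Memory regression: peak {peak / 2**20:.1f} MiB exceeds "
              f"the allowed {max_allowed_bytes / 2**20:.1f} MiB"
          )
      ```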
    • Add grad scaler (#48) · b6a5e634
      Jun Ru Anderson authored
      
      
      Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use it in the pipe benchmark; gradient scaling is not strictly needed there, but it serves as a good example of how to apply it to larger models that do require it in order to converge.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
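      A short sketch of the standard PyTorch GradScaler pattern that the subclass referenced in this commit builds on (scale the loss, step through the scaler, then update); the training-loop names are illustrative:

      ```python
      import torch
      from torch.cuda.amp import GradScaler, autocast

      scaler = GradScaler()

      def training_step(model, optimizer, loss_fn, batch, targets):
          optimizer.zero_grad()
          with autocast():                   # forward pass in mixed precision
              loss = loss_fn(model(batch), targets)
          scaler.scale(loss).backward()      # scale the loss so fp16 gradients do not underflow
          scaler.step(optimizer)             # unscales gradients, skips the step on inf/nan
          scaler.update()                    # adjust the scale factor for the next iteration
      ```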