- 19 Dec, 2020 1 commit
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non-trainable params fix (#259)
* Getting rid of the "should bucket" hash table, just use a list. Properly handle all params, with or without requires_grad
* Make sure that this case is unit tested
-
- 17 Dec, 2020 2 commits
Joshua Meier authored
-
Benjamin Lefaudeux authored
* Typo, sorry about that
* Small perf fix
-
- 16 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* Better handling of the callback queue, try to consume it as we go
* Dumping buckets for the reduce part, always the same unused-params issue
-
- 10 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* Unit test checking DDP and sharded_ddp equivalence, reproducing the issue that Sean spotted
* Fixing the issue: requests in flight were not counted properly
* Adding a multiple-optimizers case
-
- 04 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* Proper unit testing, but no solution other than disabling bucketing for now; a couple of tested options do not work
-
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* Rewrite using autograd and the Variable execution queue to make the reduce automatic
* Share buckets with OSS to remove duplication
* Some speed is likely still on the table, since the speedup vs. bucketing does not match expectations; could be a follow-up
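The "make the reduce automatic" idea can be sketched with plain parameter hooks; a minimal illustration of the concept (not the actual fairscale code, which drives this through autograd's execution queue), assuming `torch.distributed` is already initialized:

```python
import torch
import torch.distributed as dist


def attach_auto_reduce_hooks(model: torch.nn.Module) -> None:
    # Average each gradient across ranks as soon as autograd produces it,
    # instead of reducing everything explicitly after backward().
    world_size = dist.get_world_size()

    def reduce_hook(grad: torch.Tensor) -> torch.Tensor:
        grad = grad / world_size
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad

    for param in model.parameters():
        if param.requires_grad:
            param.register_hook(reduce_hook)
```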
-
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
Add a clip-gradients util, equivalent to torch's but aware of the sharded state. Add a corresponding unit test.
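The sharded-aware part boils down to combining gradient norms across ranks before scaling; a minimal sketch of the idea (not the actual fairscale utility), assuming an initialized process group and gradients on devices the backend can reduce:

```python
import torch
import torch.distributed as dist


def clip_grad_norm_sharded(local_params, max_norm: float) -> torch.Tensor:
    # Each rank only holds the gradients of its own shard, so the squared
    # norms must be summed across ranks before anything gets scaled.
    grads = [p.grad for p in local_params if p.grad is not None]
    total_sq = torch.stack([g.norm(2) ** 2 for g in grads]).sum().reshape(1)
    dist.all_reduce(total_sq, op=dist.ReduceOp.SUM)
    total_norm = total_sq.sqrt()

    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        for g in grads:
            # Every rank scales by the same global coefficient.
            g.mul_(clip_coef)
    return total_norm
```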
-
- 10 Nov, 2020 1 commit
Tom Birch authored
Adds support for:
* Reused layers (e.g. for weight sharing)
* Lazily-constructed layers
* Single-process control via PipeRPCWrapper
* PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
Also added examples for multi-process and PipeRPCWrapper.
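The per-worker event loop can be pictured as a single queue consumer; a loose sketch of the concept only, with hypothetical `run_forward`/`run_backward` stand-ins:

```python
import queue
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    ACTIVATION = auto()
    GRADIENT = auto()


@dataclass
class Message:
    kind: Kind
    payload: object


def run_forward(payload) -> None: ...   # hypothetical: forward pass for this stage
def run_backward(payload) -> None: ...  # hypothetical: backward pass for this stage


def event_loop(inbox: "queue.Queue[Message]") -> None:
    # One loop per rank/worker: activations and gradients are handled in
    # whichever order they arrive, rather than on a fixed schedule.
    while True:
        msg = inbox.get()
        if msg.kind is Kind.ACTIVATION:
            run_forward(msg.payload)
        elif msg.kind is Kind.GRADIENT:
            run_backward(msg.payload)
```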
-
- 04 Nov, 2020 1 commit
msbaines authored
-
- 23 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* small refactor, getting rid of the while loop
-
- 20 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Small refactor, code cleanup
* Broadcast the tensor .data attribute directly
-
- 14 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Fixing the issue with Apex; validated with Latte, Classy would need another pass
-
- 08 Oct, 2020 2 commits
Benjamin Lefaudeux authored
* new unit test to catch rank issues in OSS
-
ngoyal2707 authored
Authored-by: Naman Goyal <namangoyal@learnfair0755.h2.fair>
-
- 06 Oct, 2020 1 commit
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements in a fixed buffer, send it asynchronously, then send all the others asynchronously, and get back to the bucket. Once done, scatter the contents if needed.
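A minimal sketch of that strategy (the `bucket_bytes` threshold is illustrative, and the per-rank dimension is omitted for brevity):

```python
import torch
import torch.distributed as dist


def broadcast_with_bucket(tensors, src_rank: int, bucket_bytes: int = 2 ** 20):
    # Sort ahead of time, smallest tensors first, so the tiny ones can be
    # packed into a single flat buffer instead of many small broadcasts.
    tensors = sorted(tensors, key=lambda t: t.numel())

    bucketed, direct, used = [], [], 0
    for t in tensors:
        nbytes = t.numel() * t.element_size()
        if used + nbytes <= bucket_bytes:
            bucketed.append(t)
            used += nbytes
        else:
            direct.append(t)

    requests = []
    if bucketed:
        # Send the packed bucket first, asynchronously.
        flat = torch.cat([t.flatten() for t in bucketed])
        requests.append(dist.broadcast(flat, src=src_rank, async_op=True))

    # Then send all the larger tensors, also asynchronously.
    for t in direct:
        requests.append(dist.broadcast(t, src=src_rank, async_op=True))

    for req in requests:
        req.wait()

    # Get back to the bucket: scatter its contents to the original tensors.
    if bucketed:
        offset = 0
        for t in bucketed:
            t.copy_(flat[offset : offset + t.numel()].view_as(t))
            offset += t.numel()
```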
-
- 01 Oct, 2020 3 commits
msbaines authored
-
Joshua Meier authored
Support optimizer state sharding for Megatron
-
Benjamin Lefaudeux authored
* Minor, but gives some memory back
* Adjust CI and regression checks to 4 GPUs
-
- 22 Sep, 2020 3 commits
Benjamin Lefaudeux authored
* Various fixes: no more issues with `make html`, and more API fields should be populated
-
Benjamin Lefaudeux authored
* Broadcasting grad-enabled tensors is forbidden in Gloo, because broadcast is not differentiable. Workaround added.
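The message does not spell out the workaround, but the 20 Oct entry above ("broadcast the tensor .data attribute directly") hints at its shape; a minimal sketch:

```python
import torch.distributed as dist


def broadcast_params(params, src_rank: int) -> None:
    # Gloo rejects broadcasting tensors that require grad (broadcast is not
    # differentiable); going through .data sidesteps the autograd check.
    for p in params:
        dist.broadcast(p.data, src=src_rank)
```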
-
Benjamin Lefaudeux authored
* Doc extensions to some APIs
* Fix the benchmark and tutorial
-
- 17 Sep, 2020 2 commits
Benjamin Lefaudeux authored
- Rename oss_ddp to ShardedDataParallel
- Some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- Some small perf bumps
-
Benjamin Lefaudeux authored
Add a small tutorial, similar to the OSS README
-
- 15 Sep, 2020 2 commits
Benjamin Lefaudeux authored
Return either the local or the global state when queried, depending on whether a prior consolidation happened
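In usage terms, roughly (a hypothetical sketch against the later fairscale OSS API, which may differ from the code at this commit; assumes an initialized process group):

```python
import torch
import torch.distributed as dist
from fairscale.optim import OSS

model = torch.nn.Linear(4, 4)
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)

# Gather every rank's shard onto rank 0 first...
optimizer.consolidate_state_dict(recipient_rank=0)

# ...so that rank 0 can query the global state; other ranks would get
# their local shard (per this commit's behaviour).
if dist.get_rank() == 0:
    full_state = optimizer.state_dict()
```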
-
Benjamin Lefaudeux authored
Make OSS compatible with optimizers that do not support the closure argument
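One common way to achieve this is to inspect the wrapped optimizer's signature; a hedged sketch of the idea, not necessarily what the commit does:

```python
import inspect


def step(self, closure=None):
    # Not every wrapped optimizer takes a closure (e.g. some Apex ones),
    # so only forward it when the signature actually accepts it.
    accepts_closure = "closure" in inspect.signature(self.optim.step).parameters
    if closure is not None and accepts_closure:
        return self.optim.step(closure=closure)
    return self.optim.step()
```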
-
- 10 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Changes the broadcast calls in the OSS step() function to make them asynchronous
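The usual pattern: launch every broadcast with `async_op=True` and wait once at the end. A minimal sketch (the helper name and per-rank layout are assumptions):

```python
import torch.distributed as dist


def _broadcast_shards(per_rank_params) -> None:
    # Fire off every broadcast without blocking, then wait once at the end,
    # so transfers of different ranks' shards overlap instead of serializing.
    handles = [
        dist.broadcast(p.data, src=rank, async_op=True)
        for rank, params in enumerate(per_rank_params)
        for p in params
    ]
    for handle in handles:
        handle.wait()
```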
-
- 09 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Changes the structure of the returned state dict with respect to the param_groups to make it closer to what a vanilla optimizer would return (un-shard them). Shard again when loading
-
- 08 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Make sure that all attributes (not just the LR) are kept in sync between OSS.param_groups and the actual wrapped optimizer. Some frameworks make it possible to alter any attribute on a schedule, which proves useful depending on the optimizer, so the keys need to be supported generically (not just "lr"). Not syncing these attributes is a worst-case scenario, since the adjustments are silently not propagated; this fixes that.
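A minimal sketch of such a sync, with the method name assumed:

```python
def _sync_param_groups(self) -> None:
    # Push every attribute (lr, momentum, betas, weight_decay, ...) from the
    # exposed param_groups down to the wrapped optimizer, not just the lr.
    for exposed, local in zip(self.param_groups, self.optim.param_groups):
        for key, value in exposed.items():
            if key != "params":
                local[key] = value
```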
-
- 03 Sep, 2020 1 commit
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensuring that `state` and `param_groups` are there
* After installing the specific isort, black and all, a one-liner to please the linter
-
- 28 Aug, 2020 1 commit
msbaines authored
* [fix] optim/oss: work correctly with LRScheduler. Sync the lr before every step and before consolidate.
-
- 27 Aug, 2020 4 commits
msbaines authored
Workaround PyTorch bug that casts state (pytorch/pytorch#43706). Copied from https://github.com/pytorch/fairseq/blob/v0.9.0/fairseq/optim/fp16_optimizer.py#L251-L268
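In the spirit of the referenced fairseq code, paraphrased rather than copied:

```python
def load_state_dict_no_cast(optimizer, state_dict) -> None:
    # Optimizer.load_state_dict casts the saved state to the params' dtype
    # (pytorch/pytorch#43706); re-assign the saved tensors afterwards so
    # fp32 state stays fp32 even when the params are fp16.
    optimizer.load_state_dict(state_dict)
    id_map = {
        old_id: param
        for old_group, group in zip(state_dict["param_groups"], optimizer.param_groups)
        for old_id, param in zip(old_group["params"], group["params"])
    }
    for old_id, saved_state in state_dict["state"].items():
        optimizer.state[id_map[old_id]] = saved_state
```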
-
msbaines authored
-
msbaines authored
-
msbaines authored
* [fix] optim/oss: support optimizers with additional step kwargs. Some of the optimizers in Apex support additional kwargs to step, such as scale.
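A minimal sketch of the pass-through, method placement assumed:

```python
def step(self, closure=None, **kwargs):
    # Pass any extra keyword arguments (e.g. Apex's `scale`) straight
    # through to the wrapped optimizer's step().
    if closure is not None:
        return self.optim.step(closure=closure, **kwargs)
    return self.optim.step(**kwargs)
```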
-
- 21 Aug, 2020 1 commit
Benjamin Lefaudeux authored
* Initial commit, dummy training loop, pure PyTorch but not DDP
* Probably slightly broken, but a rough DDP benchmark run
* Adding the torchvision requirement for testing
* Brainfart
* Reduce the loss, do something slightly distributed
* Some cleanup, distributing the training on two GPUs
* Some cleanup + adding a vanilla run, still not good to go
* Less silly defaults, good to go for a start I think
* Smaller batch to fit the smaller GPUs used in the CircleCI rigs
* Adding some options for the benchmark, and regression testing
* [test] set torch seed for Adam tests (#49): set the torch seed for tests; xfail mixed-precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict
* Linting, I really need to automate this isort insanity
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
-
- 20 Aug, 2020 1 commit
Benjamin Lefaudeux authored
* Move the restored param groups to the original device
* Adding a corresponding test
-
- 14 Aug, 2020 2 commits
Benjamin Lefaudeux authored
* Hotfix a half-cooked optimizer state restoration; the global shared state also needs to be restored
* [cleanup] get 100% coverage on oss.py (#38), authored by Mandeep Singh Baines <msb@fb.com>
* Better unit testing: check that the .param_groups attribute is properly in sync with the loaded state
Co-authored-by: msbaines <35972327+msbaines@users.noreply.github.com>
-
msbaines authored
Authored-by: Mandeep Singh Baines <msb@fb.com>
-
- 13 Aug, 2020 1 commit
Benjamin Lefaudeux authored
Aligning OSS state dict with `https://pytorch.org/docs/stable/_modules/torch/optim/optimizer.html#Optimizer` (#31)
-