1. 21 Jan, 2021 1 commit
  2. 30 Dec, 2020 1 commit
  3. 22 Nov, 2020 1 commit
  4. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speedup is likely still on the table, since the measured speed with bucketing does not match expectations; this could be a follow-up
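The "autoreduce" idea above can be sketched in plain Python: each parameter fires a hook as soon as its gradient is produced, so reduction can start immediately and overlap with the rest of the backward pass. This is a minimal illustration of the hook pattern, not the fairscale implementation; `Param`, `register_hook`, and `make_reduce_hook` are illustrative stand-ins.

```python
# Hook-driven "autoreduce" sketch: a callback runs the moment a
# gradient is ready, instead of reducing everything at the end.
# Param / make_reduce_hook are toy stand-ins, not fairscale APIs.

class Param:
    def __init__(self, name):
        self.name = name
        self.grad = None
        self._hooks = []

    def register_hook(self, fn):
        # fn is called with the gradient as soon as it is produced
        self._hooks.append(fn)

    def accumulate_grad(self, grad):
        self.grad = grad
        for fn in self._hooks:
            fn(grad)

reduced = []

def make_reduce_hook(param):
    def hook(grad):
        # in SDP this would be an async reduce to the owning rank
        reduced.append((param.name, grad))
    return hook

params = [Param("w"), Param("b")]
for p in params:
    p.register_hook(make_reduce_hook(p))

# simulate a backward pass producing gradients one by one
for p, g in zip(params, [0.5, -1.0]):
    p.accumulate_grad(g)

# each gradient was handed to the reducer immediately, in order
assert reduced == [("w", 0.5), ("b", -1.0)]
```

The point of the pattern is that no explicit "reduce all gradients now" step is needed; the hooks make it automatic.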
  5. 18 Nov, 2020 1 commit
  6. 16 Nov, 2020 1 commit
  7. 12 Nov, 2020 1 commit
  8. 06 Nov, 2020 1 commit
  9. 23 Oct, 2020 1 commit
  10. 21 Oct, 2020 1 commit
  11. 20 Oct, 2020 1 commit
  12. 17 Oct, 2020 1 commit
  13. 14 Oct, 2020 1 commit
  14. 10 Oct, 2020 1 commit
  15. 09 Oct, 2020 1 commit
  16. 06 Oct, 2020 1 commit
    • [feat] OSS/SDP: bucketing (#122) · 341d8b2b
      Benjamin Lefaudeux authored
      Same bucketing strategy for OSS and SDP:
      sort all tensors ahead of time, per rank and per size, smallest first. Pack the smallest elements into a fixed-size buffer, send it asynchronously, then send all the remaining tensors asynchronously, and come back to the bucket. Once everything has completed, scatter the bucket contents if needed.
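The packing step described above can be sketched in plain Python: per destination rank, sort by size, fill a fixed-capacity bucket with the smallest tensors (sent once as a single buffer), and send everything that does not fit individually. Sizes stand in for tensors, and `BUCKET_CAPACITY` and `plan_sends` are illustrative names, not the fairscale API.

```python
# Bucketing sketch: smallest tensors are packed into one fixed buffer
# per rank, larger ones are sent on their own. Integer sizes stand in
# for tensors; the capacity is an illustrative value.

BUCKET_CAPACITY = 8

def plan_sends(sizes_per_rank):
    """Return a {rank: (bucketed_sizes, individual_sizes)} plan."""
    plan = {}
    for rank, sizes in sizes_per_rank.items():
        bucketed, individual, used = [], [], 0
        for size in sorted(sizes):          # smallest tensors first
            if used + size <= BUCKET_CAPACITY:
                bucketed.append(size)       # packed into the fixed buffer
                used += size
            else:
                individual.append(size)     # sent on its own, async
        plan[rank] = (bucketed, individual)
    return plan

plan = plan_sends({0: [5, 1, 2, 9], 1: [3, 3, 3]})
assert plan[0] == ([1, 2, 5], [9])   # 1+2+5 = 8 fills the bucket; 9 does not fit
assert plan[1] == ([3, 3], [3])      # 3+3 = 6 fits; the next 3 would overflow
```

Sorting smallest-first maximizes how many tensors share the single bucketed send, which is where the bandwidth win over many tiny messages comes from.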
  17. 29 Sep, 2020 1 commit
  18. 24 Sep, 2020 1 commit
  19. 22 Sep, 2020 2 commits
  20. 17 Sep, 2020 1 commit
  21. 16 Sep, 2020 1 commit
  22. 09 Sep, 2020 1 commit
    • [feat] OSS flatten state dict (#65) · 4f597233
      Benjamin Lefaudeux authored
      Changes the structure of the returned state dict, with respect to param_groups, to be closer to what a vanilla optimizer would return (the shards are merged back together). The state is sharded again when loading.
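The round trip described above — merge per-rank shards into a vanilla-looking state dict, split again on load — can be sketched with plain dicts. `consolidate`, `reshard`, and the round-robin partitioning are toy illustrations, not the OSS partitioning scheme.

```python
# Un-sharding sketch: each rank holds optimizer state for its own
# parameter shard; consolidation merges the shards into one dict shaped
# like a vanilla optimizer's, and loading partitions it again.
# The modulo partitioning below is a toy scheme, not OSS's.

def consolidate(shards):
    """Merge per-rank {"state": {param_idx: state}} dicts into one."""
    full = {"state": {}, "param_groups": [{"params": []}]}
    for shard in shards:
        full["state"].update(shard["state"])
    full["param_groups"][0]["params"] = sorted(full["state"])
    return full

def reshard(full, world_size):
    """Split a consolidated state dict back into per-rank shards."""
    shards = [{"state": {}} for _ in range(world_size)]
    for idx, st in full["state"].items():
        shards[idx % world_size]["state"][idx] = st   # toy partitioning
    return shards

rank0 = {"state": {0: {"step": 3}, 2: {"step": 3}}}
rank1 = {"state": {1: {"step": 3}}}
full = consolidate([rank0, rank1])
assert sorted(full["state"]) == [0, 1, 2]
assert full["param_groups"][0]["params"] == [0, 1, 2]
assert reshard(full, 2)[1] == {"state": {1: {"step": 3}}}
```

The consolidated dict exposes the same top-level shape a vanilla optimizer would, which is what lets existing checkpointing code handle it unchanged.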
  23. 03 Sep, 2020 2 commits
    • [feat] Add a memory usage regression test to the OSS benchmark (#62) · ee38e1e0
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, a one-liner to please the linter
      
      * Adding some measurement of the memory consumption while training + checkpointing
      
      * mandatory lintfix commit
      
      * brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row
      
      * move reset stats call, hotfix
      
      * move the optimizer to rmsprop, more stateful and still used in CV
      
      * trying to figure out a SIGSEGV in CircleCI
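The shape of such a regression check — reset the peak counter before the run (so back-to-back runs don't pollute each other, as one of the fixes above notes), measure, then assert against a budget — can be sketched with the stdlib. `tracemalloc` stands in for CUDA memory counters here, and the workload and budget are illustrative.

```python
# Memory-regression sketch: reset the peak counter at the start of the
# run, execute the workload, then assert the peak stays under a
# reference budget. tracemalloc is a stdlib stand-in for GPU memory
# counters; run_training_step and MEMORY_BUDGET are illustrative.
import tracemalloc

MEMORY_BUDGET = 5_000_000  # bytes, illustrative reference value

def run_training_step():
    # stand-in workload that allocates some memory
    buf = [0.0] * 100_000
    return sum(buf)

tracemalloc.start()
tracemalloc.reset_peak()          # reset first, in case of back-to-back runs
run_training_step()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert 0 < peak <= MEMORY_BUDGET, f"peak {peak} bytes exceeds budget"
```

Resetting before (not after) the measured region is the important detail: otherwise a previous run's peak leaks into the next assertion.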
    • [fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, a one-liner to please the linter
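The key check mentioned above is small enough to sketch directly: a PyTorch-compliant optimizer state dict exposes `state` and `param_groups` at the top level. `validate_state_dict` is an illustrative helper, not a fairscale function.

```python
# Sketch of the compliance check: fail loudly if the two top-level
# keys PyTorch expects are missing. validate_state_dict is illustrative.

def validate_state_dict(sd):
    missing = {"state", "param_groups"} - sd.keys()
    if missing:
        raise KeyError(f"optimizer state dict missing keys: {sorted(missing)}")
    return sd

ok = validate_state_dict({"state": {}, "param_groups": []})
assert ok == {"state": {}, "param_groups": []}

err = ""
try:
    validate_state_dict({"state": {}})
except KeyError as e:
    err = str(e)
assert "param_groups" in err
```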
  24. 21 Aug, 2020 1 commit
    • [feat] Simple macro OSS benchmark (#47) · 46c3776b
      Benjamin Lefaudeux authored
      * initial commit, dummy training loop, pure pytorch but not DDP
      
      * probably slightly broken, but rough DDP benchmark run
      
      * adding the torchvision requirement for testing
      
      * brainfart
      
      * reduce the loss, do something slightly distributed
      
      * Some cleanup, distributing the training on two GPUs
      
      * some cleanup + adding a vanilla run, still not good to go
      
      * less silly defaults, gtg for a start I think
      
      * smaller batch to fit the smaller gpus used in the circleci rigs
      
      * Adding some options for the benchmark, and regression testing
      
      * [test] set torch seed for Adam tests (#49)
      
      Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
      
      * linting, I really need to automate this isort insanity
      Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
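The "[test] set torch seed" change folded in above relies on a simple idea worth a sketch: fixing the RNG seed before a stochastic test makes its results identical across runs, so regression thresholds stay stable. The stdlib `random` module stands in for torch's RNG, and `noisy_loss` is an illustrative stand-in for a training-loop metric.

```python
# Seeding sketch: a stochastic "loss" becomes reproducible once the
# seed is fixed first. random stands in for torch's RNG; noisy_loss
# is an illustrative workload, not benchmark code.
import random

def noisy_loss(seed):
    random.seed(seed)   # fix the seed before the stochastic work
    return sum(random.random() for _ in range(10))

# same seed -> identical value, so a regression check can compare runs
assert noisy_loss(1234) == noisy_loss(1234)
assert noisy_loss(1234) != noisy_loss(4321)
```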