- 21 Oct, 2021 1 commit
anj-s authored
* update pytorch version for benchmarks * reduce golden data precision check
- 08 May, 2021 1 commit
anj-s authored
* add license file headers for all files * fix lint
- 08 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 04 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 01 Mar, 2021 1 commit
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest". This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all tests should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging...
- 03 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims * fix the regression test on circleci * removing unused flags
- 25 Jan, 2021 1 commit
anj-s authored
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* fix lint errors
* refactor common utilities
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments
* addressed PR comments
* added another comment to clarify
* fixing lint errors
* addressed PR comments
* addressed PR comments
* fixed typos
* initialize var
* rename seq_pred to lm
* fix lint errors
* move datasets and models into separate folders
* add the folders created
* fix lint errors
* create golden config to stats mapping
* add common batching for both synthetic and real data
* fixed lint errors
* enable real pipe benchmarks with new golden data
* reduce seq len to avoid OOM
* updated golden data
* add logging
* add golden data
* add golden data
* fix lint errors
* add doc string
* remove unused class
* add seq len and batch size to the config
* remove commented out line
* address comments
* rename imports
* refactor common logic in dataloaders
* add golden configs
* lint changes
* merge latest changes
* lint errors
* address PR comments
* initial refactoring
* lint fixes
* fix lint errors
* update comment

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 21 Jan, 2021 1 commit
Benjamin Lefaudeux authored
- 30 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup * unrelated, but filling in the changelog * another nit
- 22 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* testing median and MAD * synchronize on kernels to make sure that we're measuring the actual completion time * adjusting the circleci threshold, not that the speed has regressed but because we measure proper cuda execution time
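The gist of that measurement, as a hedged sketch (helper names here are illustrative, not the benchmark's actual code): synchronize on the device before and after each step so that kernel completion is part of the measurement, then report the median and the median absolute deviation (MAD) rather than the mean.

```python
import time
import torch

def measure_step_times(step, n_iters=20, warmup=3):
    """Time a training step; synchronize so CUDA kernel completion is included."""
    times = []
    for i in range(n_iters + warmup):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.monotonic()
        step()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for completion, not just kernel launch
        if i >= warmup:
            times.append(time.monotonic() - start)

    times = torch.tensor(times)
    median = times.median()
    mad = (times - median).abs().median()  # median absolute deviation
    return median.item(), mad.item()
```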
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* rewrite using autograd and Variable execution queue to make the reduce automatic * share buckets with OSS to remove duplication * some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up
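A rough sketch of the underlying idea, not fairscale's actual implementation (which also buckets the gradients and shares the buffers with OSS): a hook on each parameter's gradient accumulator launches an async reduce as soon as that gradient is ready, which is what makes the reduce automatic. The helper name and the wiring below are assumptions for illustration.

```python
import torch.distributed as dist

def attach_auto_reduce_hooks(model, world_size):
    """Launch an async all_reduce as soon as each parameter's grad is final;
    the returned handles must be waited on before optimizer.step()."""
    handles = []

    def make_hook(param):
        def reduce_ready_grad(*_unused):
            param.grad.div_(world_size)  # pre-divide so the sum becomes a mean
            handles.append(dist.all_reduce(param.grad, async_op=True))
        return reduce_ready_grad

    grad_accs = []  # keep the accumulator nodes alive, they are weakly referenced
    for param in model.parameters():
        if param.requires_grad:
            # the AccumulateGrad node fires once .grad for this param is populated
            grad_acc = param.expand_as(param).grad_fn.next_functions[0][0]
            grad_acc.register_hook(make_hook(param))
            grad_accs.append(grad_acc)
    return handles, grad_accs
```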
- 18 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea * adding stubs & explanations in the documentation
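Usage is meant to mirror the vanilla `torch.cuda.amp.GradScaler`; a hedged sketch follows, where the import path reflects where the scaler lives in fairscale, and `model`, `criterion`, `dataloader` are assumed to exist already.

```python
import torch
from fairscale.optim import OSS
from fairscale.optim.grad_scaler import ShardedGradScaler

optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.01)
scaler = ShardedGradScaler()  # drop-in for torch.cuda.amp.GradScaler

for inputs, targets in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # inf/nan bookkeeping is aware of the shards here
    scaler.update()
```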
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
add a clip gradients util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test
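For context, the intended usage looks roughly like the following sketch; the method lives on the sharded optimizer, and the exact signature is treated as an assumption here.

```python
# inside the training loop, with `optimizer` being a fairscale OSS instance
loss.backward()
# with sharded gradients, torch.nn.utils.clip_grad_norm_ would only see the
# local shard; the sharded-aware version computes the global norm across ranks
optimizer.clip_grad_norm(max_norm=1.0)
optimizer.step()
```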
- 12 Nov, 2020 1 commit
Yuanyuan (Ana) Shen authored
* now works on a machine without CUDA, which makes it easier to debug and to run quick tests
- 06 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* oss benchmark: add an --amp option * add a circleCI test
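Roughly what such a flag wires up, as a sketch rather than the benchmark's actual code; `model`, `criterion`, `optimizer`, `inputs`, `targets` are assumed names.

```python
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--amp", action="store_true", help="enable automatic mixed precision")
args = parser.parse_args()

scaler = torch.cuda.amp.GradScaler(enabled=args.amp)

def training_step(inputs, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=args.amp):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()   # no-ops into plain backward when --amp is off
    scaler.step(optimizer)
    scaler.update()
    return loss
```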
- 23 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Some ease of use in the benchmark tool, add a debug option
- 21 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* switching to MNIST * updating the reference values, should be good to go * download dataset once for all processes
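The "download once" part typically looks like the sketch below: only one rank hits the network, the others wait on a barrier and then read from the shared path. The root path and transform are placeholders.

```python
import torch.distributed as dist
from torchvision import datasets, transforms

def get_mnist(rank, root="/tmp/mnist"):
    if rank == 0:                       # only one process downloads
        datasets.MNIST(root, train=True, download=True)
    dist.barrier()                      # everyone else waits for the files
    return datasets.MNIST(root, train=True, download=False,
                          transform=transforms.ToTensor())
```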
- 20 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Minor quality-of-life change: easier to debug, and makes it possible to test a host of models with the same code
- 17 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* adding a cpu option * adjust the reference loss
- 14 Oct, 2020 1 commit
Benjamin Lefaudeux authored
- 10 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* bugfix * adjust default non-regression loss, not all_reduced now
- 09 Oct, 2020 1 commit
Benjamin Lefaudeux authored
More realistic benchmarks, comparing apples to apples: DDP / OSS+DDP / OSS+SDP
- 06 Oct, 2020 1 commit
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements into a fixed buffer, send it async, then send all the other tensors async, and get back to the bucket. Once everything is done, scatter the bucket contents back if needed.
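A hedged sketch of that strategy, assuming a precomputed per-rank partition of the parameters and using broadcast as the collective; names, bucket size and the helper itself are illustrative, not the actual OSS/SDP code.

```python
import torch.distributed as dist

def broadcast_params_bucketed(params_per_rank, bucket_size=2 ** 20):
    """Per rank: sort tensors by size, pack the smallest into one fixed buffer,
    send everything async, then scatter the buffer back into the originals."""
    requests, buckets = [], []

    for rank, params in enumerate(params_per_rank):
        if not params:
            continue
        params = sorted(params, key=lambda p: p.numel())        # smaller tensors first
        buffer = params[0].new_zeros(bucket_size)
        packed, offset = [], 0

        for p in params:
            if offset + p.numel() <= bucket_size:               # fits in the fixed buffer
                buffer[offset:offset + p.numel()].copy_(p.data.reshape(-1))
                packed.append((p, offset))
                offset += p.numel()
            else:                                               # too big: send on its own
                requests.append(dist.broadcast(p.data, src=rank, async_op=True))

        requests.append(dist.broadcast(buffer[:offset], src=rank, async_op=True))
        buckets.append((buffer, packed))

    for req in requests:                                        # get back to the buckets
        req.wait()
    for buffer, packed in buckets:                              # scatter the contents
        for p, offset in packed:
            p.data.copy_(buffer[offset:offset + p.numel()].view_as(p))
```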
- 29 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- adding the buffer broadcast option - minor cleanup in shardedDDP
- 24 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- small benchmark refactor, only one for all backends and ddp - deterministic, enforce alignment with pytorch ddp
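Determinism of that kind usually comes down to seeding every source of randomness identically on every rank; a minimal sketch, assuming the benchmark touches Python, NumPy and torch RNGs.

```python
import random
import numpy as np
import torch

def set_deterministic(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                      # seeds CPU and all CUDA devices
    torch.backends.cudnn.deterministic = True    # force deterministic conv algorithms
    torch.backends.cudnn.benchmark = False
```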
- 22 Sep, 2020 2 commits
Benjamin Lefaudeux authored
* Broadcasting grad-enabled tensors is forbidden in Gloo, because this is not differentiable. Workaround
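The usual workaround, sketched below and not necessarily the exact change in this commit, is to broadcast a detached view of each parameter: same storage, but without the autograd flag that Gloo rejects.

```python
import torch.distributed as dist

# `model` is assumed to be an nn.Module replicated on every rank
for param in model.parameters():
    # broadcasting `param` directly trips Gloo because it requires grad;
    # `param.data` shares the storage but does not require grad
    dist.broadcast(param.data, src=0)
```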
Benjamin Lefaudeux authored
* Doc extensions to some APIs * Fix the benchmark and tutorial
- 17 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- rename oss_ddp to ShardedDataParallel - some refactoring - ShardedDataParallel owns the sharded optimizer, exposed if need be - some small perf bumps
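As described, the wrapper builds and exposes the sharded optimizer itself; a minimal sketch of that design, where the class and argument names are illustrative and not the exact fairscale signature of the time.

```python
import torch
from fairscale.optim import OSS

class ShardedDataParallelSketch(torch.nn.Module):
    """Owns the sharded optimizer, and exposes it if need be."""

    def __init__(self, module: torch.nn.Module, optim_cls=torch.optim.SGD, **optim_kwargs):
        super().__init__()
        self.module = module
        self.sharded_optimizer = OSS(module.parameters(), optim=optim_cls, **optim_kwargs)

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

# usage sketch: wrapped = ShardedDataParallelSketch(model, torch.optim.SGD, lr=0.01)
#               ... loss.backward() ...; wrapped.sharded_optimizer.step()
```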
- 16 Sep, 2020 1 commit
msbaines authored
- 09 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Changes the structure of the returned state dict with respect to the param_groups to make it closer to what a vanilla optimizer would return (un-shard them). Shard again when loading
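In practice this is a consolidate/save/reload flow; a hedged sketch, with method names per the fairscale OSS API and exact signatures treated as assumptions (`model` and `rank` assumed to exist).

```python
import torch
from fairscale.optim import OSS

optimizer = OSS(model.parameters(), optim=torch.optim.Adam, lr=1e-3)

# saving: every rank contributes its shard, rank 0 ends up with a
# vanilla-looking, un-sharded state dict
optimizer.consolidate_state_dict(recipient_rank=0)
if rank == 0:
    torch.save(optimizer.state_dict(), "optim.pt")

# loading: the flat state / param_groups are partitioned into shards again
optimizer.load_state_dict(torch.load("optim.pt"))
```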
- 03 Sep, 2020 2 commits
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensure that `state` and `param_groups` are there
* after installing the specific isort, black and all, one liner to please the linter..
* Adding some measurement of the memory consumption while training + checkpointing
* mandatory lintfix commit
* brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row
* move reset stats call, hotfix
* move the optimizer to rmsprop, more stateful and still used in CV
* trying to figure out a SIGSEGV in circleci
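The memory measurement and counter reset mentioned above boil down to the standard CUDA allocator statistics; a sketch, where `train_one_epoch` is an assumed placeholder for the benchmark's training loop.

```python
import torch

def train_and_report_peak_memory(train_one_epoch, epochs: int):
    # reset the peak counter first, in case two runs share the same process
    torch.cuda.reset_peak_memory_stats()
    for _ in range(epochs):
        train_one_epoch()
    peak_mb = torch.cuda.max_memory_allocated() / 2 ** 20
    print(f"peak allocated memory: {peak_mb:.1f} MB")
```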
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects * Adding a check on the dict keys, ensure that `state` and `param_groups` are there * after installing the specific isort, black and all, one liner to please the linter..
- 21 Aug, 2020 1 commit
Benjamin Lefaudeux authored
* initial commit, dummy training loop, pure pytorch but not DDP
* probably slightly broken, but rough DDP benchmark run
* adding the torchvision requirement for testing
* brainfart
* reduce the loss, do something slightly distributed
* Some cleanup, distributing the training on two GPUs
* some cleanup + adding a vanilla run, still not good to go
* less silly defaults, gtg for a start I think
* smaller batch to fit the smaller gpus used in the circleci rigs
* Adding some options for the benchmark, and regression testing
* [test] set torch seed for Adam tests (#49) Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict. Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
* linting, I really need to automate this isort insanity

Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>