- 30 Mar, 2022 1 commit
Paul Johnson authored
This is no longer needed since isort's version is 5.10. Also pin black to version 22.3.0 to fix an issue with the click dependency. Update files that now fail with the new version of black: {a = 2 ** 4} -> {a = 2**4}
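For context, black 22.x stopped putting spaces around the power operator when both operands are simple names or literals; a minimal before/after sketch of the kind of rewrite this commit applies:

    side, base, exponent = 3, 2, 5

    # Before (older black): spaces kept around the power operator
    area = side ** 2
    config = {"a": 2 ** 4}

    # After (black >= 22.1.0): simple operands are hugged
    area = side**2
    config = {"a": 2**4}

    # Non-simple operands keep the spaces either way
    offset = (base + 1) ** exponent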
- 08 Mar, 2022 1 commit
Min Xu authored
* copyright headers
* isort and pyproject.toml
* pre-commit and requirement for isort-seed-config
* mypy
* dummy change
* numpy version for pre-commit
* fix mypy issue caused by numpy
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 14 Feb, 2022 1 commit
Min Xu authored
* update pytest versions
* [test] test-related changes
  - upgrade to newer pytorch versions
  - added a function to make tests more deterministic on A100 and TF32
  - fixed some tests so that they are correctly skipped on a single-GPU system
* more fixes
* formatting overly long lines
* format
* better test without triggering a warning
* fix an optim state bug with newer pytorch
  - the adam optimizer seems to return "step" as a singleton tensor in the nightly build
  - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer
* improve oss.py
  - using min_loss for regression checking is a bit more reliable
  - also increased the number of epochs from 10 to 12
* small oss.py fix
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Min Xu <min.xu.public@gmail.com>
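A minimal sketch of that kind of optimizer-state fix (normalize_adam_step is a hypothetical helper for illustration, not the actual fairscale code):

    import torch

    def normalize_adam_step(optim_state_dict):
        # Newer PyTorch builds store Adam's per-parameter "step" as a
        # singleton tensor; coerce it back to a plain number so that
        # older save/load paths keep working.
        for param_state in optim_state_dict["state"].values():
            step = param_state.get("step")
            if isinstance(step, torch.Tensor):
                param_state["step"] = step.item()
        return optim_state_dict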
- 12 Nov, 2021 1 commit
Anupam Bhatnagar authored
* adding pre-commit files
* applying pre-commit to all files
* adding no-strict-optional argument to mypy in circle ci config
* fix typo
* updating python versions
* [skip ci] remove extra args
* adding python 3.9
* [skip ci] set pre-commit version in requirements-dev.txt
* set CACHE_VERSION
* move linters from circleci to github actions
* update python version
* update python version in benchmarks_2
* moving to python 3.9.7
- 24 Oct, 2021 1 commit
anj-s authored
* relax speed constraints
* relax the regression constraints
- 21 Oct, 2021 1 commit
anj-s authored
* update pytorch version for benchmarks
* reduce golden data precision check
- 08 May, 2021 1 commit
anj-s authored
* add license file headers for all files
* fix lint
- 08 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 04 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 01 Mar, 2021 1 commit
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest"
  This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all tests should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging
* better capture the errors
* debugging
* fix pyenv command
* add universe repo
* update to cuda 11 for 171
* add a test file, improved the checking script
- 03 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims
* fix the regression test on circleci
* removing unused flags
- 25 Jan, 2021 1 commit
anj-s authored
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* fix lint errors
* refactor common utilities
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments
* addressed PR comments
* added another comment to clarify
* fixing lint errors
* addressed PR comments
* addressed PR comments
* fixed typos
* initialize var
* rename seq_pred to lm
* fix lint errors
* move datasets and models into separate folders
* add the folders created
* fix lint errors
* create golden config to stats mapping
* add common batching for both synthetic and real data
* fixed lint errors
* enable real pipe benchmarks with new golden data
* reduce seq len to avoid OOM
* updated golden data
* add logging
* add golden data
* add golden data
* fix lint errors
* add doc string
* remove unused class
* add seq len and batch size to the config
* remove commented-out line
* address comments
* rename imports
* refactor common logic in dataloaders
* add golden configs
* lint changes
* merge latest changes
* lint errors
* address PR comments
* initial refactoring
* lint fixes
* fix lint errors
* update comment
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 21 Jan, 2021 1 commit
Benjamin Lefaudeux authored
- 30 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup
* unrelated, but filling in the changelog
* another nit
- 22 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* testing median and MAD
* synchronize on kernels to make sure that we're measuring the actual completion time
* adjusting the circleci threshold, not because the speed has regressed but because we now measure proper cuda execution time
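The measurement pattern described above, as a hedged sketch (measure_step_times is illustrative, not the benchmark's actual code): CUDA kernels launch asynchronously, so the clock must only be read after a synchronize, and median/MAD summarize the samples robustly against outliers.

    import time
    import torch

    def measure_step_times(step_fn, n_iters=20):
        times = []
        for _ in range(n_iters):
            torch.cuda.synchronize()          # flush pending kernels first
            start = time.monotonic()
            step_fn()
            torch.cuda.synchronize()          # wait for this step's kernels
            times.append(time.monotonic() - start)
        t = torch.tensor(times)
        median = t.median()
        mad = (t - median).abs().median()     # median absolute deviation
        return median.item(), mad.item()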
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* rewrite using autograd and the Variable execution queue to make the reduce automatic
* share buckets with OSS to remove duplication
* some speed is still likely on the table, since the speed vs. bucketing does not match expectations; could be a follow-up
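Roughly, the autograd-driven reduce follows the pattern below (attach_auto_reduce_hooks is a hypothetical name; the real ShardedDDP logic adds bucketing on top):

    import torch.distributed as dist

    def attach_auto_reduce_hooks(model, world_size):
        handles = []

        def make_hook():
            def hook(grad):
                # reduce each gradient as soon as autograd produces it,
                # overlapping communication with the rest of the backward pass
                grad = grad / world_size
                handles.append(dist.all_reduce(grad, async_op=True))
                return grad
            return hook

        for p in model.parameters():
            if p.requires_grad:
                p.register_hook(make_hook())
        return handles  # wait on these before optimizer.step()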
- 18 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea
* adding stubs & explanations in the documentation
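The core idea, sketched (found_inf_anywhere is illustrative; the actual wrapper extends torch.cuda.amp.GradScaler): with sharded optimizer state, each rank only sees its own shard's inf/nan status, so ranks have to agree before deciding to skip a step.

    import torch
    import torch.distributed as dist

    def found_inf_anywhere(local_grads):
        # each rank checks only the gradients of its own shard...
        local_flag = torch.zeros(1)
        if any(not torch.isfinite(g).all() for g in local_grads if g is not None):
            local_flag += 1
        # ...then all ranks agree, so they make the same step/skip decision
        dist.all_reduce(local_flag)
        return bool(local_flag.item())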
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
Add a clip-gradients util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test.
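A sketch of why such a util differs from torch.nn.utils.clip_grad_norm_ (clip_grad_norm_sharded is a hypothetical name, not the fairscale API): each rank only holds a shard of the gradients, so the global norm must be assembled with an all-reduce before clipping.

    import torch
    import torch.distributed as dist

    def clip_grad_norm_sharded(local_params, max_norm):
        # local sum of squared gradient norms for this rank's shard
        local_sq = sum(p.grad.norm() ** 2 for p in local_params if p.grad is not None)
        total_sq = torch.as_tensor(local_sq, dtype=torch.float32)
        dist.all_reduce(total_sq)             # sum the shards' squared norms
        total_norm = total_sq.sqrt()
        clip_coef = max_norm / (total_norm + 1e-6)
        if clip_coef < 1:
            for p in local_params:
                if p.grad is not None:
                    p.grad.mul_(clip_coef)
        return total_norm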
- 12 Nov, 2020 1 commit
Yuanyuan (Ana) Shen authored
* now works on a machine without CUDA; easier to debug and quick to test
- 06 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* oss benchmark: add an --amp option
* add a CircleCI test
- 23 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Some ease-of-use improvements in the benchmark tool; add a debug option
- 21 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* switching to MNIST
* updating the reference values, should be good to go
* download the dataset once for all processes
- 20 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Minor quality-of-life change: easier to debug, and makes it possible to test a host of models with the same code
- 17 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* adding a CPU option
* adjust the reference loss
- 14 Oct, 2020 1 commit
Benjamin Lefaudeux authored
- 10 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* bugfix
* adjust the default non-regression loss, since it is no longer all_reduced
- 09 Oct, 2020 1 commit
Benjamin Lefaudeux authored
More realistic benchmarks, comparing apples to apples: DDP / OSS+DDP / OSS+SDP.
- 06 Oct, 2020 1 commit
Benjamin Lefaudeux authored
Same bucketing strategy for OSS and SDP: sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements in a fixed buffer, send it asynchronously, then send all the others asynchronously, and come back to the bucket. Once done, scatter the contents if needed.
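A rough sketch of that strategy (bucketed_broadcast is a hypothetical, simplified stand-in for the OSS/SDP code):

    import torch
    import torch.distributed as dist

    def bucketed_broadcast(tensors, src, bucket_cap=2**20):
        # smallest tensors first, so as many as possible fit the fixed buffer
        tensors = sorted(tensors, key=lambda t: t.numel())
        bucket, filled, handles = [], 0, []
        for t in tensors:
            if filled + t.numel() <= bucket_cap:
                bucket.append(t)
                filled += t.numel()
            else:
                # too big for the bucket: send individually, asynchronously
                handles.append(dist.broadcast(t, src, async_op=True))
        if bucket:
            buffer = torch.cat([t.flatten() for t in bucket])
            handles.append(dist.broadcast(buffer, src, async_op=True))
        for h in handles:
            h.wait()
        # get back to the bucket: scatter the buffer into the original tensors
        offset = 0
        for t in bucket:
            t.copy_(buffer[offset : offset + t.numel()].view_as(t))
            offset += t.numel()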
- 29 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- adding the buffer broadcast option
- minor cleanup in ShardedDDP
- 24 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- small benchmark refactor, only one for all backends and ddp
- deterministic, enforce alignment with pytorch ddp
- 22 Sep, 2020 2 commits
Benjamin Lefaudeux authored
* Broadcasting grad-enabled tensors is forbidden in Gloo, because broadcast is not differentiable. Workaround.
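The usual workaround, sketched (broadcast_params is illustrative, not necessarily this commit's exact fix): broadcast the detached storage, which shares memory with the parameter but carries no autograd history.

    import torch
    import torch.distributed as dist

    def broadcast_params(params, src_rank):
        with torch.no_grad():
            for p in params:
                # p.detach() shares storage with p, so the in-place
                # broadcast updates the parameter without touching autograd
                dist.broadcast(p.detach(), src_rank)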
Benjamin Lefaudeux authored
* Doc extensions to some APIs
* Fix the benchmark and tutorial
- 17 Sep, 2020 1 commit
Benjamin Lefaudeux authored
- rename oss_ddp to ShardedDataParallel
- some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- some small perf bumps
- 16 Sep, 2020 1 commit
msbaines authored
- 09 Sep, 2020 1 commit
Benjamin Lefaudeux authored
Changes the structure of the returned state dict with respect to the param_groups, to make it closer to what a vanilla optimizer would return (un-shard them). Shard again when loading.
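Conceptually (a hypothetical sketch, not the fairscale implementation): saving merges every rank's param_groups into one vanilla-looking list, and loading partitions that list back into shards.

    def unshard_param_groups(per_rank_groups):
        # saving: flatten each rank's groups into the single list a
        # vanilla (non-sharded) optimizer state dict would contain
        return [dict(g) for rank_groups in per_rank_groups for g in rank_groups]

    def reshard_param_groups(flat_groups, partition):
        # loading: partition[rank] holds the indices of the groups
        # that each rank owns
        return [[flat_groups[i] for i in idxs] for idxs in partition]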
- 03 Sep, 2020 2 commits
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensuring that `state` and `param_groups` are there
* after installing the specific isort, black and all, a one-liner to please the linter
* Adding some measurement of the memory consumption while training + checkpointing
* mandatory lint-fix commit
* brainfart, reset the memory-use counter at the beginning of training, in case two runs happen in a row
* move the reset-stats call, hotfix
* move the optimizer to RMSprop, more stateful and still used in CV
* trying to figure out a SIGSEGV in circleci
Benjamin Lefaudeux authored
* Aligning the optimizer state dict with what PyTorch expects
* Adding a check on the dict keys, ensuring that `state` and `param_groups` are there
* after installing the specific isort, black and all, a one-liner to please the linter
- 21 Aug, 2020 1 commit
Benjamin Lefaudeux authored
* initial commit, dummy training loop, pure pytorch but not DDP
* probably slightly broken, but rough DDP benchmark run
* adding the torchvision requirement for testing
* brainfart
* reduce the loss, do something slightly distributed
* Some cleanup, distributing the training on two GPUs
* some cleanup + adding a vanilla run, still not good to go
* less silly defaults, gtg for a start I think
* smaller batch to fit the smaller gpus used in the circleci rigs
* Adding some options for the benchmark, and regression testing
* [test] set torch seed for Adam tests (#49)
  Set the torch seed for tests. xfail mixed-precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
  Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
* linting, I really need to automate this isort insanity
Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>