- 03 Feb, 2021 (1 commit)

  anj-s authored:
    * mp cleanup
    * round of multiprocess refactoring
    * test golden run
    * print cuda stats
    * fix lint errors
    * enable multiprocess pipe benchmarks
    * set world size to be available gpus
    * more changes
    * use synthetic loaders for intermediate pipeline stages
    * merged master
    * fix for the devices property
    * dataloader fix
    * modify rank check
    * print wps stats
    * enable verification
    * fix logging
    * fix flag name
    * fix flag name
    * check for rank
    * fix indent
    * pass args
    * pass args
    * modify golden data
    * remove unused print message
    * fix lint errors
    * add comments
    * fix benchmarks

    Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
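The "synthetic loaders for intermediate pipeline stages" item reflects a common pattern in multi-process pipe benchmarks: only the first stage consumes real data, so later ranks can iterate over dummy tensors of the right shape. A minimal sketch of the idea (the batch size and shapes here are illustrative assumptions, not the benchmark's actual values):

```python
import torch

def make_synthetic_loader(batch_size, sample_shape, num_batches):
    """Yield dummy batches so non-input pipeline stages can drive their
    training loop without touching the real dataset."""
    for _ in range(num_batches):
        yield torch.zeros(batch_size, *sample_shape)

# Hypothetical usage: rank 0 reads the real data, later stages use synthetic batches.
# loader = real_loader if rank == 0 else make_synthetic_loader(32, (3, 224, 224), len(real_loader))
```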

- 29 Jan, 2021 (1 commit)

  Min Xu authored:
    * [test]: test with py39 + torch 1.8 nightly
    * version fix
    * more fixes
    * fix version function for nightly versions
    * fix torch_pg build
    * invalidate cache
    * separate benchmark requirements
    * comment
    * fixed mypy
    * fixed a test

- 27 Jan, 2021 (1 commit)

  msbaines authored:
    We can save time by only running unit tests once instead of twice (with and without coverage).

- 25 Jan, 2021 (1 commit)

  Min Xu authored:
    * [test] cover python 3.7 to 3.9 on CPU
      - covering common python versions on CPU tests
      - added doc build test
    * add doc build test
    * skipping failing tests on py39
    * catching doc build warnings
    * add doc build to py38 and py39
    * minor fix
    * fix doc build for adascale
    * removed dead code
    * fix the skipping
    * skip unit test for py39
    * add failing example
    * no more py39 skipping the tests

- 16 Jan, 2021 (1 commit)

  msbaines authored

- 15 Jan, 2021 (1 commit)

  msbaines authored

- 11 Jan, 2021 (1 commit)

  Benjamin Lefaudeux authored:
    * tentatively fixing the cpu version of the circleci jobs; the pipe tests are now the last ones standing
    * fixing oss backcompat, and trying to fix rpc in old pytorch as well
    * fixing the file-based init in torch 1.5

- 05 Jan, 2021 (1 commit)

  Benjamin Lefaudeux authored:
    * adding the pytest timeout plugin to properly root out hanging tests
    * removing redundant code, slightly more reasonable timeout, works on single cuda
    * finding the root bug for some of the cpu hangs: rpc init
    * propagating all the rpc init test changes to the pipe and model parallel tests
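For context, the pytest-timeout plugin referred to here fails a test instead of letting it hang forever. A minimal sketch of how it is typically wired up (the timeout budgets are illustrative choices, not the values used in this CI):

```python
# Requires: pip install pytest-timeout
# A global budget can be set on the command line:  pytest --timeout=60
import time

import pytest

@pytest.mark.timeout(5)  # per-test override: fail (rather than hang) after 5s
def test_does_not_hang():
    time.sleep(0.1)  # a real test would exercise rpc/pipe setup here
```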

- 30 Dec, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    - tighter regression detection, based on the best case vs. the worst case
    - still run all configurations, useful for comparisons but not a target

- 22 Dec, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * keep two torch 1.7 profiles to preserve cuda 10.1 testing

- 30 Nov, 2020 (1 commit)

  Benjamin Lefaudeux authored

- 22 Nov, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * testing median and MAD (median absolute deviation)
    * synchronize on kernels to make sure that we're measuring the actual completion time
    * adjusting the circleci threshold, not because the speed has regressed but because we now measure proper cuda execution time
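The synchronization point matters because CUDA kernels launch asynchronously: without `torch.cuda.synchronize()` a host-side timer only measures launch overhead, not completion time. A minimal sketch of the median/MAD measurement idea (the function under test and the iteration count are illustrative):

```python
import statistics
import time

import torch

def benchmark(step, iters=20):
    """Time a CUDA workload properly: synchronize before reading the clock,
    then summarize with median and MAD, which are robust to outlier runs."""
    times = []
    for _ in range(iters):
        torch.cuda.synchronize()
        start = time.monotonic()
        step()  # the workload being measured (assumed to launch CUDA kernels)
        torch.cuda.synchronize()  # wait for the kernels to actually finish
        times.append(time.monotonic() - start)
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times)
    return med, mad
```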

- 21 Nov, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * rewrite using autograd and the Variable execution queue to make the reduce automatic
    * share buckets with OSS to remove duplication
    * some speed is likely still on the table, since the speed vs. bucketing behaviour does not match expectations; could be a follow-up
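Making "the reduce automatic" via autograd typically means hanging a hook off each parameter so the all-reduce fires as soon as that gradient is produced, rather than in an explicit loop after backward(). A simplified sketch of that pattern (not ShardedDataParallel's actual implementation; assumes the process group is already initialized):

```python
import torch
import torch.distributed as dist

def attach_auto_reduce(model: torch.nn.Module):
    """Register per-parameter hooks so gradients are averaged across ranks
    as soon as autograd computes them, overlapping comms with backward."""
    world_size = dist.get_world_size()

    def make_hook():
        def hook(grad):
            grad = grad / world_size  # pre-divide so the sum is an average
            dist.all_reduce(grad)     # async_op=True could overlap this further
            return grad
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook())
```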

- 20 Nov, 2020 (1 commit)

  msbaines authored

- 19 Nov, 2020 (1 commit)

  msbaines authored

- 06 Nov, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * oss benchmark: add an --amp option
    * add a circleCI test
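An --amp flag on a benchmark usually just toggles PyTorch's automatic mixed precision around the forward pass. A hedged sketch of the wiring (the flag name comes from the commit; the model and batch are illustrative):

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--amp", action="store_true",
                    help="run the benchmark under automatic mixed precision")
args = parser.parse_args()

model = torch.nn.Linear(1024, 1024).cuda()
batch = torch.randn(32, 1024, device="cuda")

# autocast(enabled=False) is a no-op, so one code path covers both modes
with torch.cuda.amp.autocast(enabled=args.amp):
    out = model(batch)
```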

- 30 Oct, 2020 (1 commit)

  msbaines authored

- 29 Oct, 2020 (1 commit)

  msbaines authored

- 28 Oct, 2020 (1 commit)

  msbaines authored

- 23 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * some ease-of-use improvements in the benchmark tool, and a debug option

- 22 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored

- 21 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * switching to MNIST
    * updating the reference values, should be good to go
    * download dataset once for all processes
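Downloading "once for all processes" is the usual rank-0-downloads-then-barrier dance, so concurrent workers do not race on the same files. A minimal sketch (the root path and dataset arguments are illustrative; assumes the process group is initialized):

```python
import torch.distributed as dist
from torchvision.datasets import MNIST

def get_mnist(root="/tmp/mnist"):
    """Let rank 0 download the data; everyone else waits, then reads from disk."""
    if dist.get_rank() == 0:
        MNIST(root, train=True, download=True)
    dist.barrier()  # all other ranks block until the files exist
    return MNIST(root, train=True, download=False)
```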

- 17 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * adding a cpu option
    * adjust the reference loss

- 16 Oct, 2020 (1 commit)

  msbaines authored

- 14 Oct, 2020 (1 commit)

  msbaines authored

- 10 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * bugfix
    * adjust the default non-regression loss, which is no longer all_reduced

- 09 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    More realistic benchmarks, comparing apples to apples: DDP vs. OSS+DDP vs. OSS+SDP.

- 08 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * new unit test to catch rank issues in OSS

- 01 Oct, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * minor, but gives some memory back
    * adjust CI and regression checks to 4 gpus

- 24 Sep, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    - small benchmark refactor: only one benchmark for all backends and ddp
    - deterministic, enforcing alignment with pytorch ddp

- 22 Sep, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * Broadcasting grad-enabled tensors is forbidden in Gloo, because the broadcast is not differentiable; this commit works around that.
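A common workaround for this restriction is to broadcast the underlying storage rather than the autograd-tracked tensor, e.g. via .data or under no_grad. A minimal sketch of that idea (not necessarily the exact fix in this commit; assumes an initialized Gloo process group):

```python
import torch
import torch.distributed as dist

def broadcast_params(model: torch.nn.Module, src: int = 0):
    """Sync parameters from `src` without tripping Gloo's check on
    grad-enabled tensors: broadcast the detached storage instead."""
    for p in model.parameters():
        dist.broadcast(p.data, src=src)  # p.data is not tracked by autograd
```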

- 17 Sep, 2020 (2 commits)

  Tom Birch authored:
    Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines):
    * adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess but also supporting PipelineStyle.MultiProcess
    * added support for lazy construction of modules (see lazy_construction for an example)
    * added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
    * copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
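A speculative sketch of how the new constructor might be invoked, based purely on the names in this message; the import path for PipelineStyle, the balance/chunks keywords, and the multi-process setup requirements are assumptions that may differ from the real API:

```python
import torch.nn as nn
from fairscale.nn import Pipe
# import path assumed; PipelineStyle may live elsewhere under fairscale.nn.pipe
from fairscale.nn.pipe import PipelineStyle

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

# default: all stages in one process, split across local devices
pipe = Pipe(model, balance=[2, 1], chunks=4)

# multi-process: each rank hosts its slice of the model; the rpc or
# send/recv worker setup that this requires is omitted here
pipe_mp = Pipe(model, balance=[2, 1], chunks=4, style=PipelineStyle.MultiProcess)
```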

  Benjamin Lefaudeux authored:
    - rename oss_ddp to ShardedDataParallel
    - some refactoring
    - ShardedDataParallel owns the sharded optimizer, exposed if need be
    - some small perf bumps

- 03 Sep, 2020 (1 commit)

  Jun Ru Anderson authored:
    Add GradScaler to fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require it in order to converge.

    Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
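For reference, the canonical PyTorch GradScaler training step looks like the following; a subclass is driven the same way. The model, optimizer, and loss here are placeholders, not the pipe benchmark's actual setup:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # fairscale's GradScaler subclasses this

for _ in range(10):
    optimizer.zero_grad()
    batch = torch.randn(32, 1024, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(batch).sum()
    scaler.scale(loss).backward()  # scale the loss so fp16 grads don't underflow
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # adjusts the scale factor for the next iteration
```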

- 21 Aug, 2020 (1 commit)

  Benjamin Lefaudeux authored:
    * initial commit: dummy training loop, pure pytorch but not DDP
    * probably slightly broken, but a rough DDP benchmark run
    * adding the torchvision requirement for testing
    * brainfart
    * reduce the loss, do something slightly distributed
    * some cleanup, distributing the training on two GPUs
    * some cleanup + adding a vanilla run, still not good to go
    * less silly defaults, good to go for a start I think
    * smaller batch to fit the smaller gpus used in the circleci rigs
    * adding some options for the benchmark, and regression testing
    * [test] set torch seed for Adam tests (#49): set the torch seed for tests; xfail the mixed precision and memory-efficient mixed-precision state_dict tests, due to their states being cast to FP16 and back to FP32 during load_state_dict
    * linting, I really need to automate this isort insanity

    Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
    Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>

- 14 Aug, 2020 (1 commit)

  msbaines authored

- 13 Aug, 2020 (2 commits)

- 31 Jul, 2020 (2 commits)

  Jun Ru Anderson authored:
    Add FusedAdam, update the benchmark, and add tests.

    Co-authored-by: Jun Ru Anderson <andersonic@fb.com>

  msbaines authored