- 30 Dec, 2020 4 commits
-
-
Benjamin Lefaudeux authored
- tighter regression detection, based on the best case vs. worst case - still run all configurations, useful for comparisons but not a target
-
anj-s authored
[refactor] Remove unused variables, add configuration objects and basic cleanup for pipe benchmarks. (#252) * [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * rename variable Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
* timeout on the process join, expose a hanging process * make sure that teardown is always called
-
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup * unrelated, but filling in the changelog * another nit
-
- 29 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* properly catching a test failure when there are not enough GPUs
-
Joshua Meier authored
author: Joshua Meier
-
- 28 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* file-based dist init * nicer handling of world sizes that do not match the number of available GPUs: warn instead of breaking
-
Benjamin Lefaudeux authored
-
- 24 Dec, 2020 1 commit
-
-
Min Xu authored
* Update changelog: missed this item from the previous AdaScale commit. * More changelog * Addressed review comments
-
- 22 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* keep two torch 1.7 profiles to save cuda 10.1 testing
-
Benjamin Lefaudeux authored
* fix, one-liner * adjust so that frozen trunks still get spread, even if this should have little consequence * removing dead code, hopefully a unit test fix * now with some linting * adding a proper unit test case
-
- 19 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non trainable params fix (#259) * Getting rid of the "should bucket" hash table, just use a list * Properly handle all params, with or without requires_grad * make sure that this case is unit tested
-
- 17 Dec, 2020 3 commits
-
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
-
Benjamin Lefaudeux authored
* typo, sorry about that * small perf fix
-
- 16 Dec, 2020 6 commits
-
-
Benjamin Lefaudeux authored
* Better handling of the callback queue, try to consume it as we go. * dumping buckets for the reduce part, always the same unused params issue
-
Benjamin Lefaudeux authored
* lint fixes * come on black * Update tutorial_pipe_multiprocess.py: make RANK global like the other tutorials Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
VitaliyLi authored
* Update README.md * Update README.md: update capitalization Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
jessijzhao authored
* [feat] add CPU support to tutorials in examples * now works on a machine without cuda * fixes some minor typos * [cleanup] factorize tutorials in examples * collects duplicate code across tutorials in helpers.py * [fix] getData in tutorials now returns iterable
-
Stas Bekman authored
-
Min Xu authored
* [doc]: AdaScale example and notes * formatted notes correctly as suggested by Benjamin * added feature and unit test to make sure lr_scheduler works * update the example with lr_scheduler * fixed doc with "make html" * addressed Mike's suggestions
-
- 15 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 14 Dec, 2020 1 commit
-
-
Min Xu authored
* better ddp adascale tests * make sure the single node test uses the same test cases and expected gains * added unit test that covers the smoothing factor - tested by re-introducing the bug and seeing the test fail as expected.
-
- 10 Dec, 2020 2 commits
-
-
Min Xu authored
* [doc] updating the pipe balance doc a bit - Also added a warning to pipeline.py when the partition output is not supported. * addressed Mandeep's comment
-
Benjamin Lefaudeux authored
* unit test checking ddp and sharded_ddp equivalence, reproducing the issue that Sean spotted * fixing the issue, not counting requests in flight properly * adding a multiple optimizers case
-
- 09 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 07 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* removing strict typing requirement, broken by ClassyVision
-
- 06 Dec, 2020 1 commit
-
-
Min Xu authored
-
- 05 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Thanks Jessica for the heads-up!
-
- 04 Dec, 2020 2 commits
-
-
Vittorio Caggiano authored
* add logo * Update README.md Co-authored-by: Vittorio Caggiano <caggiano@fb.com>
-
Benjamin Lefaudeux authored
* proper unit testing, but no other solution than disabling bucketing for now; the couple of options tested do not work
-
- 03 Dec, 2020 1 commit
-
-
Min Xu authored
* added AdaScale to README * [adascale] added gradient accumulation - added gradient accumulation - tested with full cifar trainings with different values of accumulation and verified the full accuracy is obtained - also removed the patch optimize flag until we need it * [adascale] adding pytest - added basic and ddp tests and grad_accum - closes #195 * added changelog * added ddp grad_accum test * moved ddp and non-ddp tests into separate files * added checkpoint test * more doc * addressed Mike's comments
-
- 02 Dec, 2020 1 commit
-
-
msbaines authored
Fixes #190
-
- 01 Dec, 2020 4 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
-
msbaines authored
-
Benjamin Lefaudeux authored
* fallback on internal pytorch numbering
-
- 30 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 27 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Fixing the relative positions of the html docs.
-
- 26 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-