- 08 Jan, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* adding a parity unit test
* code review, better testing, use torch defaults and check for the loss, log world size
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
* add additional unit test
* support model parallelism in oss
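For background on the OSS wrapper touched here: a minimal sketch of wrapping a stock optimizer with fairscale's OSS so its state is sharded across ranks. This shows plain OSS usage, not the model-parallel support added in this commit, and the single-rank gloo group is only there so the snippet runs stand-alone; constructor arguments reflect the fairscale API of this period and may have changed since.

```python
# Minimal OSS sketch: optimizer state is partitioned across ranks, each rank
# only keeps and updates its own shard.
import tempfile

import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

# single-process "group of one" so the example is self-contained
dist.init_process_group(
    backend="gloo", init_method=f"file://{tempfile.mkstemp()[1]}", rank=0, world_size=1
)

model = torch.nn.Linear(8, 8)
# OSS takes the parameters, the class of the wrapped optimizer, and its kwargs
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=0.1, momentum=0.9)

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()  # each rank steps only the parameters it owns, then broadcasts them

dist.destroy_process_group()
```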
-
- 07 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* trying to fix the missing files in the pip package (not in this diff)
* adding a long description, more pypi friendly
-
- 05 Jan, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
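For context, the pytest-timeout plugin referenced above turns a hanging test into a failure after a fixed budget. A hedged sketch of how such a budget is usually declared; the test body is a placeholder, not a fairscale test.

```python
# Sketch: per-test timeout via the pytest-timeout plugin (pip install pytest-timeout).
# A global budget can also be set on the command line, e.g. `pytest --timeout=60`.
import time

import pytest


@pytest.mark.timeout(30)  # fail instead of hanging if the test exceeds 30 seconds
def test_does_not_hang():
    time.sleep(0.1)
    assert True
```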
-
Benjamin Lefaudeux authored
release pip package to follow suit
-
- 04 Jan, 2021 3 commits
-
-
anj-s authored
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* fix lint errors
* refactor common utilities
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments.
* addressed PR comments
* added another comment to clarify.
* fixing lint errors
* addressed PR comments
* addressed PR comments
* fixed typos
* initialize var
* rename seq_pred to lm
* fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [feat] sync adascale from internal repo - tbd testing: tbd
* Update argument document of __init__
* update documentation around set_num_gradients_to_accumulate
* added checking code for proper API calling places
* rename internal APIs to make them internal
* updated changelog
* added support for add_param_group and its unit test
* added unit test for set_num_gradients_to_accumulate
* added debias_ewma unit test
* fixed test_set_num_gradients_to_accumulate (need zero_grad() call)
* added missing zero_grad() to test_lr_scheduler
* fixed test_add_param_group with respect to optim.zero_grad()
* added test_gradient_value
* added test_scale_not_equal_default for scale != world_size
* grad_accum
* added test_unhook()
* removed print statements
* fixed a typo
* addressed Ben's comment
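The entry points named above (set_num_gradients_to_accumulate, add_param_group, the zero_grad ordering) suggest the following shape of use. A minimal sketch assuming the fairscale.optim.AdaScale wrapper around SGD; treat the exact signatures as assumptions taken from the commit messages, not as a reference.

```python
# Sketch: AdaScale wrapping SGD with gradient accumulation. Names follow the
# commit messages above; exact signatures are an assumption.
import tempfile

import torch
import torch.distributed as dist
from fairscale.optim import AdaScale

# single-rank group so the sketch runs stand-alone; real use is multi-rank
dist.init_process_group(
    backend="gloo", init_method=f"file://{tempfile.mkstemp()[1]}", rank=0, world_size=1
)

model = torch.nn.Linear(16, 2)
optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1), num_gradients_to_accumulate=2)

for step, batch in enumerate(torch.randn(8, 4, 16)):
    loss = model(batch).sum()
    loss.backward()
    if (step + 1) % 2 == 0:  # step once every 2 accumulated backward passes
        optim.step()
        optim.zero_grad()

# the accumulation factor can be adjusted later, after grads have been zeroed
optim.set_num_gradients_to_accumulate(4)

dist.destroy_process_group()
```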
-
- 02 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix typo, backend for CPU test
-
- 30 Dec, 2020 5 commits
-
-
Sean Naren authored
* Add function to add handle for sync BN
* Add test to ensure batch norm handles have been added
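The sync-BN handle itself lives inside fairscale's data-parallel wrapper; as general background only, converting a model's BatchNorm layers to synchronized ones uses the stock PyTorch API shown below. This is not the fairscale change from this commit.

```python
# Sketch: stock PyTorch sync batch norm conversion, shown only as context for
# the sync-BN handle mentioned above. Running the converted model requires an
# initialised process group.
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```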
-
Benjamin Lefaudeux authored
- tighter regression detection, based on the best case vs. worst case
- still run all configurations, useful for comparisons but not a target
-
anj-s authored
[refactor] Remove unused variables, add configuration objects and basic cleanup for pipe benchmarks. (#252)
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments.
* addressed PR comments
* added another comment to clarify.
* fixing lint errors
* rename variable
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
* timeout on the process join, expose a hanging process
* make sure that teardown is always called
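The join-with-timeout pattern this fix describes is framework-agnostic; a small sketch with the standard multiprocessing module, where the worker body is a stand-in rather than the fairscale test code.

```python
# Sketch: bound the join so a hung worker is surfaced instead of blocking the
# test run, and make sure teardown happens even when the join times out.
import multiprocessing as mp


def _worker() -> None:
    pass  # placeholder for the actual distributed test body


if __name__ == "__main__":
    p = mp.Process(target=_worker)
    p.start()
    p.join(timeout=60)  # do not wait forever on a hung worker
    try:
        if p.is_alive():
            raise RuntimeError("worker process hung")
    finally:
        if p.is_alive():
            p.terminate()  # teardown always runs
        p.join()
```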
-
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup
* unrelated, but filling in the changelog
* another nit
-
- 29 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* properly catching a given test failing when there are not enough GPUs
-
Joshua Meier authored
author: Joshua Meier
-
- 28 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* file based dist init
* nicer handling of broken world sizes vs. number of available GPUs: do not break, but warn
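A sketch of the file-based initialisation mentioned above, using the standard torch.distributed API; the backend and temporary file are illustrative choices.

```python
# Sketch: file-based rendezvous for torch.distributed; every rank must point at
# the same shared file. Shown as a single-rank group so it runs stand-alone.
import tempfile

import torch.distributed as dist

sync_file = tempfile.NamedTemporaryFile(delete=False).name

dist.init_process_group(
    backend="gloo",                     # CPU-friendly; use "nccl" for GPU runs
    init_method=f"file://{sync_file}",  # all ranks rendezvous on this file
    rank=0,
    world_size=1,
)

assert dist.get_world_size() == 1
dist.destroy_process_group()
```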
-
Benjamin Lefaudeux authored
-
- 24 Dec, 2020 1 commit
-
-
Min Xu authored
* Update changelog: missed this item from the previous AdaScale commit.
* More changelog
* Addressed review comments
-
- 22 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* keep two torch 1.7 profiles to preserve cuda 10.1 testing
-
Benjamin Lefaudeux authored
* fix, one liner
* adjust so that frozen trunks still get spread, even if this should have little consequence
* removing dead code, hopeful unit test fix
* now with some linting..
* adding a proper unit test case
-
- 19 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non-trainable params fix (#259)
* Getting rid of the "should bucket" hash table, just use a list. Properly handle all params, with or without requires_grad
* make sure that this case is unit tested
-
- 17 Dec, 2020 3 commits
-
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
-
Benjamin Lefaudeux authored
* typo, sorry about that
* small perf fix
-
- 16 Dec, 2020 6 commits
-
-
Benjamin Lefaudeux authored
* Better handling of the callback queue, try to consume it as we go.
* dumping buckets for the reduce part, always the same unused params issue
-
Benjamin Lefaudeux authored
* lint fixes
* come on black
* Update tutorial_pipe_multiprocess.py: make RANK global like the other tutorials
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
VitaliyLi authored
* Update README.md
* Update README.md: update capitalization
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
jessijzhao authored
* [feat] add CPU support to tutorials in examples
* now works on a machine without cuda
* fixes some minor typos
* [cleanup] factorize tutorials in examples
* collects duplicate code across tutorials in helpers.py
* [fix] getData in tutorials now returns iterable
-
Stas Bekman authored
-
Min Xu authored
* [doc]: AdaScale example and notes
* formatted notes correctly as suggested by Benjamin
* added feature and unit test to make sure lr_scheduler works
* update the example with lr_scheduler
* fixed doc with "make html"
* addressed Mike's suggestions
-
- 15 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 14 Dec, 2020 1 commit
-
-
Min Xu authored
* better ddp adascale tests
* make sure the single node test uses the same test cases and expected gains
* added unit test that covers the smoothing factor - tested by re-introducing the bug and seeing the test fail as expected.
-
- 10 Dec, 2020 2 commits
-
-
Min Xu authored
* [doc] updating the pipe balance doc a bit
  - Also added a warning to pipeline.py when the partition output is not supported.
* addressed Mandeep's comment
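For context on the balance documentation: a minimal sketch of passing an explicit layer balance to fairscale's Pipe wrapper. The layer counts and chunk size are made up, and the constructor arguments follow the public Pipe API of this period, so treat them as assumptions.

```python
# Sketch: split a sequential model into two partitions with an explicit balance.
# `balance` lists how many layers each partition gets; `chunks` is the number of
# micro-batches. Partitions are placed on the available devices (GPUs if present).
import torch.nn as nn
from fairscale.nn import Pipe

model = nn.Sequential(
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
)

pipe = Pipe(model, balance=[2, 2], chunks=4)
```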
-
Benjamin Lefaudeux authored
* unit test checking ddp and sharded_ddp equivalence, reproducing the issue that Sean spotted
* fixing the issue: requests in flight were not counted properly
* adding a multiple optimizers case
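The equivalence check described above amounts to training the same model under DDP and under OSS + ShardedDDP and asserting the losses match step for step. A hedged sketch of that comparison on a single rank; the module paths and the ShardedDDP constructor shown (module plus sharded optimizer) are assumptions based on the fairscale layout of this period.

```python
# Sketch: step-for-step loss parity between torch DDP and OSS + ShardedDDP.
# Single-rank gloo group so the snippet is self-contained.
import copy
import tempfile

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

from fairscale.nn.data_parallel import ShardedDataParallel as ShardedDDP
from fairscale.optim.oss import OSS

dist.init_process_group(
    backend="gloo", init_method=f"file://{tempfile.mkstemp()[1]}", rank=0, world_size=1
)

torch.manual_seed(0)
reference = torch.nn.Linear(8, 1)
candidate = copy.deepcopy(reference)

ddp_model = DDP(reference)
ddp_optim = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

oss_optim = OSS(params=candidate.parameters(), optim=torch.optim.SGD, lr=0.1)
sharded_model = ShardedDDP(candidate, oss_optim)

for _ in range(3):
    batch, target = torch.randn(4, 8), torch.randn(4, 1)

    ddp_optim.zero_grad()
    ddp_loss = F.mse_loss(ddp_model(batch), target)
    ddp_loss.backward()
    ddp_optim.step()

    oss_optim.zero_grad()
    sharded_loss = F.mse_loss(sharded_model(batch), target)
    sharded_loss.backward()
    oss_optim.step()

    # the two data-parallel wrappers should produce the same loss at every step
    assert torch.allclose(ddp_loss, sharded_loss)

dist.destroy_process_group()
```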
-
- 09 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 07 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* removing strict typing requirement, broken by ClassyVision
-
- 06 Dec, 2020 1 commit
-
-
Min Xu authored
-
- 05 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
Thanks Jessica for the heads up!
-