- 19 Jan, 2021 1 commit
-
-
anj-s authored
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* fix lint errors
* refactor common utilities
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments.
* addressed PR comments
* added another comment to clarify.
* fixing lint errors
* addressed PR comments
* addressed PR comments
* fixed typos
* initialize var
* rename seq_pred to lm
* fix lint errors
* move datasets and models into separate folders
* add the folders created
* fix lint errors
* create golden config to stats mapping
* add common batching for both synthetic and real data
* fixed lint errors
* enable real pipe benchmarks with new golden data
* reduce seq len to avoid OOM
* updated golden data
* add logging
* add golden data
* add golden data
* fix lint errors
* add doc string
* remove commented out line
* address comments
* rename imports
* refactor common logic in dataloaders
* add golden configs
* lint changes
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 16 Jan, 2021 1 commit
-
-
msbaines authored
-
- 15 Jan, 2021 2 commits
-
-
msbaines authored
-
Benjamin Lefaudeux authored
* minor, but ease of life, one less papercut
-
- 12 Jan, 2021 1 commit
-
-
Min Xu authored
- clarify that per-GPU batch size is not increased with AdaScale.
-
- 11 Jan, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing
* fixing oss backcompat, trying to fix rpc in old pytorch also
* fixing the file based init in torch 1.5
-
Benjamin Lefaudeux authored
* min bucket size with model size
* resize the bucket after all the params have been squeezed in, save a tiny bit of memory
* minor, ensure that the cache is freed and improve the comments
-
- 08 Jan, 2021 5 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* minor, not life changing but removing a dependency on runtime optim
-
Benjamin Lefaudeux authored
* adding a parity unit test * code review, better testing, use torch defaults and check for the loss, log world size
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
* add additional unit test * support model parallelism in oss
-
- 07 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* trying to fix the missing files in the pip package (not in this diff) * adding a long description, more pypi friendly
-
- 05 Jan, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
-
Benjamin Lefaudeux authored
release pip package to follow suit
-
- 04 Jan, 2021 3 commits
-
-
anj-s authored
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* fix lint errors
* refactor common utilities
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments.
* addressed PR comments
* added another comment to clarify.
* fixing lint errors
* addressed PR comments
* addressed PR comments
* fixed typos
* initialize var
* rename seq_pred to lm
* fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [feat] sync adascale from internal repo - tbd testing: tbd
* Update argument document of __init__
* update documentation around set_num_gradients_to_accumulate
* added checking code for proper API calling places
* rename internal APIs to make them internal
* updated changelog
* added support for add_param_group and its unit test
* added unit test for set_num_gradients_to_accumulate
* added debias_ewma unit test
* fixed test_set_num_gradients_to_accumulate (need zero_grad() call)
* added missing zero_grad() to test_lr_scheduler
* fixed test_add_param_group with respect to optim.zero_grad()
* added test_gradient_value
* added test_scale_not_equal_default for scale != world_size * grad_accum
* added test_unhook()
* removed print statements
* fixed a typo
* addressed Ben's comment
-
- 02 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix typo, backend for CPU test
-
- 30 Dec, 2020 5 commits
-
-
Sean Naren authored
* Add function to add handle for sync BN * Add test to ensure batch norm handles have been added
-
Benjamin Lefaudeux authored
- tighter regression detection, based on the best case vs. worst case - still run all configurations, useful for comparisons but not a target
-
anj-s authored
[refactor] Remove unused variables, add configuration objects and basic cleanup for pipe benchmarks. (#252)
* [refactor] Remove unused variables and refactor common configurations
* move helper function to call site
* fixed lint errors
* fix lint errors
* fix lint errors
* fix lint errors
* fix import order
* format files
* remove unused imports
* fix lint errors
* address PR comments
* sorted imports
* add space
* modify comment
* added doc strings and addressed PR comments.
* addressed PR comments
* added another comment to clarify.
* fixing lint errors
* rename variable
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Benjamin Lefaudeux authored
* timeout on the process join, expose a hanging process * make sure that teardown is always called
-
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup
* unrelated, but filling in the changelog
* another nit
-
- 29 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* properly catching a test failure when there are not enough GPUs
-
Joshua Meier authored
author: Joshua Meier
-
- 28 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* file based dist init * nicer handling of broken world sizes vs. number of available GPUs, do not break but warn out
-
Benjamin Lefaudeux authored
-
- 24 Dec, 2020 1 commit
-
-
Min Xu authored
* Update changelog: missed this item from previous AdaScale commit.
* More change log
* Addressed review comments
-
- 22 Dec, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* keep two torch 1.7 profiles to save cuda 10.1 testing
-
Benjamin Lefaudeux authored
* fix, one liner
* adjust so that frozen trunks get spread still, even if this should have little consequences
* removing dead code, hopeful unit test fix
* now with some linting..
* adding a proper unit test case
-
- 19 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
[OSS] Getting rid of the "should bucket" hash table, just use a list + non trainable params fix (#259)
* Getting rid of the "should bucket" hash table, just use a list. Properly handle all params, with or without requires_grad
* make sure that this case is unit tested
-
- 17 Dec, 2020 3 commits
-
-
Benjamin Lefaudeux authored
-
Joshua Meier authored
-
Benjamin Lefaudeux authored
* typo, sorry about that * small perf fix
-
- 16 Dec, 2020 5 commits
-
-
Benjamin Lefaudeux authored
* Better handling of the callback queue, try to consume it as we go. * dumping buckets for the reduce part, always the same unused params issue
-
Benjamin Lefaudeux authored
* lint fixes
* come on black
* Update tutorial_pipe_multiprocess.py: make RANK global like the other tutorials
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
VitaliyLi authored
* Update README.md
* Update README.md: update capitalization
Co-authored-by: Vittorio Caggiano <caggiano@gmail.com>
-
jessijzhao authored
* [feat] add CPU support to tutorials in examples
* now works on a machine without cuda
* fixes some minor typos
* [cleanup] factorize tutorials in examples
* collects duplicate code across tutorials in helpers.py
* [fix] getData in tutorials now returns iterable
-
Stas Bekman authored
-