- 21 Oct, 2021 2 commits
-
-
anj-s authored
* update pytorch version for benchmarks * reduce golden data precision check
-
anj-s authored
* update python version for cpu tests * run CPU tests with updated PyTorch version * update nightly and test PyTorch versions * skip failing multiprocess pipe test * always skip test * always skip test * always skip test * lint error * skip unsupported versions * improve skip message * lint errors
-
- 22 Sep, 2021 1 commit
-
-
tmarkstrum authored
* update master branch to main * added FAQ about updating the branch from master to main * fixed some false positive correction * added what is new section * fixed the quoted code area * added release what is new section * added a step in release.md * fixed a word
-
- 06 Sep, 2021 1 commit
-
-
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744) * changelog; mypy; oss cleanup * more broadcast_object cleanup in FSDP * one more mypy fix * retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release * update torch version for LTS * minor fixes * update cache key * trying newer gpu VMs * bump the cache * update to gpu.medium, which should be 2 GPUs * update nightly version * add pre-commit instruction * fixed CHANGELOG after merging * updated to newer nightly * retained the older broadcast function for older GPUs for oss.py * fixed a bug * added a comment * fixing a test for pytorch 1.10 * testing a fix * Update fairscale/optim/oss.py * Update CONTRIBUTING.md Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 22 Jun, 2021 1 commit
-
-
Pavel Belevich authored
* Update torch to 1.9.0.dev20210614+cu102 * Update config.yml * Update config.yml * Update setup.py * Update config.yml * Update config.yml * Update config.yml * Update config.yml
-
- 01 Jun, 2021 1 commit
-
-
Min Xu authored
* [test] fixing 1.9 nightly install * update cache version so that we don't keep reinstalling Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 15 Apr, 2021 1 commit
-
-
anj-s authored
[fix] Revert change that removed the option to run OffloadModel without activation checkpointing. (#608) * revert change made * add tests and revert sync shard changes * add tests * remove file checked in by error * inline var * fix lint errors * add checkpoint activation * fix mypy * use a bigger model * modify tests for now * resolve conflicts Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 05 Apr, 2021 1 commit
-
-
anj-s authored
* add model * add offload regression benchmarks * add golden data * remove mp pipe benchmark * fix lint * remove rank * add check for model type * lint errors
-
- 02 Apr, 2021 1 commit
-
-
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0) Fixes: #548
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 1 commit
-
-
msbaines authored
-
- 29 Mar, 2021 2 commits
-
-
anj-s authored
* codecov testing * codecov testing * more changes for uploading cov * fix invalid config * fix invalid config * modify name * fix config Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
msbaines authored
-
- 12 Mar, 2021 1 commit
-
-
msbaines authored
-
- 05 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [hotfix]: fix a bug in CI command * debug * debug * bump cache ver * fix * eq * check * bump * addressed comment
-
- 04 Mar, 2021 2 commits
-
-
Min Xu authored
* [chore] move a test script * add a shortcut for installing * more skipping * keep apt-get part
-
Benjamin Lefaudeux authored
-
- 01 Mar, 2021 1 commit
-
-
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency * add test list files * fix * add test list files * split benchmark run into 2 runs * fix 1.8 version and balance benchmarks * fix * fix * fix * fix * recording tests * py39 install fix * test again * move tests * reorg tests * skip tests for torch 1.8 due to an upstream bug * removed __init__.py from tests since it confuses pytest * Revert "removed __init__.py from tests since it confuses pytest" This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0. * don't include __init__ in file list * notes on __init__.py and added missing ones * fixed mypy in a test file * balance test runtime * better pip install * balance more * pip fix * balance * balance more, all tests should finish within 20m now * minor license update * trying cu102 * more doc and addressed Ben's comments * debugging * debugging...
-
- 26 Feb, 2021 1 commit
-
-
Min Xu authored
-
- 04 Feb, 2021 1 commit
-
-
msbaines authored
-
- 03 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims * fix the regression test on circleci * removing unused flags
-
anj-s authored
* mp cleanup * round of multiprocess refactoring * test golden run * print cuda stats * fix lint errors * enable multiprocess pipe benchmarks * set world size to be available gpus * more changes * use synthetic loaders for intermediate pipeline stages * merged master * fix for the devices property * dataloader fix * modify rank check * print wps stats * enable verification * fix logging * fix flag name * fix flag name * check for rank * fix indent * pass args * pass args * modify golden data * remove unused print message * fix lint errors * add comments * fix benchmarks Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 29 Jan, 2021 1 commit
-
-
Min Xu authored
* [test]: test with py39 + torch 1.8 nightly * version fix * more fix * fix version function for nightly version * fix torch_pg build * invalidate cache * separate benchmark requirements * comment * fixed mypy * fixed a test
-
- 27 Jan, 2021 1 commit
-
-
msbaines authored
Also, we can save time by only running unittests once instead of twice (with and without coverage).
-
- 25 Jan, 2021 1 commit
-
-
Min Xu authored
* [test] cover python 3.7 to 3.9 on CPU - covering common python versions on CPU tests - added doc build test * add doc build test * skipping failing tests on py39 * catching doc build warnings * add doc build to py38 and py39 * minor fix * fix doc build for adascale * removed dead code * fix the skipping * skip unit test for py39 * add failing example * no more py39 skipping the tests
-
- 16 Jan, 2021 1 commit
-
-
msbaines authored
-
- 15 Jan, 2021 1 commit
-
-
msbaines authored
-
- 11 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing * fixing oss backcompat, trying to fix rpc in old pytorch also * fixing the file based init in torch 1.5
-
- 05 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests * removing redundant code, slightly more reasonable timeout, works on single cuda * finding the root bug for some of the cpu hangs, rpc init * propagating all the rpc init test changes to the pipe and model parallel tests
-
- 30 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
- tighter regression detection, based on the best case vs. worst case - still run all configurations, useful for comparisons but not a target
-
- 22 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* keep two torch 1.7 profiles to save cuda 10.1 testing
-
- 30 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 22 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* testing median and MAD * synchronize on kernels to make sure that we're measuring the actual completion time * adjusting the circleci threshold, not that the speed has regressed but because we measure proper cuda execution time
-
- 21 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* rewrite using autograd and Variable execution queue to make the reduce automatic * share buckets with OSS to remove duplication * some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up
-
- 20 Nov, 2020 1 commit
-
-
msbaines authored
-
- 19 Nov, 2020 1 commit
-
-
msbaines authored
-
- 06 Nov, 2020 1 commit
-
-
Benjamin Lefaudeux authored
* oss benchmark: add an --amp option * add a circleCI test
-
- 30 Oct, 2020 1 commit
-
-
msbaines authored
-
- 29 Oct, 2020 1 commit
-
-
msbaines authored
-