- 10 May, 2022 1 commit
-
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 09 May, 2022 1 commit
-
-
Min Xu authored
The pyenv version kept moving forward, so it is now pinned to v2.2.0.
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 25 Apr, 2022 1 commit
-
-
Min Xu authored
* [chore] update nightly version
* use yesterday's
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 22 Feb, 2022 1 commit
-
-
anj-s authored
* add benchmarks for fsdp
* fix lint errors
* clean up
* clean up unused flags
* add the benchmarks
* remove unused args
* fix lint errors
* fix lint errors
* update command line
* add support for multiple devices
* try full fp16 mode
* try full fp16 mode
* lint errors
* merge main
* lint errors
* lint errors
* lint error
* update intersphinx mapping for numpy
* update intersphinx mapping for numpy
* skip test
* added golden configs
* use synthetic benchmarks
* fix fn name
* fix cuda device id
* fix verify
* lint fix
-
- 14 Feb, 2022 1 commit
-
-
Min Xu authored
* update pytest versions
* [test] test related changes
  - upgrade to newer pytorch versions
  - added function to make test more deterministic on A100 and TF32
  - fixed some tests so that they are correctly skipped on a single GPU system
* more fixes
* formatting overly long lines
* format
* better test without triggering a warning
* fix an optim state bug with newer pytorch
  - adam optimizer seems to return "step" as a singleton tensor now in the nightly build
  - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer
* improve oss.py
  - using min_loss for regression checking is a bit more reliable
  - also increased the num epochs from 10 to 12
* small oss.py fix
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 06 Jan, 2022 1 commit
-
-
four4fish authored
* FullyShardedDataParallel: only return full state dict on rank 0
* Add flag and make rank 0 only optional
* Add tests
* Add docs
* address comments
* update comments
* update torch nightly version
* update torchvision number for torch nightly dependence
* add changelog
* Update CHANGELOG.md
* Update CHANGELOG.md
-
- 17 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* update changelog
* [skip ci] removed requirements-test.txt
* [skip ci] updating changelog
* [skip ci] add PR numbers
* replacing requirements-test.txt by requirements-dev.txt
* [skip ci] changing requirements-test to requirements-dev in pre-commit and requirements-benchmarks
* [skip ci] mark manual static analysis checks as deprecated
* empty commit to trigger ci
* [skip ci] updating changelog
* [skip ci] addressing comments
* addressing more comments
-
- 12 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* adding pre-commit files
* applying pre-commit to all files
* adding no-strict-optional argument to mypy in circle ci config
* fix typo
* updating python versions
* [skip ci] remove extra args
* adding python 3.9
* [skip ci] set pre-commit version in requirements-dev.txt
* set CACHE_VERSION
* move linters from circleci to github actions
* update python version
* update python version in benchmarks_2
* moving to python 3.9.7
-
- 09 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* CI config changes
* changing params for failing tests
* [skip ci] minor edit
-
- 02 Nov, 2021 2 commits
-
-
anj-s authored
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 21 Oct, 2021 2 commits
-
-
anj-s authored
* update pytorch version for benchmarks
* reduce golden data precision check
-
anj-s authored
* update python version for CPU tests
* run CPU tests with updated PyTorch version
* update nightly and test PyTorch versions
* skip failing multiprocess pipe test
* always skip test
* always skip test
* always skip test
* lint error
* skip unsupported versions
* improve skip message
* lint errors
-
- 22 Sep, 2021 1 commit
-
-
tmarkstrum authored
* update master branch to main
* added FAQ about updating the branch from master to main
* fixed some false positive corrections
* added what is new section
* fixed the quoted code area
* added release what is new section
* added a step in release.md
* fixed a word
-
- 06 Sep, 2021 1 commit
-
-
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
* changelog; mypy; oss cleanup
* more broadcast_object cleanup in FSDP
* one more mypy fix
* retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release
* update torch version for LTS
* minor fixes
* update cache key
* trying newer gpu VMs
* bump the cache
* update to gpu.medium, which should be 2 GPUs
* update nightly version
* add pre-commit instruction
* fixed CHANGELOG after merging
* updated to newer nightly
* retained the older broadcast function for older GPUs for oss.py
* fixed a bug
* added a comment
* fixing a test for pytorch 1.10
* testing a fix
* Update fairscale/optim/oss.py
* Update CONTRIBUTING.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 22 Jun, 2021 1 commit
-
-
Pavel Belevich authored
* Update torch to 1.9.0.dev20210614+cu102
* Update config.yml
* Update config.yml
* Update setup.py
* Update config.yml
* Update config.yml
* Update config.yml
* Update config.yml
-
- 01 Jun, 2021 1 commit
-
-
Min Xu authored
* [test] fixing 1.9 nightly install
* update cache version so that we don't keep reinstalling
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 15 Apr, 2021 1 commit
-
-
anj-s authored
[fix] Revert change that removed the option to run OffloadModel without activation checkpointing. (#608)
* revert change made
* add tests and revert sync shard changes
* add tests
* remove file checked in by error
* inline var
* fix lint errors
* add checkpoint activation
* fix mypy
* use a bigger model
* modify tests for now
* resolve conflicts
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 05 Apr, 2021 1 commit
-
-
anj-s authored
* add model
* add offload regression benchmarks
* add golden data
* remove mp pipe benchmark
* fix lint
* remove rank
* add check for model type
* lint errors
-
- 02 Apr, 2021 1 commit
-
-
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0).
Fixes: #548
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 1 commit
-
-
msbaines authored
-
- 29 Mar, 2021 2 commits
-
-
anj-s authored
* codecov testing
* codecov testing
* more changes for uploading cov
* fix invalid config
* fix invalid config
* modify name
* fix config
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
msbaines authored
-
- 12 Mar, 2021 1 commit
-
-
msbaines authored
-
- 05 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [hotfix]: fix a bug in CI command
* debug
* debug
* bump cache ver
* fix
* eq
* check
* bump
* addressed comment
-
- 04 Mar, 2021 2 commits
-
-
Min Xu authored
* [chore] move a test script
* add a shortcut for installing
* more skipping
* keep apt-get part
-
Benjamin Lefaudeux authored
-
- 01 Mar, 2021 1 commit
-
-
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest"
  This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all test should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging
* better capture the errors
* debugging
* fix pyenv command
* add universe repo
* update to cuda 11 for 171
* add a test file, improved the checking script
-
- 26 Feb, 2021 1 commit
-
-
Min Xu authored
-
- 04 Feb, 2021 1 commit
-
-
msbaines authored
-
- 03 Feb, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims
* fix the regression test on circleci
* removing unused flags
-
anj-s authored
* mp cleanup
* round of multiprocess refactoring
* test golden run
* print cuda stats
* fix lint errors
* enable multiprocess pipe benchmarks
* set world size to be available gpus
* more changes
* use synthetic loaders for intermediate pipeline stages
* merged master
* fix for the devices property
* dataloader fix
* modify rank check
* print wps stats
* enable verification
* fix logging
* fix flag name
* fix flag name
* check for rank
* fix indent
* pass args
* pass args
* modify golden data
* remove unused print message
* fix lint errors
* add comments
* fix benchmarks
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 29 Jan, 2021 1 commit
-
-
Min Xu authored
* [test]: test with py39 + torch 1.8 nightly
* version fix
* more fix
* fix version function for nightly version
* fix torch_pg build
* invalidate cache
* separate benchmark requirements
* comment
* fixed mypy
* fixed a test
-
- 27 Jan, 2021 1 commit
-
-
msbaines authored
Also, we can save time by only running unittests once instead of twice (with and without coverage).
-
- 25 Jan, 2021 1 commit
-
-
Min Xu authored
* [test] cover python 3.7 to 3.9 on CPU
  - covering common python versions on CPU tests
  - added doc build test
* add doc build test
* skipping failing tests on py39
* catching doc build warnings
* add doc build to py38 and py39
* minor fix
* fix doc build for adascale
* removed dead code
* fix the skipping
* skip unit test for py39
* add failing example
* no more py39 skipping the tests
-
- 16 Jan, 2021 1 commit
-
-
msbaines authored
-
- 15 Jan, 2021 1 commit
-
-
msbaines authored
-
- 11 Jan, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of circleci jobs, now pipe tests are the last ones standing
* fixing oss backcompat, trying to fix rpc in old pytorch also
* fixing the file based init in torch 1.5
-