- 09 Mar, 2022 1 commit
-
-
tmarkstrum authored
* [chore] 0.4.6 release * added the third party libs removed by precommit
-
- 08 Mar, 2022 1 commit
-
-
Min Xu authored
* copyright headers * isort and pyproject.toml * precommit and requirement for isort-seed-config * mypy * dummy change * numpy version for pre-commit * fix mypy issue caused by numpy Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 05 Mar, 2022 1 commit
-
-
Dmitry Vinnik authored
* Adding ELI5 video to Fairscale * docs: add GH button in support of Ukraine ## Summary: Our mission at Meta Open Source is to empower communities through open source, and we believe that means building a welcoming and safe environment for all. As part of this work, we are adding this banner in support of Ukraine during this crisis.
-
- 04 Mar, 2022 1 commit
-
-
Vittorio Caggiano authored
-
- 03 Mar, 2022 1 commit
-
-
Min Xu authored
* add an ignore file * [fix] FSDP: handle the lazy_init better - when state_dict and load_state_dict are called, they should not change the lazy_init state. * changelog * longer timeout * Revert "longer timeout" This reverts commit 00cc145fe86210a0972a1e7ba4f37531b9e091eb. * testing * adding the failed test * fix the global to local id * formatting * more complete fix and test * minor fix for an assert * update changelog * remove an extra line * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * addressed review comments Co-authored-by: Min Xu <min.xu.public@gmail.com> Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
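A minimal sketch (not code from this PR) of the pattern the lazy_init fix targets: calling state_dict/load_state_dict on an FSDP-wrapped module before any forward pass should leave FSDP's lazy initialization untouched. It assumes torch.distributed has already been initialized.

```python
# Illustrative sketch only; assumes torch.distributed.init_process_group()
# has already been called on each rank.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(8, 8))

# With the fix above, these calls no longer change the lazy_init state,
# so the first real forward/backward pass still initializes FSDP cleanly.
checkpoint = model.state_dict()
model.load_state_dict(checkpoint)
```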
-
- 02 Mar, 2022 2 commits
-
-
Dmitry Vinnik authored
-
foreveronehundred authored
* [FSDP] Add an arg for FSDP __init__ Add an arg, disable_reshard_on_root, for FSDP __init__ to handle the following issue: https://github.com/facebookresearch/fairscale/issues/878 In some cases (models wrapped by auto_wrap), the parameters of root modules need to be sharded, and reshard_after_forward should not be set to False. "disable_reshard_on_root" lets users choose whether to force reshard_after_forward of root modules to False or not. * Update fully_sharded_data_parallel.py Modified the description of the feature to explain it more clearly. * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Update the comments for disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * Modified the comments Modified the comments of disable_reshard_on_root * Add a new argument for OSS __init__ Add a new argument for OSS __init__ to force OSS to use "_broadcast_object" when rebuilding the sharded optimizer. For more details, please see https://github.com/facebookresearch/fairscale/issues/937 * Remove redundant space Remove redundant space Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
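A minimal usage sketch for the disable_reshard_on_root argument described above (the default value and exact docstring wording are not restated here; see the FSDP documentation). The OSS argument added in the same PR is omitted because its name is not given in the message.

```python
# Sketch, not the PR's code. Assumes torch.distributed is initialized.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 8))

# disable_reshard_on_root controls whether FSDP forces the root module's
# reshard_after_forward to False (the usual speed optimization). Passing
# False lets the root module reshard after forward like inner modules,
# which is what issue #878 needed for some auto_wrap-wrapped models.
fsdp_model = FSDP(model, reshard_after_forward=True, disable_reshard_on_root=False)
```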
-
- 23 Feb, 2022 2 commits
- 22 Feb, 2022 1 commit
-
-
anj-s authored
* add benchmarks for fsdp * fix lint errors * clean up * clean up unused flags * add the benchmarks * remove unused args * fix lint errors * fix lint errors * update command line * add support for multiple devices * try full fp16 mode * try full fp16 mode * lint errors * merge main * lint errors * lint errors * lint error * update intersphinx mapping for numpy * update intersphinx mapping for numpy * skip test * added golden configs * use synthetic benchmarks * fix fn name * fix cuda device id * fix verify * lint fix
-
- 15 Feb, 2022 2 commits
-
-
ruanslv authored
* Update CHANGELOG.md Adding https://github.com/facebookresearch/fairscale/pull/930 to changelog * Update CHANGELOG.md Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
ruanslv authored
* [fix] Add option to wrap root module in auto_wrap * Fix unit-test comment * adding a few more tests to make the expected behavior clear * move changes to wrap policy as suggested * set default to false * revert pre-commit change * revert pre-commit change 2 Co-authored-by: Ruan Silva <ruanrms@fb.com>
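For context, a minimal sketch of the enable_wrap/auto_wrap pattern this change extends; the new root-wrapping option itself is not shown because its keyword does not appear in the message above. Assumes torch.distributed is initialized.

```python
# Sketch of the existing auto_wrap usage; the option added here (also
# wrapping the root module, default false) is omitted since its exact
# parameter name is not quoted in the commit message.
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP
from fairscale.nn import auto_wrap, enable_wrap

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

with enable_wrap(wrapper_cls=FSDP):
    # Inner modules matching the policy get wrapped in FSDP; previously the
    # root module itself was always left unwrapped.
    wrapped = auto_wrap(model)
```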
-
- 14 Feb, 2022 1 commit
-
-
Min Xu authored
* update pytest versions * [test] test related changes - upgrade to newer pytorch versions - added a function to make tests more deterministic on A100 and TF32 - fixed some tests so that they are correctly skipped on a single GPU system * more fixes * formatting overly long lines * format * better test without triggering a warning * fix an optim state bug with newer pytorch - adam optimizer seems to return "step" as a singleton tensor now in the nightly build - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer * improve oss.py - using min_loss for regression checking is a bit more reliable - also increased the num epochs from 10 to 12 * small oss.py fix * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Min Xu <min.xu.public@gmail.com>
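The optim-state fix above concerns newer PyTorch storing Adam's "step" as a singleton tensor; a small generic sketch (my own, not the PR's code) of normalizing such entries when handling an optimizer state dict:

```python
# Generic illustration. Assumption: converting "step" back to a plain number
# is acceptable for the optimizer loading the state, as the message suggests.
import torch

def normalize_adam_step(optim_state_dict: dict) -> dict:
    """Convert 0-dim tensor 'step' entries to plain Python numbers in place."""
    for param_state in optim_state_dict.get("state", {}).values():
        step = param_state.get("step")
        if torch.is_tensor(step):
            param_state["step"] = step.item()
    return optim_state_dict
```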
-
- 11 Feb, 2022 1 commit
-
-
Min Xu authored
* skipping one more test * formatting * minor fix and copyright header * comment Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 08 Feb, 2022 2 commits
-
-
foreveronehundred authored
* [FSDP] Add an arg for FSDP __init__ Add an arg, disable_reshard_on_root, for FSDP __init__ to handle the following issue: https://github.com/facebookresearch/fairscale/issues/878 In some cases (models wrapped by auto_wrap), the parameters of root modules need to be sharded, and reshard_after_forward should not be set to False. "disable_reshard_on_root" lets users choose whether to force reshard_after_forward of root modules to False or not. * Update fully_sharded_data_parallel.py Modified the description of the feature to explain it more clearly. * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Update the comments for disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * Modified the comments Modified the comments of disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
anj-s authored
* update intersphinx mapping for numpy * update intersphinx mapping for numpy * update pytorch mapping and disable test
-
- 28 Jan, 2022 1 commit
-
-
Min Xu authored
* [feat] add CosFace paper's LMCL to MEVO - added baseline algorithm to the reference kernel - added MEVO version of LMCL - added unit test to verify it is correct with respect to the reference as well as its memory usage * updated changelog Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 25 Jan, 2022 2 commits
-
-
Min Xu authored
* [minor] better assert in backward * mypy Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix] reduce unit test memory * set seed in CI * fix random seed function * giving up CI, //sigh
-
- 20 Jan, 2022 1 commit
-
-
Yanli Zhao authored
* Add FairScale FSDP adoption logging * Add FairScale FSDP adoption logging
-
- 18 Jan, 2022 1 commit
-
-
Sam Shleifer authored
-
- 14 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
tmarkstrum authored
* release 0.4.5 * added some content for the release * fixed a format issue.
-
Anupam Bhatnagar authored
-
- 13 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] first commit * [skip ci] gradient scaler example * [skip ci] adding feed forward toy example * [skip ci] adding types * [skip ci] adding backward hook * [skip ci] update * [skip ci] working feed forward example * [skip ci] working feed forward example * [skip ci] use named_modules instead of named_children * [skip ci] adding new file * [skip ci] clean up * [skip ci] implement unscale function * [skip ci] implement unscale function * [skip ci] removing old file * [skip ci] removing some more old files * [skip ci] making unscale function generic * [skip ci] adding test for vision model * [skip ci] adding identity layer * [skip ci] cleanup files * [skip ci] refactoring * [skip ci] more refactoring * [skip ci] added functionality to update scale * [skip ci] data loader clean up * [skip ci] implemented inf checks and update scale functions * [skip ci] code clean up. added...
-
tmarkstrum authored
* fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * added changelog * fixed some commits * added unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size * throw an error instead of rolling back to use default process group for reduce_scatter_process_group * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group" This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9. * added check for None to avoid unit test failure * fixed an error to avoid unit test failures Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 12 Jan, 2022 1 commit
-
-
tmarkstrum authored
[chore] Update the CHANGELOG to add details about the new feature that enables reduce_scatter overlap in backward propagation (#906) * updated the change log * improved the change log
-
- 07 Jan, 2022 1 commit
-
-
tmarkstrum authored
* enable reduce scatter overlap with other operations * fixed unit tests and added docstrings for the new parameters for fsdp * fixed more unit tests * fixed unit tests * avoided the pickle error on process_group_reduce_scatter * removed an unnecessary parameter in unit tests * removed unnecessary prints * fixed the docstring * skipped the test_offload unit test because this unit test failed in the main branch * removed the enable_reduce_scatter_overlap API parameter * added docstring for the default value of the process_group_reduce_scatter parameter * fixed a syntax bug * fixed a bug which caused a unit test failure * removed the all_gather in the ProcessGroupName enum * added more comments * changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
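A hedged sketch of the parameter this commit settles on: process_group_reduce_scatter defaults to ProcessGroupName.reduce_scatter, so FSDP uses a separate group and the backward reduce-scatter can overlap with other work; an explicit group can also be passed. The import path of ProcessGroupName is not shown in the message, so only the explicit-group form appears below (assumes torch.distributed is initialized).

```python
# Sketch only, not the PR's tests. Assumes torch.distributed is initialized.
import torch.distributed as dist
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = nn.Linear(128, 128)

# Per the commit above, the group should match the default group's size;
# otherwise FSDP falls back to the default process group.
rs_group = dist.new_group(ranks=list(range(dist.get_world_size())))
fsdp_model = FSDP(model, process_group_reduce_scatter=rs_group)
```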
-
- 06 Jan, 2022 2 commits
-
-
tmarkstrum authored
-
four4fish authored
* FullyShardedDataParallel: only return full state dict on rank 0 * Add flag and make rank 0 only optional * Add tests * Add docs * address comments * update comments * update torch nightly version * update torchvision number for torch nightly dependency * add changelog * Update CHANGELOG.md * Update CHANGELOG.md
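A minimal sketch of the rank-0-only behavior described above. The flag name is not spelled out in the commit message, so state_dict_on_rank_0_only below is an assumption; check the FSDP docstring. Assumes torch.distributed is initialized.

```python
# Sketch; the constructor flag name here is assumed, not quoted from the commit.
import torch.distributed as dist
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(16, 16), state_dict_on_rank_0_only=True)

# With the flag enabled, only rank 0 gets the full (unsharded) state dict,
# which saves memory on the other ranks for large models.
sd = model.state_dict()
if dist.get_rank() == 0:
    print(f"full state dict has {len(sd)} entries")
```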
-
- 05 Jan, 2022 1 commit
-
-
Paul Johnson authored
* Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py. * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper * Modifications from PR #887 review comments. * Update Changelog
-
- 24 Dec, 2021 1 commit
-
-
Anupam Bhatnagar authored
* [skip ci] update release.md * [skip ci] minor edit
-
- 21 Dec, 2021 5 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] adding comments to changelog * adding date to changelog * [skip ci] minor edit
-
Anupam Bhatnagar authored
* Finiteness check for all tensors * [skip ci] updating changelog
-
Anupam Bhatnagar authored
* [skip ci] first commit to automate release process * empty commit * fix syntax * fix next_version value * fixing more syntax * remove uses * fix * fixed path in setup.py * trying a basic example * adding branch * change release to name * adding first step * remove push trigger * change order in ON section * modifying manual workflow * adding fairscale release workflow * removing unused workflows * replacing values with secrets * fixing __version__ in __init__.py * cleanup * restoring import statement
-
- 16 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning when trainable parameters change (#886) * Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning when trainable parameters change; the default is True and thus the default behavior is unchanged * Added parameter documentation
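A hedged sketch of the new parameter. The commit does not name the wrapper class it was added to, so attaching it to the FSDP constructor here is an assumption.

```python
# Assumption: the parameter lives on FSDP's constructor; the default True
# keeps the existing warning when trainable parameters change.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(8, 8), warn_on_trainable_params_changed=False)
# With False, toggling requires_grad on wrapped parameters between passes
# no longer emits the trainable-parameters-changed warning.
```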
-
- 13 Dec, 2021 1 commit
-
-
Min Xu authored
- During eval, we fall back to just the output projection without fusing - added a unit test to ensure the shape is correct
-