- 08 Feb, 2022 2 commits
-
-
foreveronehundred authored
* [FSDP] Add an arg for FSDP __init__
  Add an arg, disable_reshard_on_root, for FSDP __init__ to handle the following issue: https://github.com/facebookresearch/fairscale/issues/878
  In some cases (models wrapped by autowrap), the parameters of root modules need to be sharded, and reshard_after_forward should not be set to False. "disable_reshard_on_root" lets users choose whether to force reshard_after_forward of root modules to False.
* Update fully_sharded_data_parallel.py
  Modified the description of the feature to explain it more clearly.
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
  Update the comments for disable_reshard_on_root
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
* Modified the comments
  Modified the comments of disable_reshard_on_root
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
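The choice described in this commit can be read as a small decision rule. The sketch below is only an illustration of that rule as the commit message states it; the helper name `effective_reshard_after_forward` is hypothetical and is not FairScale code.

```python
def effective_reshard_after_forward(
    is_root: bool,
    reshard_after_forward: bool,
    disable_reshard_on_root: bool,
) -> bool:
    """Return the reshard setting a module would end up with.

    Hypothetical helper: it mirrors the commit description, not the
    library's actual implementation.
    """
    # Root modules are forced to skip resharding only when the user
    # leaves disable_reshard_on_root enabled.
    if is_root and disable_reshard_on_root:
        return False
    # Otherwise the user's own reshard_after_forward value is kept.
    return reshard_after_forward
```

With `disable_reshard_on_root=False`, a root module wrapped by autowrap keeps whatever `reshard_after_forward` value the user passed.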
-
anj-s authored
* update intersphinx mapping for numpy * update intersphinx mapping for numpy * update pytorch mapping and disable test
-
- 28 Jan, 2022 1 commit
-
-
Min Xu authored
* [feat] add CosFace paper's LMCL to MEVO
  - added baseline algorithm to the reference kernel
  - added MEVO version of LMCL
  - added unit test to verify it is correct with respect to the reference as well as its memory usage
* updated changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>
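For readers unfamiliar with LMCL (the large margin cosine loss from the CosFace paper), a minimal per-sample sketch follows. It is a plain reference-style computation in standard-library Python, not the MEVO kernel from this commit; the function name and the default values of `s` and `m` are illustrative assumptions.

```python
import math
from typing import Sequence


def lmcl_loss(cosines: Sequence[float], target: int, s: float = 30.0, m: float = 0.35) -> float:
    """Large margin cosine loss for one sample.

    cosines[j] is the cosine similarity between the sample's feature and
    the weight vector of class j. The margin m is subtracted from the
    target-class cosine only, then all logits are scaled by s.
    """
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp over all classes.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # Cross-entropy on the margin-adjusted logits.
    return log_sum_exp - logits[target]
```

With `m=0` this reduces to ordinary softmax cross-entropy on scaled cosines; a positive margin makes the loss strictly larger for the same input.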
-
- 25 Jan, 2022 2 commits
-
-
Min Xu authored
* [minor] better assert in backward * mypy Co-authored-by:Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix] reduce unit test memory * set seed in CI * fix random seed function * giving up CI, //sigh
-
- 20 Jan, 2022 1 commit
-
-
Yanli Zhao authored
* Add FairScale FSDP adoptions logging * Add FairScale FSDP adoptions logging
-
- 18 Jan, 2022 1 commit
-
-
Sam Shleifer authored
-
- 14 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
tmarkstrum authored
* release 0.4.5 * added some content for the release * fixed a format issue.
-
Anupam Bhatnagar authored
-
- 13 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] first commit
* [skip ci] gradient scaler example
* [skip ci] adding feed forward toy example
* [skip ci] adding types
* [skip ci] adding backward hook
* [skip ci] update
* [skip ci] working feed forward example
* [skip ci] working feed forward example
* [skip ci] use named_modules instead of named_children
* [skip ci] adding new file
* [skip ci] clean up
* [skip ci] implement unscale function
* [skip ci] implement unscale function
* [skip ci] removing old file
* [skip ci] removing some more old files
* [skip ci] making unscale function generic
* [skip ci] adding test for vision model
* [skip ci] adding identity layer
* [skip ci] cleanup files
* [skip ci] refactoring
* [skip ci] more refactoring
* [skip ci] added functionality to update scale
* [skip ci] data loader clean up
* [skip ci] implemented inf checks and update scale functions
* [skip ci] code clean up. added...
-
tmarkstrum authored
* fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
* added changelog
* fixed some commit.
* added unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size.
* throw an error instead of rolling back to use default process group for reduce_scatter_process_group
* Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group"
  This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9.
* added check for None to avoid unit test failure
* fixed an error to avoid the unit tests failure
  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
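The two ideas in this commit (padding the reduce-scatter input and falling back to the default group) can be sketched in plain Python. Both helper names are hypothetical. Padding to a multiple of the world size is the usual approach but is not confirmed by the commit text, and reading "wrong size" as "differs from the default group's size" is also an assumption.

```python
from typing import Optional


def padded_numel(numel: int, world_size: int) -> int:
    """Smallest length >= numel that splits evenly across world_size ranks."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    # Ceiling division, then scale back up to a multiple of world_size.
    return -(-numel // world_size) * world_size


def choose_reduce_scatter_group(default_group_size: int, rs_group_size: Optional[int]) -> str:
    """Pick which group to use, falling back to the default one.

    Assumption: a missing group (None) or a group whose size differs from
    the default group's size triggers the fallback.
    """
    if rs_group_size is None or rs_group_size != default_group_size:
        return "default"
    return "reduce_scatter"
```

For example, a 10-element tensor on 4 ranks would be padded to 12 elements so that each rank receives a chunk of 3.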
-
- 12 Jan, 2022 1 commit
-
-
tmarkstrum authored
[chore] Update the CHANGELOG to add details about the new feature that enables reduce_scatter overlap in backward propagation (#906) * updated the change log * improve the change log
-
- 07 Jan, 2022 1 commit
-
-
tmarkstrum authored
* enable reduce scatter overlap with other operations
* fixed unit tests and added docstrings for the new parameters for fsdp
* fixed more unit tests
* fixed unit tests
* avoided the pickle error on process_group_reduce_scatter
* removed an unnecessary parameter in unit tests
* remove unnecessary prints
* fixed the docstring
* skipped the test_offload unit test because this unit test failed in the main branch
* removed the enable_reduce_scatter_overlap API parameter
* added doc string for the default value of process_group_reduce_scatter parameter
* fixed a syntax bug
* fixed a bug which caused unit test failure
* removed the all_gather in the ProcessGroupName enum
* added more comments
* changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
-
- 06 Jan, 2022 2 commits
-
-
tmarkstrum authored
-
four4fish authored
* FullyShardedDataParallel: only return full state dict on rank 0 * Add flag and make rank 0 only optional * Add tests * Add docs * address comments * update comments * update torch nightly version * update torchvision number for torch nightly dependence * add changelog * Update CHANGELOG.md * Update CHANGELOG.md
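A toy simulation of the "rank 0 only" behaviour described in this commit is sketched below. It uses plain dictionaries instead of tensors and a hypothetical function name. Returning an empty dict on non-zero ranks is an assumption; the commit text only says the full state dict is returned on rank 0 and that a flag makes this optional.

```python
from typing import Dict, List


def full_state_dict(
    shards: List[Dict[str, List[float]]],
    rank: int,
    rank_0_only: bool = True,
) -> Dict[str, List[float]]:
    """Simulate assembling a full state dict from per-rank shards."""
    # Every rank takes part in the gather (simulated here by concatenating
    # the shards in rank order), so collectives stay in sync.
    full: Dict[str, List[float]] = {}
    for shard in shards:
        for name, values in shard.items():
            full.setdefault(name, []).extend(values)
    # Assumption: ranks other than 0 drop the result to save memory.
    if rank_0_only and rank != 0:
        return {}
    return full
```

Keeping the materialized copy on one rank avoids holding a full set of unsharded weights on every rank at checkpoint time.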
-
- 05 Jan, 2022 1 commit
-
-
Paul Johnson authored
* Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py. * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper * Modifications from PR #887 review comments. * Update Changelog
-
- 24 Dec, 2021 1 commit
-
-
Anupam Bhatnagar authored
* [skip ci] update release.md * [skip ci] minor edit
-
- 21 Dec, 2021 5 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] adding comments to changelog * adding date to changelog * [skip ci] minor edit
-
Anupam Bhatnagar authored
* Finiteness check for all tensors * [skip ci] updating changelog
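The finiteness check here and the "inf checks and update scale functions" mentioned in the earlier gradient scaler commits follow the general pattern of dynamic loss scaling. The sketch below shows that pattern in standard-library Python. The function names and the growth, backoff, and interval values are illustrative assumptions, not the code from these commits.

```python
import math
from typing import Iterable, Sequence, Tuple


def all_finite(tensors: Iterable[Sequence[float]]) -> bool:
    """True only if every value in every tensor is neither inf nor NaN."""
    return all(math.isfinite(v) for t in tensors for v in t)


def update_scale(
    scale: float,
    good_steps: int,
    found_inf: bool,
    growth_factor: float = 2.0,
    backoff_factor: float = 0.5,
    growth_interval: int = 2000,
) -> Tuple[float, int]:
    """Return the new (scale, good_steps) pair after one optimizer step."""
    if found_inf:
        # Overflow: shrink the scale and restart the streak of good steps.
        return scale * backoff_factor, 0
    good_steps += 1
    if good_steps >= growth_interval:
        # Long enough without overflow: try a larger scale.
        return scale * growth_factor, 0
    return scale, good_steps
```

A training loop would typically compute `found_inf = not all_finite(grads)` after unscaling and skip the optimizer step whenever it is true.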
-
Anupam Bhatnagar authored
* [skip ci] first commit to automate release process
* empty commit
* fix syntax
* fix next_version value
* fixing more syntax
* remove uses
* fix
* fixed path in setup.py
* trying a basic example
* adding branch
* change release to name
* adding first step
* remove push trigger
* change order in ON section
* modifying manual workflow
* adding fairscale release workflow
* removing unused workflows
* replacing values with secrets
* fixing __version__ in __init__.py
* cleanup
* restoring import statement
-
- 16 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning on trainable parameters changed (#886)
* Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning on trainable parameters changed; the default is True and thus the default behavior is unchanged
* Added parameter documentation
-
- 13 Dec, 2021 1 commit
-
-
Min Xu authored
- During eval, we fall back to just the output projection without fusing
- added unit test to ensure the shape is correct
-
- 06 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881) * Fix for Key Error that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) * Styling fixes * Updated the test to be independent of the Huggingface transformers package * Added test for issue #876 * Small error message fix * Skip test when CUDA is not available * Fixed naming of model
-
- 02 Dec, 2021 5 commits
-
-
Min Xu authored
* [fix] [FSDP] Do not lose original reshard_after_forward
  - In a corner case we can lose this value
  - Saving it and using it in the reset function fixes it
  - A trivial case probably not worth a dedicated test for now
* added changelog
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
- 29 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
-
- 24 Nov, 2021 2 commits
-
-
Ying Zhang authored
* Add MOE to lm benchmarks * linter * Fix source / target * address comments * address comments * address comments * add circleci * fix circleci * precommit
-
anj-s authored
* Update README to specify the exact PyTorch version we are testing with. * update to 1.10.0 in the README
-
- 21 Nov, 2021 1 commit
-
-
anj-s authored
-
- 19 Nov, 2021 1 commit
-
-
h-vetinari authored
* DOC: fix the rst-headers in installation instructions * DOC: add installation through conda-forge to instructions * DOC: fix rst-syntax in installation-instructions * DOC: add comment about building from source with GPU-support
-
- 18 Nov, 2021 3 commits
-
-
Anupam Bhatnagar authored
-
Min Xu authored
* [chore] 0.4.3 release * update setup.py Co-authored-by:Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix]: fix eval for shared weight FSDP
* fixing optim state saving
* add changelog
* reformat with newer local isort
* update test
* avoid computing reference state unless we are testing training
* added optim_state test
* make mypy happy
* move tests; maybe we need to put CUDA memory related tests first in the lists
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-