- 14 Jan, 2022 2 commits
-
-
tmarkstrum authored
* release 0.4.5 * added some content for the release * fixed a format issue.
-
Anupam Bhatnagar authored
-
- 13 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] first commit * [skip ci] gradient scaler example * [skip ci] adding feed forward toy example * [skip ci] adding types * [skip ci] adding backward hook * [skip ci] update * [skip ci] working feed forward example * [skip ci] working feed forward example * [skip ci] use named_modules instead of named_children * [skip ci] adding new file * [skip ci] clean up * [skip ci] implement unscale function * [skip ci] implement unscale function * [skip ci] removing old file * [skip ci] removing some more old files * [skip ci] making unscale function generic * [skip ci] adding test for vision model * [skip ci] adding identity layer * [skip ci] cleanup files * [skip ci] refactoring * [skip ci] more refactoring * [skip ci] added functionality to update scale * [skip ci] data loader clean up * [skip ci] implemented inf checks and update scale functions * [skip ci] code clean up. added...
-
tmarkstrum authored
* fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * added changelog * fixed some commit. * added unit test to ensure the reduce_scatter process group size is correct in default cases. And fall back to default process group when the reduce_scatter process group has the wrong size. * throw an error instead of rolling back to use default process group for reduce_scatter_process_group * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group" This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9. * added check for None to avoid unit test failure * fixed an error to avoid unit test failures Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 12 Jan, 2022 1 commit
-
-
tmarkstrum authored
[chore] Update the CHANGELOG to add details about the new feature that enables reduce_scatter overlap in backward propagation (#906) * updated the change log * improve the change log
-
- 07 Jan, 2022 1 commit
-
-
tmarkstrum authored
* enable reduce scatter overlap with other operations * fixed unit tests and added docstrings for the new parameters for fsdp * fixed more unit tests * fixed unit tests * avoided the pickle error on process_group_reduce_scatter * removed an unnecessary parameter in unit tests * remove unnecessary prints * fixed the docstring * skipped the test_offload unit test because this unit test failed in the main branch * removed the enable_reduce_scatter_overlap API parameter * added docstring for the default value of process_group_reduce_scatter parameter * fixed a syntax bug * fixed a bug which caused unit test failure * removed the all_gather in the ProcessGroupName enum * added more comments * changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
-
- 06 Jan, 2022 2 commits
-
-
tmarkstrum authored
-
four4fish authored
* FullyShardedDataParallel: only return full state dict on rank 0 * Add flag and make rank 0 only optional * Add tests * Add docs * address comments * update comments * update torch nightly version * update torchvision number for torch nightly dependence * add changelog * Update CHANGELOG.md * Update CHANGELOG.md
-
- 05 Jan, 2022 1 commit
-
-
Paul Johnson authored
* Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py. * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper * Modifications from PR #887 review comments. * Update Changelog
-
- 24 Dec, 2021 1 commit
-
-
Anupam Bhatnagar authored
* [skip ci] update release.md * [skip ci] minor edit
-
- 21 Dec, 2021 5 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] adding comments to changelog * adding date to changelog * [skip ci] minor edit
-
Anupam Bhatnagar authored
* Finiteness check for all tensors * [skip ci] updating changelog
-
Anupam Bhatnagar authored
* [skip ci] first commit to automate release process * empty commit * fix syntax * fix next_version value * fixing more syntax * remove uses * fix * fixed path in setup.py * trying a basic example * adding branch * change release to name * adding first step * remove push trigger * change order in ON section * modifying manual workflow * adding fairscale release workflow * removing unused workflows * replacing values with secrets * fixing __version__ in __init__.py * cleanup * restoring import statement
-
- 16 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning on trainable parameters changed (#886) * Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning on trainable parameters changed; the default is True and thus the default behavior is unchanged * Added parameter documentation
-
- 13 Dec, 2021 1 commit
-
-
Min Xu authored
- During eval, we fall back to just the output projection without fusing - added unit test to ensure the shape is correct
-
- 06 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Fix for KeyError that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881) * Fix for KeyError that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) * Styling fixes * Updated the test to be independent of the Huggingface transformers package * Added test for issue #876 * Small error message fix * Skip test when CUDA is not available * Fixed naming of model
-
- 02 Dec, 2021 5 commits
-
-
Min Xu authored
* [fix] [FSDP] Do not lose original reshard_after_forward - In a corner case we can lose this value - Saving it and using it in the reset function fixes it - A trivial case probably not worth a dedicated test for now * added changelog
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
- 29 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
-
- 24 Nov, 2021 2 commits
-
-
Ying Zhang authored
* Add MOE to lm benchmarks * linter * Fix source / target * address comments * address comments * address comments * add circleci * fix circleci * precommit
-
anj-s authored
* Update README to specify the exact PyTorch version we are testing with. * update to 1.10.0 in the README
-
- 21 Nov, 2021 1 commit
-
-
anj-s authored
-
- 19 Nov, 2021 1 commit
-
-
h-vetinari authored
* DOC: fix the rst-headers in installation instructions * DOC: add installation through conda-forge to instructions * DOC: fix rst-syntax in installation-instructions * DOC: add comment about building from source with GPU-support
-
- 18 Nov, 2021 4 commits
-
-
Anupam Bhatnagar authored
-
Min Xu authored
* [chore] 0.4.3 release * update setup.py Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix]: fix eval for shared weight FSDP * fixing optim state saving * add changelog * reformat with newer local isort * update test * avoid computing reference state unless we are testing training * added optim_state test * make mypy happy * move tests; maybe we need to put CUDA memory related tests first in the lists Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Anupam Bhatnagar authored
* adding a manual workflow * add push * fix syntax * adding a new workflow * renaming file * cleanup yaml files * [skip ci] removing pyproject edits
-
- 17 Nov, 2021 2 commits
-
-
anj-s authored
* fixed lint issues * remove unused print statements * add changelog entry * [skip ci] fix lint errors
-
Anupam Bhatnagar authored
* update changelog * [skip ci] removed requirements-test.txt * [skip ci] updating changelog * [skip ci] add PR numbers * replacing requirements-test.txt by requirements-dev.txt * [skip ci] changing requirements-test to requirements-dev in pre-commit and requirements-benchmarks * [skip ci] mark manual static analysis checks as deprecated * empty commit to trigger ci * [skip ci] updating changelog * [skip ci] addressing comments * addressing more comments
-
- 15 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* first commit * sharded scaler hitting nan assertions * adding test for sharded grad scaler without cpu offload * ddp grad scaler and fsdp sharded grad scaler test failing * removing test_output * fix no cpu offload test * changing optimizer from OSS to SGD * all tests passing, code cleanup pending * code cleanup * fix pyproject.toml * removing .isort.cfg * running isort linter * resolving isort issues * resolving black linter issue * resolving mypy issues * fix import statement * fix mypy error * modifying import statement * adding pytorch version requirement * fixing pytest skip test decorator * apply version guard for ShardedGradScaler * removing test_fsdp_grad_scaler * increasing num_epochs for ShardedGradScaler so that updates are not skipped * adding support for torch 1.8 * minor edit * [skip ci] more torch 1.8 changes * parametrizing the tests * cleanup code with linters * [skip ci] update doc string * [skip ci] addressing some more comments
-
- 12 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* adding pre-commit files * applying pre-commit to all files * adding no-strict-optional argument to mypy in circle ci config * fix typo * updating python versions * [skip ci] remove extra args * adding python 3.9 * [skip ci] set pre-commit version in requirements-dev.txt * set CACHE_VERSION * move linters from circleci to github actions * update python version * update python version in benchmarks_2 * moving to python 3.9.7
-
- 09 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* CI config changes * changing params for failing tests * [skip ci] minor edit
-
- 08 Nov, 2021 2 commits
-
-
Anupam Bhatnagar authored
* [chore] 0.4.2 release * updating torch version * [skip ci] updating readme and requirements.txt
-
anj-s authored
* update release notes * initial commit * lint cleanup etc. * helper functions; lint errors * lint errors * lint errors * add back the boolean for named_parameters * address comments and fix lint * remove unused functions and class * remove unused state
-