- 24 Dec, 2021 1 commit
-
-
Anupam Bhatnagar authored
* [skip ci] update release.md * [skip ci] minor edit
-
- 21 Dec, 2021 5 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] adding comments to changelog * adding date to changelog * [skip ci] minor edit
-
Anupam Bhatnagar authored
* Finiteness check for all tensors * [skip ci] updating changelog
-
Anupam Bhatnagar authored
* [skip ci] first commit to automate release process * empty commit * fix syntax * fix next_version value * fixing more syntax * remove uses * fix * fixed path in setup.py * trying a basic example * adding branch * change release to name * adding first step * remove push trigger * change order in ON section * modifying manual workflow * adding fairscale release workflow * removing unused workflows * replacing values with secrets * fixing __version__ in __init__.py * cleanup * restoring import statement
-
- 16 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Added warn_on_trainable_params_changed constructor parameter so the user can suppress the warning raised when trainable parameters change (#886) * The default is True, so the default behavior is unchanged * Added parameter documentation
-
- 13 Dec, 2021 1 commit
-
-
Min Xu authored
- During eval, fall back to just the output projection without fusing - added a unit test to ensure the shape is correct
-
- 06 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Fix for KeyError that can happen in certain FSDP wrapping scenarios of Huggingface model sub-modules (issue #876) (#881) * Styling fixes * Updated the test to be independent of the Huggingface transformers package * Added test for issue #876 * Small error message fix * Skip test when CUDA is not available * Fixed naming of model
-
- 02 Dec, 2021 5 commits
-
-
Min Xu authored
* [fix] [FSDP] Do not lose original reshard_after_forward - In a corner case we can lose this value - Saving it and using it in the reset function fixes this - A trivial case, probably not worth a dedicated test for now * added changelog
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
Min Xu authored
-
- 29 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
-
- 24 Nov, 2021 2 commits
-
-
Ying Zhang authored
* Add MOE to lm benchmarks * linter * Fix source / target * address comments * address comments * address comments * add circleci * fix circleci * precommit
-
anj-s authored
* Update README to specify the exact PyTorch version we are testing with. * update to 1.10.0 in the README
-
- 21 Nov, 2021 1 commit
-
-
anj-s authored
-
- 19 Nov, 2021 1 commit
-
-
h-vetinari authored
* DOC: fix the rst-headers in installation instructions * DOC: add installation through conda-forge to instructions * DOC: fix rst-syntax in installation-instructions * DOC: add comment about building from source with GPU-support
-
- 18 Nov, 2021 4 commits
-
-
Anupam Bhatnagar authored
-
Min Xu authored
* [chore] 0.4.3 release * update setup.py Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix]: fix eval for shared weight FSDP * fixing optim state saving * add changelog * reformat with newer local isort * update test * avoid computing reference state unless we are testing training * added optim_state test * make mypy happy * move tests; maybe we need to put CUDA memory related tests first in the lists Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Anupam Bhatnagar authored
* adding a manual workflow * add push * fix syntax * adding a new workflow * renaming file * cleanup yaml files * [skip ci] removing pyproject edits
-
- 17 Nov, 2021 2 commits
-
-
anj-s authored
* fixed lint issues * remove unused print statements * add changelog entry * [skip ci] fix lint errors
-
Anupam Bhatnagar authored
* update changelog * [skip ci] removed requirements-test.txt * [skip ci] updating changelog * [skip ci] add PR numbers * replacing requirements-test.txt by requirements-dev.txt * [skip ci] changing requirements-test to requirements-dev in pre-commit and requirements-benchmarks * [skip ci] mark manual static analysis checks as deprecated * empty commit to trigger ci * [skip ci] updating changelog * [skip ci] addressing comments * addressing more comments
-
- 15 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* first commit * sharded scaler hitting nan assertions * adding test for sharded grad scaler without cpu offload * ddp grad scaler and fsdp sharded grad scaler test failing * removing test_output * fix no cpu offload test * changing optimizer from OSS to SGD * all tests passing, code cleanup pending * code cleanup * fix pyproject.toml * removing .isort.cfg * running isort linter * resolving isort issues * resolving black linter issue * resolving mypy issues * fix import statement * fix mypy error * modifying import statement * adding pytorch version requirement * fixing pytest skip test decorator * apply version guard for ShardedGradScaler * removing test_fsdp_grad_scaler * increasing num_epochs for ShardedGradScaler so that updates are not skipped * adding support for torch 1.8 * minor edit * [skip ci] more torch 1.8 changes * parametrizing the tests * cleanup code with linters * [skip ci] update doc string * [skip ci] addressing some more comments
-
- 12 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* adding pre-commit files * applying pre-commit to all files * adding no-strict-optional argument to mypy in circle ci config * fix typo * updating python versions * [skip ci] remove extra args * adding python 3.9 * [skip ci] set pre-commit version in requirements-dev.txt * set CACHE_VERSION * move linters from circleci to github actions * update python version * update python version in benchmarks_2 * moving to python 3.9.7
-
- 09 Nov, 2021 1 commit
-
-
Anupam Bhatnagar authored
* CI config changes * changing params for failing tests * [skip ci] minor edit
-
- 08 Nov, 2021 3 commits
-
-
Anupam Bhatnagar authored
* [chore] 0.4.2 release * updating torch version * [skip ci] updating readme and requirements.txt
-
anj-s authored
* update release notes * initial commit * lint cleanup etc. * helper functions; lint errors * lint errors * lint errors * add back the boolean for named_parameters * address comments and fix lint * remove unused functions and class * remove unused state
-
Benjamin Lefaudeux authored
Add SlowMo Distributed Data Parallel for clusters with slow interconnects Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
-
- 05 Nov, 2021 1 commit
-
-
Min Xu authored
* [feat] MEVO kernel - initial import from min/softmax and min/testing branches - need to rename and further cleanup * only test with newer pytorch * renamed and added comments and code cleanup * rename and reduce test memory * testing * minor fixing * fixing * more fix * changelog * more 1.7 and 1.8 paper cuts * remove dead code * addressed Benjamin's comments * addressed more comments Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 03 Nov, 2021 1 commit
-
-
Vinayak Tantia authored
-
- 02 Nov, 2021 2 commits
-
-
anj-s authored
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 01 Nov, 2021 2 commits
-
-
Min Xu authored
* added a new test, passing without shared weights * tested weight sharing * added the test to test list file * extended to world_size = 2 * fixed test * [feat]: add limited and experimental support for shared parameters * fixed tests * simplify to work with layers that have at least 1 non-shared param and add code to pick up linked_param field for sharding the shared param * fixed the case where linked param is not in separate FSDP * changelog and remove old code Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
* add doc strings * add lower level SSD APIs and tests * add the test to the list to be run * remove unused imports * more doc string changes * fix lint errors
-
- 28 Oct, 2021 1 commit
-
-
Min Xu authored
* [fix] fix test on main Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 27 Oct, 2021 2 commits
-
-
anj-s authored
* remove offload dependency on fp16 * update python version for cpu tests * run CPU tests with updated PyTorch version * split changes * revert tests config * fix lint errors * update nightly and test PyTorch versions * skip failing multiprocess pipe test * always skip test * always skip test * always skip test * lint error * skip unsupported versions * improve skip message * lint errors * modify docs * add tests * fix test failures * modify comments * fix lint errors * fix lint errors
-
Min Xu authored
* checkpoint + nonflat + mixed_precision * make tests pass with expected errors * addressed comments * add a comment Co-authored-by: Min Xu <min.xu.public@gmail.com>
-