- 23 Feb, 2022 1 commit
anj-s authored
* checkpoint tests
* checkpoint tests
* fix tests
* lint fixes
* remove prints
* lint fixes
* add comments
* add changelog
* more cleanup
* lint fix

- 14 Feb, 2022 1 commit
Min Xu authored
* update pytest versions
* [test] test related changes
  - upgrade to newer pytorch versions
  - added a function to make tests more deterministic on A100 and TF32
  - fixed some tests so that they are correctly skipped on a single-GPU system
* more fixes
* formatting overly long lines
* format
* better test without triggering a warning
* fix an optim state bug with newer pytorch
  - the adam optimizer seems to return "step" as a singleton tensor in the nightly build
  - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer (sketch below)
* improve oss.py
  - using min_loss for regression checking is a bit more reliable
  - also increased the number of epochs from 10 to 12
* small oss.py fix
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Min Xu <min.xu.public@gmail.com>

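A minimal sketch of the "step" normalization described above; the helper name and dict layout are illustrative, not the fairscale code:

```python
import torch

def normalize_adam_step(optim_state: dict) -> dict:
    """Convert a tensor-valued "step" back to a plain number so the saved
    state can still be loaded by optimizers that expect a non-tensor value."""
    for param_state in optim_state.get("state", {}).values():
        step = param_state.get("step")
        if torch.is_tensor(step):
            param_state["step"] = step.item()  # singleton tensor -> Python number
    return optim_state
```
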
- 08 Feb, 2022 1 commit
foreveronehundred authored
* [FSDP] Add an arg, disable_reshard_on_root, to FSDP __init__ to handle
  https://github.com/facebookresearch/fairscale/issues/878. In some cases
  (models wrapped by auto_wrap), the parameters of root modules need to be
  sharded, so reshard_after_forward should not be forced to False.
  disable_reshard_on_root lets users choose whether reshard_after_forward of
  root modules is forced to False or not (sketch below).
* Update fully_sharded_data_parallel.py: clarified the description of the feature
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py: updated the comments for disable_reshard_on_root
* Modified the comments of disable_reshard_on_root
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

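A minimal usage sketch, assuming an initialized process group; the module is a stand-in and the argument values are illustrative:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Stand-in for a model whose submodules were already wrapped by auto_wrap.
module = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))

# disable_reshard_on_root=False keeps reshard_after_forward active on root
# modules instead of forcing it to False (per the issue above).
model = FSDP(module, reshard_after_forward=True, disable_reshard_on_root=False)
```
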
- 25 Jan, 2022 1 commit
Min Xu authored
* [minor] better assert in backward
* mypy
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 20 Jan, 2022 1 commit
Yanli Zhao authored
* Add FairScale FSDP adoption logging
* Add FairScale FSDP adoption logging

- 13 Jan, 2022 1 commit
tmarkstrum authored
* fixed the padding size of the input tensor for reduce scatter, and fixed an error that assigned the wrong group
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* added changelog
* fixed some commits
* added a unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size (sketch below)
* throw an error instead of rolling back to use default process group for reduce_scatter_process_group
* Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group"
  This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9.
* added a check for None to avoid a unit test failure
* fixed an error to avoid unit test failures
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

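A minimal sketch of the fallback check described above; the function name is illustrative:

```python
import torch.distributed as dist

def reduce_scatter_group_or_default(pg):
    """Fall back to the default group when the supplied reduce_scatter
    process group is missing or does not match the world size."""
    if pg is None or pg.size() != dist.get_world_size():
        return dist.group.WORLD  # default process group
    return pg
```
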
- 07 Jan, 2022 1 commit
tmarkstrum authored
* enable reduce scatter to overlap with other operations
* fixed unit tests and added docstrings for the new fsdp parameters
* fixed more unit tests
* fixed unit tests
* avoided the pickle error on process_group_reduce_scatter
* removed an unnecessary parameter in unit tests
* removed unnecessary prints
* fixed the docstring
* skipped the test_offload unit test because it also fails on the main branch
* removed the enable_reduce_scatter_overlap API parameter
* added a docstring for the default value of the process_group_reduce_scatter parameter
* fixed a syntax bug
* fixed a bug which caused a unit test failure
* removed the all_gather value from the ProcessGroupName enum
* added more comments
* changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter (sketch below)

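A usage sketch based on the bullets above; the enum's import path is an assumption:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.utils.parallel import ProcessGroupName  # assumed location of the enum

# A dedicated internal process group lets the reduce-scatter of gradients
# overlap with other collectives; this mirrors the new default value.
module = nn.Linear(8, 8)  # stand-in model; requires an initialized process group
model = FSDP(module, process_group_reduce_scatter=ProcessGroupName.reduce_scatter)
```
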
- 06 Jan, 2022 2 commits
tmarkstrum authored

four4fish authored
* FullyShardedDataParallel: only return the full state dict on rank 0
* Add a flag to make the rank-0-only behavior optional (sketch below)
* Add tests
* Add docs
* address comments
* update comments
* update torch nightly version
* update torchvision version for the torch nightly dependency
* add changelog
* Update CHANGELOG.md
* Update CHANGELOG.md

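A sketch of the rank-0-only behavior; the flag name state_dict_on_rank_0_only is an assumption based on the bullets above:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Flag name assumed; requires an initialized process group.
module = nn.Linear(8, 8)  # stand-in model
model = FSDP(module, state_dict_on_rank_0_only=True)
state = model.state_dict()  # the full state dict is materialized only on rank 0
```
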
- 05 Jan, 2022 1 commit
Paul Johnson authored
* Enable ssd_offload training and testing via tests/nn/data_parallel/test_fsdp_offload.py
* Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor
* Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper
* Modifications from PR #887 review comments
* Update Changelog

- 02 Dec, 2021 1 commit
Min Xu authored
* [fix] [FSDP] Do not lose the original reshard_after_forward value
  - in a corner case we could lose this value
  - saving it and using it in the reset function fixes it
  - a trivial case, probably not worth a dedicated test for now
* added changelog

- 18 Nov, 2021 1 commit
Min Xu authored
* [fix] fix eval for shared-weight FSDP
* fix optim state saving
* add changelog
* reformat with newer local isort
* update test
* avoid computing the reference state unless we are testing training
* added an optim_state test
* make mypy happy
* move tests; maybe we need to put CUDA-memory-related tests first in the list
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 17 Nov, 2021 1 commit
anj-s authored
* fixed lint issues
* remove unused print statements
* add changelog entry
* [skip ci] fix lint errors

- 12 Nov, 2021 1 commit
Anupam Bhatnagar authored
* adding pre-commit files
* applying pre-commit to all files
* adding the no-strict-optional argument to mypy in the circleci config
* fix typo
* updating python versions
* [skip ci] remove extra args
* adding python 3.9
* [skip ci] set pre-commit version in requirements-dev.txt
* set CACHE_VERSION
* move linters from circleci to github actions
* update python version
* update python version in benchmarks_2
* moving to python 3.9.7

- 08 Nov, 2021 1 commit
anj-s authored
* update release notes
* initial commit
* lint cleanup etc.
* helper functions; lint errors
* lint errors
* lint errors
* add back the boolean for named_parameters
* address comments and fix lint
* remove unused functions and class
* remove unused state

- 05 Nov, 2021 1 commit
Min Xu authored
* [feat] MEVO kernel
  - initial import from the min/softmax and min/testing branches
  - needs renaming and further cleanup
* only test with newer pytorch
* renamed, added comments, and cleaned up code
* rename and reduce test memory
* testing
* minor fixing
* fixing
* more fixes
* changelog
* more 1.7 and 1.8 paper cuts
* remove dead code
* addressed Benjamin's comments
* addressed more comments
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 01 Nov, 2021 1 commit
Min Xu authored
* added a new test, passing without shared weights
* tested weight sharing
* added the test to the test list file
* extended to world_size = 2
* fixed test
* [feat] add limited and experimental support for shared parameters
* fixed tests
* simplified to work with layers that have at least one non-shared param, and added code to pick up the linked_param field for sharding the shared param
* fixed the case where the linked param is not in a separate FSDP instance
* changelog and remove old code
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 27 Oct, 2021 4 commits
anj-s authored
* remove offload dependency on fp16
* update python version for cpu tests
* run CPU tests with updated PyTorch version
* split changes
* revert tests config
* fix lint errors
* update nightly and test PyTorch versions
* skip failing multiprocess pipe test
* always skip test
* always skip test
* always skip test
* lint error
* skip unsupported versions
* improve skip message
* lint errors
* modify docs
* add tests
* fix test failures
* modify comments
* fix lint errors
* fix lint errors

Min Xu authored
* checkpoint + nonflat + mixed_precision
* make tests pass with expected errors
* addressed comments
* add a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>

anj-s authored
* skip creating cpu grads and pinning memory
* added an additional comment
* pin docutils to fix circleCI

Min Xu authored
* added the failing test
* fixed the bug
* fine-tuned the condition
* typo
* typo
* changelog and added the test to the test files
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 19 Oct, 2021 1 commit
Rohan Varma authored
* fix
* remove dup file

- 28 Sep, 2021 1 commit
Anjali Sridhar authored

- 24 Sep, 2021 1 commit
Anjali Sridhar authored

- 17 Sep, 2021 1 commit
tmarkstrum authored
* add a toggle to disable using the nccl base collectives
* added a todo to remove the toggle when the issue is resolved

- 12 Sep, 2021 1 commit
Darryl Barnhart authored
* [fix] FSDP intra-backwards gradient accumulation. Ensure gradient reduction accumulates into the unsharded gradient tensor within a backwards pass. This matters when an FSDP module is called multiple times within a forward pass and reduction is _not_ deferred using activation checkpoint forward counters, bucketing, or some other mechanism (sketch below). Closes #780
* [refactor] Remove forward counters. Removed forward counters from the activation checkpointing utility, now that FSDP does not require them for correct operation. Added a more detailed comment about memory usage behaviour with gradient reduction.
* [refactor] Delete deprecated forward counter usage.
* [refactor] Add state assertion at end of pre-backward hook.

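An illustrative sketch of the accumulate-vs-overwrite distinction above (not the fairscale internals):

```python
import torch

def on_grad_reduced(param: torch.nn.Parameter, reduced: torch.Tensor) -> None:
    """Second and later reductions within one backwards pass must add into
    the existing (unsharded) gradient instead of replacing it."""
    if param.grad is None:
        param.grad = reduced.clone()
    else:
        param.grad.add_(reduced)  # accumulate, do not overwrite
```
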
- 10 Sep, 2021 1 commit
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 06 Sep, 2021 1 commit
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
* changelog; mypy; oss cleanup
* more broadcast_object cleanup in FSDP
* one more mypy fix
* retire pytorch 1.6 from circleci; add new nightly, 1.8 LTS, and the 1.9 stable release
* update torch version for LTS
* minor fixes
* update cache key
* trying newer gpu VMs
* bump the cache
* update to gpu.medium, which should be 2 GPUs
* update nightly version
* add pre-commit instructions
* fixed CHANGELOG after merging
* updated to a newer nightly
* retained the older broadcast function in oss.py for older GPUs
* fixed a bug
* added a comment
* fixing a test for pytorch 1.10
* testing a fix
* Update fairscale/optim/oss.py
* Update CONTRIBUTING.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 05 Sep, 2021 1 commit
Min Xu authored
* [bug] [FSDP] make sure we use full params for multiple backwards within an iteration
* changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 12 Aug, 2021 2 commits
anj-s authored
* add an additional assert to check that the requires_grad field is set
* fix lint errors
* add unit tests and address comments

anj-s authored
[FSDP][feature] Support returning the original parameter names after a model has been wrapped with FSDP (#755)
* checkpoint work
* fix lint issues
* remove debug statement
* remove print
* fix lint errors
* fix lint errors
* fix lint errors
* add comments and fix lint errors
* modified comments and tests

- 02 Aug, 2021 1 commit
mrshenli authored
`wrap` is used alongside `auto_wrap` in the docstring example but was missing from the imports (fixed import below).

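The complete import the docstring example needs; all three helpers live in fairscale.nn:

```python
from fairscale.nn import auto_wrap, enable_wrap, wrap
```
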
- 31 Jul, 2021 1 commit
Myle Ott authored
* Add a (broken) test for gradient accumulation without the no_sync context manager
* changelog
* rename no_sync to grad_acc in tests
* clean up tmp files
* support grad accumulation without no_sync (see the sketch below)
* minor
* update changelog
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py with a better assertion from Sam
* lint
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

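A sketch contrasting the two accumulation styles; model, batch_a, batch_b, and optimizer are assumed to exist, with model an FSDP instance:

```python
# With no_sync: gradient reduction is deferred until the final micro-batch.
with model.no_sync():
    model(batch_a).sum().backward()
model(batch_b).sum().backward()
optimizer.step()

# Without no_sync (what this change enables): each backward reduces, and the
# reduced gradients accumulate across micro-batches before the step.
model(batch_a).sum().backward()
model(batch_b).sum().backward()
optimizer.step()
```
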
- 30 Jul, 2021 1 commit
Yanli Zhao authored
Move the final backward callback to the pre-backward hook of the root FSDP instance, so that it is always attached to the outermost backward call and fired after all backward calls are completed. Also added flags to check that the final backward callback is fired whenever it is required. If the root FSDP is checkpointed and called multiple times in forward, the checkpoint counter is used to make sure the final backward callback is queued inside the last inner backward call as well (sketch below).
Test Plan: unit tests
* reformat
* nits and unit tests
* address some comments
* replace m with self
* reformat
* nits
* remove the fired flag
* assert state on root only
* comments
* comments

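A sketch of the queuing mechanism described above, using the PyTorch autograd engine's callback API; the hook wiring and names are illustrative:

```python
from torch.autograd import Variable

def _final_backward_callback():
    pass  # e.g. finalize gradient-reduction state (illustrative)

def _pre_backward_hook(*unused):
    # Queued callbacks run once the current backward pass has fully
    # completed, which is what makes attaching this on the root FSDP
    # instance's pre-backward hook safe.
    Variable._execution_engine.queue_callback(_final_backward_callback)
```
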
- 26 Jul, 2021 1 commit
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups
  - step 3: make FSDP use FlattenParamModule unconditionally
* fixing the auto_wrap tests
* minor
* rewrite local_metadata_dict
  - updated FPW so that a custom flat param name is also supported
* bug fix
* mypy
* rewrote consolidate_shard_weights; test_consolidate passes
* comments
* fixing pickling
* Fix shared params and MoE logic (#749)
  - add a strict kwarg to support the fairseq:gshard MoE saving logic
  - test fairseq-style shards
  - style, formatting, and address comments
* added changelog
* fixing a test after the padding renaming
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

- 19 Jul, 2021 1 commit
liangluofb authored
* Update fully_sharded_data_parallel.py to use _allgather_base
* Update reduce_scatter_bucketer.py to use reduce_scatter_base
* Update fully_sharded_data_parallel.py: non-blocking gradient cpu copy and non-blocking param rebuilds (see the sketch below)
* Update reduce_scatter_bucketer.py: lints
* Update fully_sharded_data_parallel.py
* Update reduce_scatter_bucketer.py
* Update reduce_scatter_bucketer.py
* lints
* linter, test fix
* linter
* linter fixes for fairscale/utils/reduce_scatter_bucketer.py
* linter fixes for tests/nn/data_parallel/test_fsdp_overlap.py
* Update test_fsdp_overlap.py
* Update fairscale/utils/reduce_scatter_bucketer.py
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update reduce_scatter_bucketer.py
* isort
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-185.ec2.internal>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-77-164.ec2.internal>

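A sketch of the flat-tensor collectives adopted above; these were private torch.distributed APIs in PyTorch 1.9-era builds (later renamed), assumed here with their signatures of that time and an initialized process group:

```python
import torch
import torch.distributed as dist

world_size = dist.get_world_size()
shard = torch.ones(4)

# _all_gather_base gathers every rank's shard into one flat output tensor,
# avoiding the list-of-tensors copies of the regular all_gather.
output = torch.empty(world_size * shard.numel())
dist._all_gather_base(output, shard)

# _reduce_scatter_base reduces one flat input and scatters equal chunks,
# one per rank, into the output tensor.
flat_input = torch.ones(world_size * shard.numel())
dist._reduce_scatter_base(shard, flat_input)
```
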
- 21 Jun, 2021 1 commit
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups
  - step 2: extend FPW to support multiple flat param groups
  - FSDP still only uses one group
  - unit tests exercise the new code paths
  - updated the changelog
* first cut, mypy passed
* test_flatten_params_wrapper.py::TestFlattenParams tests pass
* added two more test cases and fixed a case in the code
* fixed one bug with param_path_infos
* fixed two more tests with hardcoded flat_param names
* Update CHANGELOG.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>

- 08 Jun, 2021 1 commit
Min Xu authored
* refactoring FlattenParamWrapper
  - use a FlatParameter class to encapsulate the logic of flattening parameters and expanding them into views (sketch below)
  - this will make it easier to have multiple groups of flattened parameters
* fixed testing context issues for both temp files and temp dirs
* fixing test_fsdp_metadata
* fix pickling of FlatParameter
* fixed test_fsdp_optimizer_utils.py
* minor
* fix assert
* lint
* remove nesting from the test
* step 1.5: remove the code related to unnecessary nesting support in FPW
* Update fairscale/nn/misc/flatten_params_wrapper.py
* address comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>

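A minimal sketch of the flatten-and-view idea (not the fairscale class): concatenate parameters into one flat tensor, then hand back reshaped views:

```python
from math import prod
from typing import List, Tuple

import torch

def flatten(params: List[torch.Tensor]) -> Tuple[torch.Tensor, List[torch.Size]]:
    shapes = [p.shape for p in params]
    flat = torch.cat([p.detach().reshape(-1) for p in params])
    return flat, shapes

def get_views(flat: torch.Tensor, shapes: List[torch.Size]) -> List[torch.Tensor]:
    # Views share storage with the flat tensor, so filling the flat tensor
    # (e.g. after an all-gather) updates every expanded parameter view.
    numels = [prod(s) for s in shapes]
    return [chunk.view(shape) for chunk, shape in zip(flat.split(numels), shapes)]
```
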
- 01 Jun, 2021 1 commit
Pete authored
* add failing test for buffer dtype
* fix buffer dtype issue
* update CHANGELOG
* fix

- 17 May, 2021 1 commit
Min Xu authored
* [fix] auto_wrap: support wrapping based on wrapper_config
  - users can use this to avoid an assert when auto_wrap is used multiple times on a module
  - users can traverse the modules multiple times, assign a wrapper_config to each module, and then use auto_wrap once to wrap them (sketch below)
  - fix #649, fix #585
* added changelog
* fix tests
* fix a test
* added an optional assert for collisions, based on discussions with Quentin
* added config_auto_wrap_policy
* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>

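A usage sketch of the pattern described above; wrapper_config and config_auto_wrap_policy come from the bullets, while the import paths, the model, and the selection rule are assumptions:

```python
import torch.nn as nn
from fairscale.nn import auto_wrap, enable_wrap
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.wrap import config_auto_wrap_policy  # assumed import path

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))  # stand-in model

# Pass 1..N: traverse and tag only the modules that should be wrapped.
for name, child in model.named_modules():
    if name.endswith("0"):  # illustrative selection rule
        child.wrapper_config = {"flatten_parameters": True}

# Single auto_wrap call: only tagged modules get wrapped, avoiding the
# assert that fires when auto_wrap is applied twice to the same module.
with enable_wrap(wrapper_cls=FSDP):
    model = auto_wrap(model, auto_wrap_policy=config_auto_wrap_policy)
```
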