- 21 Oct, 2021 1 commit
-
-
anj-s authored
* update python version for cpu tests
* run CPU tests with updated PyTorch version
* update nightly and test PyTorch versions
* skip failing multiprocess pipe test
* always skip test
* lint error
* skip unsupported versions
* improve skip message
* lint errors
-
- 20 Oct, 2021 3 commits
-
-
anj-s authored
* add log for new memory tracker features
-
Quentin Duval authored
* [feat] layer memory tracking
* [feat] layer memory tracking (add tests in CI)
* [feat] layer memory tracking: doc typos
* [feat] layer memory tracking: mypy fixes
* [feat] layer memory tracking: fixes for FSDP all gather tracking on pytorch 1.9 and above
* [feat] layer memory tracking: lint
* [feat] layer memory tracking: mypy

Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
-
anj-s authored
-
- 19 Oct, 2021 1 commit
-
-
Rohan Varma authored
* fix
* remove dup file
-
- 28 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 24 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 22 Sep, 2021 1 commit
-
-
tmarkstrum authored
* update master branch to main
* added FAQ about updating the branch from master to main
* fixed some false-positive corrections
* added "what is new" section
* fixed the quoted code area
* added release "what is new" section
* added a step in release.md
* fixed a word
-
- 21 Sep, 2021 1 commit
-
-
anj-s authored
-
- 20 Sep, 2021 1 commit
-
-
tmarkstrum authored
* [chore] 0.4.1 release
* put more details in one changelog entry
-
- 17 Sep, 2021 1 commit
-
-
tmarkstrum authored
* add a toggle to disable using the NCCL base collectives
* added a TODO to remove the toggle when the issue is resolved
-
- 13 Sep, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 12 Sep, 2021 2 commits
-
-
Min Xu authored
* add changelog for previous commit
* fix a merge-induced error

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Darryl Barnhart authored
* [fix] FSDP intra-backwards gradient accumulation. Ensure gradient reduction accumulates into the unsharded gradient tensor within a backwards pass. This matters when an FSDP module is called multiple times within a forward pass and reduction is _not_ deferred using activation checkpoint forward counters, bucketing, or some other mechanism. Closes #780
* [refactor] Remove forward counters. Comments. Removed forward counters from the activation checkpointing utility, now that FSDP does not require them for correct operation. Add a more detailed comment about memory usage behaviour with gradient reduction.
* [refactor] Delete deprecated forward counter usage.
* [refactor] Add state assertion at the end of the pre-backward hook.
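To illustrate the reuse pattern this fix covers, here is a minimal plain-PyTorch sketch (no FSDP wrapping; module and tensor names are illustrative): a submodule is invoked twice in one forward pass, so a single backward pass must accumulate gradients from both uses into one `.grad` tensor. The fix ensures FSDP's reduced, unsharded gradients accumulate the same way.

```python
import torch
import torch.nn as nn


class Reuse(nn.Module):
    """Toy model that applies the same linear block twice per forward pass."""

    def __init__(self) -> None:
        super().__init__()
        self.block = nn.Linear(4, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.block(x))  # two uses -> two gradient contributions


model = Reuse()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Both uses of `block` contribute to the same .grad tensor within this single
# backward pass; under FSDP, the per-use reduced gradients must accumulate into
# the unsharded gradient tensor in the same way.
print(model.block.weight.grad.shape)  # torch.Size([4, 4])
```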
-
- 11 Sep, 2021 1 commit
-
-
Alex Xiao authored
Before this commit, output tensors of checkpointed modules always require grad, even if they shouldn't. This commit makes it so that the outputs of checkpointed modules only require grad if either the input requires grad or if the parameters require grad. To achieve this, this commit also adds a new _unflattened_param_views attribute to modules being flattened. This allows the checkpointing to still access the parameters and check if gradients need to be computed.

Co-authored-by: Alex Xiao <axiao@fb.com>
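A small sketch of the behaviour described above, assuming fairscale's `checkpoint_wrapper` is importable from `fairscale.nn` (the wrapped module is illustrative; before this change the second print would also report True):

```python
import torch
import torch.nn as nn
from fairscale.nn import checkpoint_wrapper

module = checkpoint_wrapper(nn.Linear(4, 4))
x = torch.randn(2, 4)  # plain input, requires_grad=False

# Training-style call: the parameters require grad, so the output should too.
print(module(x).requires_grad)  # True

# Frozen module and no-grad input: with this change the output should no longer require grad.
for p in module.parameters():
    p.requires_grad = False
print(module(x).requires_grad)  # expected False with this fix
```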
-
- 10 Sep, 2021 2 commits
-
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Benjamin Lefaudeux authored
-
- 07 Sep, 2021 1 commit
-
-
Achal Dixit authored
* [test] Added disable_checkpointing unit test
* [test] Added disable_checkpointing unit test (Clean-up)
* [test] Added disable_checkpointing unit test (Clean-up)
-
- 06 Sep, 2021 2 commits
-
-
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)

* changelog; mypy; oss cleanup
* more broadcast_object cleanup in FSDP
* one more mypy fix
* retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release
* update torch version for LTS
* minor fixes
* update cache key
* trying newer gpu VMs
* bump the cache
* update to gpu.medium, which should be 2 GPUs
* update nightly version
* add pre-commit instruction
* fixed CHANGELOG after merging
* updated to newer nightly
* retained the older broadcast function for older GPUs for oss.py
* fixed a bug
* added a comment
* fixing a test for pytorch 1.10
* testing a fix
* Update fairscale/optim/oss.py
* Update CONTRIBUTING.md

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 05 Sep, 2021 1 commit
-
-
Min Xu authored
* [bug] [FSDP] making sure we use full params for multiple backwards within an iteration
* changelog

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 18 Aug, 2021 1 commit
-
-
Vittorio Caggiano authored
-
- 12 Aug, 2021 4 commits
-
-
anj-s authored
-
Min Xu authored
* minor: changelog and pre-commit
* addressed comment
* update the release doc

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
* add additional assert for checking if the requires_grad field is set
* fix lint errors
* add unit tests and address comments
-
anj-s authored
[FSDP][feature] Support returning the original parameter names after a model has been wrapped with FSDP (#755)

* checkpoint work
* fix lint issues
* remove debug statement
* remove print
* fix lint errors
* add comments and fix lint errors
* modified comments and tests
-
- 10 Aug, 2021 1 commit
-
-
Rahul Iyer authored
Pre-commit hooks fail when run on all files for three reasons (see trace below):

1. Trailing whitespace on multiple files
2. mypy fails to load numpy and then subsequently fails to load LazyModule from pipe.py
3. isort sees issues with known_third_party packages

```
> pre-commit run --all-files
Trim Trailing Whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing docs/source/conf.py
Fixing fairscale/experimental/nn/auto_shard.py
Fixing docs/source/deep_dive/activation_checkpointing.rst
Fixing docs/source/tutorials/pipe.rst
Fixing docs/source/installation_instructions.rst
Fixing docs/source/deep_dive/pipeline_parallelism.rst
Fixing docs/source/tutorials/activation_checkpointing.rst
Fixing docs/source/tutorials/offload_model.rst
Fixing docs/source/deep_dive/oss_sdp_fsdp.rst
Fixing docs/source/what_is_fairscale.rst
Fixing CHANGELOG.md
Fixing fairscale/experimental/nn/offload.py
Fixing docs/source/index.rst
Fixing docs/source/deep_dive/adascale.rst
Fixing README.md
Fixing docs/source/tutorials/oss.rst
Fixing docs/source/deep_dive/offload.rst
Check python ast.........................................................Passed
Check for merge conflicts................................................Passed
Don't commit to branch...................................................Passed
Check for added large files..............................................Passed
Fix End of Files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing requirements.txt
Fixing docs/source/getting_started.rst
Fixing docs/source/installation_instructions.rst
Fixing codecov.yml
Fixing docs/source/deep_dive/adascale.rst
Fixing docs/source/tutorials/oss.rst
Fixing docs/source/deep_dive/offload.rst
black....................................................................Passed
flake8...................................................................Passed
seed isort known_third_party.............................................Failed
- hook id: seed-isort-config
- exit code: 1
- files were modified by this hook
isort....................................................................Passed
mypy.....................................................................Failed
- hook id: mypy
- exit code: 2
setup.cfg:45: error: Error importing plugin 'numpy.typing.mypy_plugin': No module named 'numpy'
Found 1 error in 1 file (checked 197 source files)
```
-
- 02 Aug, 2021 2 commits
-
-
mrshenli authored
`wrap` from `auto_wrap` is used in the docstring example but is missing from the imports.
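For context, a minimal sketch of what the fixed import line looks like, assuming the wrapping helpers are re-exported from `fairscale.nn` as in recent releases:

```python
# `wrap` was referenced in the docstring example but missing from this import line;
# these helpers are typically used together inside an `enable_wrap(...)` context.
from fairscale.nn import auto_wrap, enable_wrap, wrap
```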
-
Howard Huang authored
-
- 01 Aug, 2021 1 commit
-
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 31 Jul, 2021 1 commit
-
-
Myle Ott authored
* Add test (broken) for gradient accumulation without no_sync context manager
* changelog
* no_sync to grad_acc renaming for tests
* clean up tmp files
* support grad acc without no_sync
* minor
* update changelog
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py (better assertion from Sam)
* lint

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
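A rough sketch of the two accumulation patterns this change distinguishes (illustrative only; `model` is assumed to be an FSDP- or DDP-wrapped module exposing `no_sync()`, and `batches` an iterable of input tensors):

```python
import torch


def accumulate_with_no_sync(model: torch.nn.Module, batches) -> None:
    """Skip gradient reduction on all but the last micro-batch."""
    *rest, last = list(batches)
    for x in rest:
        with model.no_sync():         # defer cross-rank reduction for these micro-batches
            model(x).sum().backward()
    model(last).sum().backward()       # reduction fires here, once per optimizer step


def accumulate_without_no_sync(model: torch.nn.Module, batches) -> None:
    """What this change enables for FSDP: plain repeated backward calls.

    Reduced gradients accumulate across micro-batches, at the cost of a
    reduction per micro-batch instead of one per optimizer step.
    """
    for x in batches:
        model(x).sum().backward()
```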
-
- 30 Jul, 2021 1 commit
-
-
Yanli Zhao authored
Move final backward callback to pre-backward hook of root FSDP instance

Summary: Move the final backward callback to the pre-backward hook of the root FSDP instance, so that it is always attached to the outermost backward call and fired after all backward calls are completed. Also added flags to check that the final backward callback is fired whenever it is required. If the root FSDP is checkpointed and called multiple times in forward, the checkpoint counter is used to make sure the final backward callback is queued inside the last inner backward call as well.

Test Plan: unit tests

* reformat
* nits and unit tests
* address some comments
* replace m with self
* reformat
* nits
* remove the fired flag
* assert state on root only
* comments
* comments
-
- 27 Jul, 2021 2 commits
-
-
Min Xu authored
* [chore] 0.3.9 release
* update changelog
* address comments

Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Benjamin Lefaudeux authored
-
- 26 Jul, 2021 1 commit
-
-
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups - step 3: make FSDP use FlattenParamModule unconditionally
* fixing the auto_wrap tests
* minor
* rewrite local_metadata_dict - updated FPW so that custom flat param name is also supported
* bug fix
* mypy
* rewrote consolidate_shard_weights - test_consolidate passes
* comments
* fixing pickling
* Fix shared params and MoE logic (#749)
* add strict kwarg to support fairseq:gshard MoE saving logic
* Test fairseq style shard
* style
* formatting and address comments
* added changelog
* fixing a test after padding renaming

Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
- 19 Jul, 2021 1 commit
-
-
liangluofb authored
* Update fully_sharded_data_parallel.py to use _allgather_base
* Update reduce_scatter_bucketer.py to use reduce_scatter_base
* Update fully_sharded_data_parallel.py: nonblocking gradient CPU copy, and nonblocking param rebuilds
* Update reduce_scatter_bucketer.py: lints
* lints, linter, test fix
* linter fixes (reduce_scatter_bucketer.py, test_fsdp_overlap.py)
* Update test_fsdp_overlap.py
* Update fairscale/utils/reduce_scatter_bucketer.py
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* isort

Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-185.ec2.internal>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-77-164.ec2.internal>
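A rough sketch of the difference between the list-based collective and the flat "base" variant this change adopts (a single-process "gloo" group is used only so the calls can run locally; the base variant is probed defensively since not every backend or version implements it):

```python
import os
import torch
import torch.distributed as dist

# Single-process group so the collectives can run locally for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

world_size = dist.get_world_size()
shard = torch.arange(4.0)  # this rank's shard

# Classic all_gather: one tensor per rank, collected into a Python list.
gathered = [torch.empty_like(shard) for _ in range(world_size)]
dist.all_gather(gathered, shard)

# _all_gather_base: gathers directly into one flat output buffer, avoiding the
# per-rank list (and a copy when the consumer wants the flat layout anyway).
if hasattr(dist, "_all_gather_base"):
    flat = torch.empty(world_size * shard.numel())
    try:
        dist._all_gather_base(flat, shard)
    except RuntimeError:
        pass  # some backends/versions do not implement the base variant

dist.destroy_process_group()
```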
-
- 12 Jul, 2021 2 commits
-
-
anj-s authored
-
Vittorio Caggiano authored
Fix a misspelled name
-
- 07 Jul, 2021 1 commit
-
-
Edward Z. Yang authored
See https://github.com/pytorch/pytorch/pull/59671/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
-
- 28 Jun, 2021 1 commit
-
-
anj-s authored
-