- 08 Nov, 2021 2 commits
-
-
anj-s authored
* update release notes
* initial commit
* lint cleanup etc.
* helper functions; lint errors
* lint errors
* lint errors
* add back the boolean for named_parameters
* address comments and fix lint
* remove unused functions and class
* remove unused state
-
Benjamin Lefaudeux authored
Add SlowMo Distributed Data Parallel for clusters with slow interconnects.
Co-authored-by: Vinayak Tantia <tantia.vinayak1@gmail.com>
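For illustration, a rough usage sketch of the new wrapper follows; the import path, the `nprocs_per_node` argument, and the `perform_slowmo()` post-step call are inferred from this work and may differ in the released API.

```python
# Hedged sketch: module path, nprocs_per_node and perform_slowmo() are assumptions.
import torch
import torch.distributed as dist
from fairscale.experimental.nn.data_parallel import SlowMoDistributedDataParallel

dist.init_process_group(backend="nccl")  # one process per GPU, e.g. launched via torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# SlowMo trades exact per-step gradient all-reduce for periodic parameter
# averaging plus a slow momentum step, which reduces traffic on slow interconnects.
model = SlowMoDistributedDataParallel(
    torch.nn.Linear(128, 10).cuda(), nprocs_per_node=torch.cuda.device_count()
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(10):
    x = torch.randn(32, 128, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    model.perform_slowmo(optimizer)  # SlowMo's post-step synchronization
    optimizer.zero_grad()
```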
-
- 05 Nov, 2021 1 commit
-
-
Min Xu authored
* [feat] MEVO kernel - initial import from min/softmax and min/testing branches - need to rename and further cleanup
* only test with newer pytorch
* renamed and added comments and code cleanup
* rename and reduce test memory
* testing
* minor fixing
* fixing
* more fix
* changelog
* more 1.7 and 1.8 paper cuts
* remove dead code
* addressed Benjamin's comments
* addressed more comments
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 03 Nov, 2021 1 commit
-
-
Vinayak Tantia authored
-
- 02 Nov, 2021 2 commits
-
-
anj-s authored
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 01 Nov, 2021 2 commits
-
-
Min Xu authored
* added a new test, passing without shared weights
* tested weight sharing
* added the test to test list file
* extended to world_size = 2
* fixed test
* [feat]: add limited and experimental support for shared parameters
* fixed tests
* simplify to work with layers that have at least one non-shared param, and add code to pick up the linked_param field for sharding the shared param
* fixed the case where the linked param is not in a separate FSDP
* changelog and remove old code
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
* add doc strings
* add lower level SSD APIs and tests
* add the test to the list to be run
* remove unused imports
* more doc string changes
* fix lint errors
-
- 28 Oct, 2021 1 commit
-
-
Min Xu authored
* [fix] fix test on main
* [fix] fix test on main
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 27 Oct, 2021 6 commits
-
-
anj-s authored
* remove offload dependency on fp16
* update python version for cpu tests
* run CPU tests with updated PyTorch version
* split changes
* revert tests config
* fix lint errors
* update nightly and test PyTorch versions
* skip failing multiprocess pipe test
* always skip test
* always skip test
* always skip test
* lint error
* skip unsupported versions
* improve skip message
* lint errors
* modify docs
* add tests
* fix test failures
* modify comments
* fix lint errors
* fix lint errors
-
Min Xu authored
* checkpoint + nonflat + mixed_precision
* make tests pass with expected errors
* addressed comments
* add a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
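For reference, a minimal sketch of the combination these tests exercise: activation checkpointing plus FSDP with `flatten_parameters=False` and `mixed_precision=True`. The single-process group exists only to keep the sketch self-contained and assumes one GPU is available.

```python
import os
import torch
import torch.distributed as dist
from fairscale.nn import FullyShardedDataParallel as FSDP
from fairscale.nn import checkpoint_wrapper

# Single-process group purely so the sketch can run standalone.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)

# Checkpointed inner block, FSDP with non-flattened params and mixed precision outside.
block = checkpoint_wrapper(
    torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64))
)
model = FSDP(block.cuda(), flatten_parameters=False, mixed_precision=True)

out = model(torch.randn(8, 64, device="cuda"))
out.float().sum().backward()
```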
-
anj-s authored
* skip creating cpu grads and pinning memory
* added additional comment
* pin docutils to fix CircleCI
-
Min Xu authored
* added the failing test
* fixed the bug
* fine-tune the condition
* typo
* typo
* changelog and added test to test files
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
anj-s authored
-
Eugen Hotaj authored
Fixes #827.
Co-authored-by: Eugen Hotaj <ehotaj@fb.com>
-
- 24 Oct, 2021 1 commit
-
-
anj-s authored
* relax speed constraints
* relax the regression constraints
-
- 22 Oct, 2021 2 commits
-
-
anj-s authored
-
Eugen Hotaj authored
auto_shard.py currently uses torch.fx to create a symbolic DAG of operations and linearizes that DAG into an nn.Sequential so it can later be used for model offloading. This works in most cases but runs into issues with certain eager-mode features, such as dynamic conditionals and shape-dependent computation. This PR extends auto_shard.py to first run a preprocessing step that wraps any nn.Module which cannot be traced through. It adds a test for dynamic conditionals and updates existing failing test code. Some immediate extensions to this approach are marked as TODO in the code.
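To illustrate the general idea (a standalone sketch, not fairscale's actual implementation): a data-dependent conditional breaks torch.fx symbolic tracing, and treating the offending nn.Module as a leaf lets the trace, and hence the linearization, proceed.

```python
import torch
import torch.fx


class DynamicBranch(torch.nn.Module):
    def forward(self, x):
        # Data-dependent control flow: symbolic tracing cannot decide this branch.
        return x.relu() if x.sum() > 0 else x * -1


class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)
        self.branch = DynamicBranch()

    def forward(self, x):
        return self.branch(self.linear(x))


class LeafWrappingTracer(torch.fx.Tracer):
    # Do not trace through modules known to be untraceable; keep them as opaque
    # call_module nodes so the resulting DAG can still be linearized.
    def is_leaf_module(self, m, module_qualified_name):
        return isinstance(m, DynamicBranch) or super().is_leaf_module(m, module_qualified_name)


graph = LeafWrappingTracer().trace(Net())
print(graph)  # linear and branch both appear as call_module nodes
```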
-
- 21 Oct, 2021 2 commits
-
-
anj-s authored
* update pytorch version for benchmarks
* reduce golden data precision check
-
anj-s authored
* update python version for cpu tests
* run CPU tests with updated PyTorch version
* update nightly and test PyTorch versions
* skip failing multiprocess pipe test
* always skip test
* always skip test
* always skip test
* lint error
* skip unsupported versions
* improve skip message
* lint errors
-
- 20 Oct, 2021 3 commits
-
-
anj-s authored
* add log for new memory tracker features
* add log for new memory tracker features
-
Quentin Duval authored
* [feat] layer memory tracking
* [feat] layer memory tracking (add tests in CI)
* [feat] layer memory tracking: doc typos
* [feat] layer memory tracking: mypy fixes
* [feat] layer memory tracking: fixes for FSDP all gather tracking on pytorch 1.9 and above
* [feat] layer memory tracking: lint
* [feat] layer memory tracking: mypy
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
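As a rough sketch of how the tracker is meant to be used; the class and method names (LayerwiseMemoryTracker, monitor(), stop(), summary) and the module path are assumptions based on this PR and may differ in the released API.

```python
# Hedged usage sketch of the layer memory tracking tool (requires a GPU).
import torch
from fairscale.experimental.tooling.layer_memory_tracker import LayerwiseMemoryTracker

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
).cuda()

tracker = LayerwiseMemoryTracker()
tracker.monitor(model)  # install hooks that record GPU memory per layer

out = model(torch.randn(32, 1024, device="cuda"))
out.sum().backward()

tracker.stop()
print(tracker.summary)  # per-layer memory statistics collected during fwd/bwd
```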
-
anj-s authored
-
- 19 Oct, 2021 1 commit
-
-
Rohan Varma authored
* fix
* remove dup file
-
- 28 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 24 Sep, 2021 1 commit
-
-
Anjali Sridhar authored
-
- 22 Sep, 2021 1 commit
-
-
tmarkstrum authored
* update master branch to main
* added FAQ about updating the branch from master to main
* fixed some false-positive corrections
* added "what is new" section
* fixed the quoted code area
* added release "what is new" section
* added a step in release.md
* fixed a word
-
- 21 Sep, 2021 1 commit
-
-
anj-s authored
-
- 20 Sep, 2021 1 commit
-
-
tmarkstrum authored
* [chore] 0.4.1 release
* put more details in one change log
-
- 17 Sep, 2021 1 commit
-
-
tmarkstrum authored
* add a toggle to disable using the NCCL base collectives
* added a TODO to remove the toggle when the issue is resolved
-
- 13 Sep, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 12 Sep, 2021 2 commits
-
-
Min Xu authored
* add changelog for previous commit
* add changelog for previous commit
* add changelog for previous commit
* fix a merge induced error
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Darryl Barnhart authored
* [fix] FSDP intra-backwards gradient accumulation. Ensure gradient reduction accumulates into the unsharded gradient tensor within a backwards pass. This matters when an FSDP module is called multiple times within a forward pass, and reduction is _not_ deferred using activation checkpoint forward counters, bucketing or some other mechanism. Closes #780
* [refactor] Remove forward counters; comments. Removed forward counters from the activation checkpointing utility, now that FSDP does not require them for correct operation. Add a more detailed comment about memory usage behaviour with gradient reduction.
* [refactor] Delete deprecated forward counter usage.
* [refactor] Add state assertion at the end of the pre-backward hook.
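To make the scenario concrete, a hedged sketch follows: the same FSDP instance is called twice in one forward pass, so its gradient reduction fires twice within a single backward and must accumulate into the unsharded gradient. The single-process group is only there to keep the sketch self-contained.

```python
import os
import torch
import torch.distributed as dist
from fairscale.nn import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="nccl", rank=0, world_size=1)


class ReusedBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = FSDP(torch.nn.Linear(16, 16).cuda())  # nested FSDP instance

    def forward(self, x):
        # The same FSDP module is invoked twice in one forward pass.
        return self.inner(self.inner(x))


model = FSDP(ReusedBlock())
model(torch.randn(4, 16, device="cuda")).sum().backward()
```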
-
- 11 Sep, 2021 1 commit
-
-
Alex Xiao authored
Before this commit, output tensors of checkpointed modules always required grad, even when they should not have. This commit makes it so that the outputs of checkpointed modules only require grad if either the input requires grad or the parameters require grad. To achieve this, the commit also adds a new _unflattened_param_views attribute to modules being flattened, which allows the checkpointing code to still access the parameters and check whether gradients need to be computed.
Co-authored-by: Alex Xiao <axiao@fb.com>
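A small sketch of the resulting behaviour, using fairscale's checkpoint_wrapper; the expected outputs follow this commit's description rather than a verified run.

```python
import torch
from fairscale.nn import checkpoint_wrapper

module = checkpoint_wrapper(torch.nn.Linear(8, 8))

x = torch.randn(2, 8)          # input does not require grad
print(module(x).requires_grad)  # True: the parameters still require grad

for p in module.parameters():
    p.requires_grad_(False)     # freeze the parameters as well
print(module(x).requires_grad)  # False after this commit (was True before)
```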
-
- 10 Sep, 2021 2 commits
-
-
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Benjamin Lefaudeux authored
-
- 07 Sep, 2021 1 commit
-
-
Achal Dixit authored
* [test] Added disable_checkpointing unit test
* [test] Added disable_checkpointing unit test (clean-up)
* [test] Added disable_checkpointing unit test (clean-up)
-
- 06 Sep, 2021 2 commits
-
-
Min Xu authored
[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)
* changelog; mypy; oss cleanup
* more broadcast_object cleanup in FSDP
* one more mypy fix
* retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release
* update torch version for LTS
* minor fixes
* update cache key
* trying newer gpu VMs
* bump the cache
* update to gpu.medium, which should be 2 GPUs
* update nightly version
* add pre-commit instruction
* fixed CHANGELOG after merging
* updated to newer nightly
* retained the older broadcast function for older GPUs for oss.py
* fixed a bug
* added a comment
* fixing a test for pytorch 1.10
* testing a fix
* Update fairscale/optim/oss.py
* Update CONTRIBUTING.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 05 Sep, 2021 1 commit
-
-
Min Xu authored
* [bug] [FSDP] making sure we use full params for multiple backwards within an iteration
* changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>
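One reading of the pattern this fix covers, sketched below with a single-process group so it is self-contained; the PR's actual reproduction may differ.

```python
import os
import torch
import torch.distributed as dist
from fairscale.nn import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group(backend="nccl", rank=0, world_size=1)

model = FSDP(torch.nn.Linear(16, 16).cuda())
out = model(torch.randn(4, 16, device="cuda"))

# Two backward calls from one forward within the same iteration; the fix is
# about FSDP still having (or re-gathering) full params for the second one.
out[:2].sum().backward(retain_graph=True)
out[2:].sum().backward()
```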
-