- 19 Jul, 2021 1 commit
-
-
liangluofb authored
* Update fully_sharded_data_parallel.py: use _allgather_base
* Update reduce_scatter_bucketer.py: use reduce_scatter_base
* Update fully_sharded_data_parallel.py: non-blocking gradient CPU copy and non-blocking param rebuilds
* Update reduce_scatter_bucketer.py: lints
* Update fully_sharded_data_parallel.py
* Update reduce_scatter_bucketer.py
* Update reduce_scatter_bucketer.py
* lints
* linter, test fix
* linter
* linter: git add fairscale/utils/reduce_scatter_bucketer.py
* linter: git add tests/nn/data_parallel/test_fsdp_overlap.py
* Update test_fsdp_overlap.py
* Update fairscale/utils/reduce_scatter_bucketer.py
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update reduce_scatter_bucketer.py
* isort
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-185.ec2.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-77-164.ec2.internal>
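A minimal sketch of the flat-tensor collectives this commit switches to, assuming a torch build that exposes the private `_all_gather_base`/`_reduce_scatter_base` entry points (the torch 1.9-era nightlies this work targeted); this illustrates the pattern, not the fairscale code itself:

```python
import torch
import torch.distributed as dist

def shard_all_gather(shard: torch.Tensor, group=None) -> torch.Tensor:
    """All-gather a 1-D shard into one flat tensor, avoiding a Python list of outputs."""
    world_size = dist.get_world_size(group)
    output = shard.new_empty(world_size * shard.numel())
    dist._all_gather_base(output, shard.contiguous(), group=group)
    return output

def flat_reduce_scatter(flat_grads: torch.Tensor, group=None) -> torch.Tensor:
    """Reduce-scatter a flat gradient tensor so each rank keeps only its reduced shard."""
    world_size = dist.get_world_size(group)
    assert flat_grads.numel() % world_size == 0, "input must divide evenly across ranks"
    output = flat_grads.new_empty(flat_grads.numel() // world_size)
    dist._reduce_scatter_base(output, flat_grads.contiguous(), group=group)
    return output
```

The non-blocking gradient CPU copy mentioned above is the usual `tensor.to("cpu", non_blocking=True)` pattern, which only overlaps with compute when the destination buffer is in pinned memory.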
-
- 07 Jul, 2021 1 commit
-
-
Edward Z. Yang authored
See https://github.com/pytorch/pytorch/pull/59671/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
-
- 28 Jun, 2021 2 commits
-
-
Yanli Zhao authored
Make sure requires_grad of FlatParameter is consistent with requires_grad of the original parameters (#721)
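A hypothetical sketch of the constraint behind this fix (my illustration, not the fairscale code): a single flat tensor has one requires_grad flag, so it has to be derived from, and agree with, the original parameters.

```python
import torch
from torch import nn

def flatten_params(params):
    # A flat parameter has a single requires_grad flag, so mixed settings cannot be
    # represented; require agreement and propagate the shared value.
    flags = {p.requires_grad for p in params}
    assert len(flags) == 1, "all parameters in one flatten group must agree on requires_grad"
    flat = torch.cat([p.detach().reshape(-1) for p in params])
    return nn.Parameter(flat, requires_grad=flags.pop())
```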
-
Mehdi Mirzazadeh authored
* Fix a bug in setting dependencies in the partition handler
* Modify the unit test so that it requires the fix
* black
-
- 26 Jun, 2021 1 commit
-
-
Pavel Belevich authored
-
- 25 Jun, 2021 2 commits
-
-
Mehdi Mirzazadeh authored
-
Mehdi Mirzazadeh authored
* Preparing pipeline for newer versions of pytorch * updated error message
-
- 22 Jun, 2021 1 commit
-
-
Pavel Belevich authored
* Update torch to 1.9.0.dev20210614+cu102 * Update config.yml * Update config.yml * Update setup.py * Update config.yml * Update config.yml * Update config.yml * Update config.yml
-
- 21 Jun, 2021 1 commit
-
-
Min Xu authored
* [feat] FSDP: supporting multiple flatten parameter groups, step 2:
  - extend FPW to support multiple flat parameter groups
  - FSDP still only uses one group
  - unit tests exercise the new code paths
  - updated the changelog
* first cut, mypy passed
* test_flatten_params_wrapper.py::TestFlattenParams tests pass
* added two more test cases and fixed a case in the code
* fixed one bug with param_path_infos
* fixed two more tests with hardcoded flat_param names
* Update CHANGELOG.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 11 Jun, 2021 2 commits
-
-
anj-s authored
[Offload][feature] Add auto shard functionality to remove the requirement of nn.Sequential models. (#695)
* auto wrap functionality
* lint and doc strings
* fix lint errors
* lint errors and version skips
* remove mypy checking and add conditional import
* another math.prod instance
* another import fix
* address comments
* lint errors
* address comments
* fix lint errors
* add placeholder nodes to tracker list
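A rough sketch of how a model that is not an nn.Sequential can be auto-split, assuming a torch.fx-style trace as the underlying mechanism (an assumption for illustration; the exact fairscale implementation may differ):

```python
import torch
from torch import fx, nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.a, self.b = nn.Linear(8, 8), nn.Linear(8, 4)

    def forward(self, x):
        return self.b(torch.relu(self.a(x)))

# Symbolic tracing turns an arbitrary forward() into a flat graph of nodes that an
# auto-shard pass can walk and assign to sequential segments.
traced = fx.symbolic_trace(Net())
for node in traced.graph.nodes:
    print(node.op, node.target)
```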
-
Pete authored
* add failing test
* add fix
* use 'torch.is_grad_enabled()' instead of 'module.training'
* Revert "add failing test" (this reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0)
* add simple test
* improve test
* add check for fwd_counter
* revert typing/format changes
* move to new test file
* CHANGELOG
* remove old test
* fix import order
* fix test to be compat with torch 1.6.0
* clean up
* comments
* isort
* 🤦
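The distinction this fix relies on, as a small illustration (not fairscale code): `module.training` only tracks train()/eval() mode, while `torch.is_grad_enabled()` tracks no_grad contexts, which is what actually determines whether a backward pass can follow the forward.

```python
import torch
from torch import nn

model = nn.Linear(4, 4)  # freshly constructed modules are in training mode
with torch.no_grad():
    assert model.training               # train()/eval() state is unaffected by no_grad
    assert not torch.is_grad_enabled()  # but no backward pass can follow this forward
```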
-
- 08 Jun, 2021 1 commit
-
-
Min Xu authored
* refactoring FlattenParamWrapper:
  - use a FlatParameter class to encapsulate the logic of flattening and expanding into views
  - this will make it easier to have multiple groups of flattened parameters
* fixed testing context issues for both temp files and temp dirs
* fixing test_fsdp_metadata
* fix pickling of FlatParameter
* fixed test_fsdp_optimizer_utils.py
* minor
* fix assert
* lint
* remove nesting from the test
* step 1.5: remove the code related to unnecessary nesting support in FPW
* Update fairscale/nn/misc/flatten_params_wrapper.py
* address comment
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
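A minimal sketch of the FlatParameter idea described above. The class below is a simplified assumption about the design (one flat tensor owns the storage, and the original shapes are recovered as views), not the fairscale implementation:

```python
import torch
from torch import nn

class FlatParameter(nn.Parameter):
    """One flat parameter that can hand back views shaped like the original params."""

    def __new__(cls, params, requires_grad=True):
        data = torch.cat([p.detach().reshape(-1) for p in params])
        instance = super().__new__(cls, data, requires_grad=requires_grad)
        instance._shapes = [p.shape for p in params]
        instance._numels = [p.numel() for p in params]
        return instance

    def get_param_views(self):
        # Split the flat storage back into per-parameter chunks and reshape them.
        return [
            chunk.view(shape)
            for chunk, shape in zip(self.split(self._numels), self._shapes)
        ]

# Usage: flatten two parameters and recover views with the original shapes.
w = nn.Parameter(torch.randn(3, 4))
b = nn.Parameter(torch.randn(4))
flat = FlatParameter([w, b])
views = flat.get_param_views()
assert views[0].shape == (3, 4) and views[1].shape == (4,)
```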
-
- 01 Jun, 2021 1 commit
-
-
Pete authored
* add failing test for buffer dtype * fix buffer dtype issue * update CHANGELOG * fix
-
- 28 May, 2021 1 commit
-
-
Min Xu authored
* [do not merge] testing a corner case
* workaround
* using a dummy tensor to fix
* lint
* changelog
* update a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 27 May, 2021 1 commit
-
-
msbaines authored
This change also ensures that we calculate running_{mean,var} correctly when wrapped.
-
- 17 May, 2021 2 commits
-
-
Min Xu authored
* [fix] auto_wrap: support wrapping based on wrapper_config
  - users can use this to avoid the assert when auto_wrap is used multiple times on a module
  - users can traverse the modules multiple times, assign a wrapper_config to each module, and then use auto_wrap once to wrap them
  - fixes #649, fixes #585
* added changelog
* fix tests
* fix a test
* added an optional assert for collision, based on discussions with Quentin
* added config_auto_wrap_policy
* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>
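An illustrative sketch of tagging modules with a config and then wrapping them in one pass. The `Wrapped` class, marker helper, and traversal below are assumptions made for illustration, not the fairscale `config_auto_wrap_policy` API:

```python
import torch.nn as nn

def tag(module: nn.Module, **config) -> nn.Module:
    module.wrapper_config = config  # marker attribute consumed by the wrap pass
    return module

class Wrapped(nn.Module):
    """Stand-in for a wrapper class such as FSDP."""
    def __init__(self, module: nn.Module, **config):
        super().__init__()
        self.module, self.config = module, config

    def forward(self, x):
        return self.module(x)

def auto_wrap_by_config(module: nn.Module) -> nn.Module:
    # Recurse first so children are wrapped before their parents are considered.
    for name, child in module.named_children():
        setattr(module, name, auto_wrap_by_config(child))
    config = getattr(module, "wrapper_config", None)
    return Wrapped(module, **config) if config is not None else module

model = nn.Sequential(tag(nn.Linear(4, 4), flatten=True), nn.ReLU(), nn.Linear(4, 2))
model = auto_wrap_by_config(model)  # only the tagged Linear ends up wrapped
```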
-
Quentin Duval authored
* Save FSDP metadata for offline unflattening
* Complete the metadata-saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint
* Add a unit test to show how to use the function
* Code review + improvement of the unit tests
* Code review: extract clean_path
* Make metadata and consolidation of checkpoints work for flatten_parameter=False
* Add new unit test file in CI
* Complete changelog and fix mypy issues
* Add support for module buffers in the consolidation of sharded checkpoints
* Better support for module buffers: save them in the metadata
* Refactoring: use a data format for the metadata that is simpler to understand (move from object-of-arrays to array-of-objects format)
* Renaming to make code clearer
* Code review: in_temporary_directory rework and typo correction
* Renaming
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
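A hedged sketch of what offline consolidation of a sharded checkpoint can look like. The metadata layout and function name below are assumptions for illustration, not the fairscale on-disk format:

```python
import torch

def consolidate(shard_state_dicts, metadata):
    """shard_state_dicts: list of {flat_name: 1-D shard tensor}, one per rank.
    metadata: {flat_name: {"names": [...], "shapes": [...], "numels": [...]}}."""
    full = {}
    for flat_name, info in metadata.items():
        # Concatenate each flat parameter's shards across ranks, rank order preserved.
        flat = torch.cat([sd[flat_name] for sd in shard_state_dicts])
        flat = flat[: sum(info["numels"])]  # drop padding added for even sharding
        for name, shape, chunk in zip(
            info["names"], info["shapes"], flat.split(info["numels"])
        ):
            full[name] = chunk.view(shape)
    return full
```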
-
- 14 May, 2021 2 commits
-
-
msbaines authored
-
Shruti Bhosale authored
* fix saving and loading checkpoints with use_sharded_state=True
* mypy fix
* better fix of the infinite recursion:
  - we need to specifically call FSDP.state_dict from its local state_dict
  - added a unit test that fails without the fix and passes with it
  - fixed mypy for the overloaded functions
* make CPU-only FSDP work for state_dict at least
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <m1n@fb.com>
-
- 13 May, 2021 1 commit
-
-
Min Xu authored
* [fix] add and use get_process_group_cached
  - this commit makes FSDP avoid creating too many process groups by default
  - extra process groups are bad for GPU memory and init time
* add changelog
* lint
* note on speed
* add better assert output; the test seems to be flaky: https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
* update test reference memory values
  - with cached process groups, the memory reported by pytorch is also reduced (due to the bucket buffer memory for the reduction buffer)
  - the effect on memory is actually larger for the SMI memory, which is not reported by pytorch and is checked by this test
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update CHANGELOG.md
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* improved changelog
* better handling of underscores in the md file
Co-authored-by: Min Xu <min.xu@acm.org>
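A minimal sketch of the caching idea (an assumption about the approach, not the fairscale helper): reuse one process group per unique set of ranks instead of calling `dist.new_group()` repeatedly, which costs GPU memory and init time.

```python
from typing import Dict, Optional, Sequence, Tuple

import torch.distributed as dist

_group_cache: Dict[Tuple[int, ...], object] = {}

def get_process_group_cached(ranks: Optional[Sequence[int]] = None):
    """Return a cached process group for `ranks`, creating it on first use."""
    if ranks is None:
        return dist.group.WORLD
    key = tuple(sorted(ranks))
    if key not in _group_cache:
        # dist.new_group is collective: every rank must call it, in the same order.
        _group_cache[key] = dist.new_group(ranks=list(key))
    return _group_cache[key]
```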
-
- 12 May, 2021 1 commit
-
-
anj-s authored
* rename files * add newly renamed file * rename and move checkpoint activations related files * add test files to ci list * fix lint errors * modify docs * add changelog * retain old path for now * fix lint errors * add another import test case * fix merge conflict * add missing test file
-
- 11 May, 2021 1 commit
-
-
Min Xu authored
* [fix] FSDP forward pass overlap between compute and all-gather
  - many thanks to @cyanguwa for the report and @QuentinDuval for debugging it
  - a new unit test is added to check for this and to ensure we detect issues with overlapping and CPU/GPU blocking wait calls
* fix
* fix
* fix
* better assertion outputs
* fix format and tune all_gather MB for CI
* more tuning with non_flatten
* undo an accidental change
* tuning all_gather MB and del model
* Update + fix overlapping test to use patched all_gather w/ delay (#672)
* fixing get_cycles_per_ms
* add get_smi_memory
* update the docstring
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Myle Ott <myleott@fb.com>
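The get_cycles_per_ms helper mentioned above follows a common CUDA testing trick; the sketch below reconstructs it under that assumption (using the private `torch.cuda._sleep` spin kernel), and is not necessarily the exact fairscale utility:

```python
import torch

def get_cycles_per_ms() -> float:
    """Measure how many GPU cycles correspond to one millisecond of wall time."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    torch.cuda._sleep(1_000_000)  # spin the GPU for a fixed number of cycles
    end.record()
    torch.cuda.synchronize()
    return 1_000_000 / start.elapsed_time(end)  # elapsed_time returns milliseconds

# Example: keep the current stream busy for ~5 ms, to check whether a concurrent
# all-gather (issued on another stream) still makes progress, i.e. really overlaps.
if torch.cuda.is_available():
    torch.cuda._sleep(int(5 * get_cycles_per_ms()))
```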
-
- 08 May, 2021 2 commits
-
-
Sam Shleifer authored
-
anj-s authored
* rename and move optim/utils.py * attach the new file
-
- 07 May, 2021 2 commits
-
-
Min Xu authored
* [test] add a more general test case
  - also rebalance the tests a bit
* added missing arg
* balance
* better checking
* balance
* make test smaller and faster
* make ddp results cached and enable sync_bn
* clean up
* fix tests
* changelog
* balance
* fix
* addressing comments
Co-authored-by: Min Xu <min.xu@acm.org>
-
msbaines authored
* [feat] experimental.nn.SyncBatchNorm: initial commit

  Fast/simple re-implementation of SyncBatchNorm. When profiling SSL Vision, I was seeing a majority of cycles spent in SyncBatchNorm. With this change, I see a 10% to 20% speedup on the model I was profiling. When running benchmarks/experimental/sync_batchnorm.py on 8 x V100, I get a 6x speedup:

  <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
  Elapsed time is 0.08709120750427246
  Elapsed time is 0.12632274627685547
  Elapsed time is 0.14095258712768555
  Elapsed time is 0.16529417037963867
  Elapsed time is 0.1419970989227295
  Elapsed time is 0.15166854858398438
  Elapsed time is 0.12000870704650879
  Elapsed time is 0.17534875869750977

  <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
  Elapsed time is 2.5087168216705322
  Elapsed time is 2.497001886367798
  Elapsed time is 2.5204885005950928
  Elapsed time is 2.526789903640747
  Elapsed time is 2.5080230236053467
  Elapsed time is 2.524489641189575
  Elapsed time is 2.513214588165283
  Elapsed time is 2.5359973907470703

  <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
  Elapsed time is 0.4126114845275879
  Elapsed time is 0.39051294326782227
  Elapsed time is 0.40685415267944336
  Elapsed time is 0.4159870147705078
  Elapsed time is 0.42383885383605957
  Elapsed time is 0.4080159664154053
  Elapsed time is 0.41202712059020996
  Elapsed time is 0.42400121688842773
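A hedged usage sketch for the new layer, assuming its constructor mirrors the `num_features` argument of torch's batch norm layers (the benchmark above refers to it as fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm); a process group must already be initialized for the cross-rank statistics reduction:

```python
import torch.nn as nn
from fairscale.experimental.nn.sync_batchnorm import SyncBatchNorm

model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    SyncBatchNorm(64),  # intended as a faster stand-in for torch.nn.SyncBatchNorm(64)
    nn.ReLU(),
)
```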
-
- 05 May, 2021 3 commits
-
-
Min Xu authored
* [fix] better assert and better test for frozen weights
  - the precise condition should have been to check m.parameters(), not m.params
  - fixes #643
* add changelog
* using an enum is so much better
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Benjamin Lefaudeux authored
* increasing the code coverage (good practice, and it surfaces bugs); hopefully getting to 100%
* small bugfix
-
Min Xu authored
* [fix] add clear_autocast_cache flag
  - when training in AMP mode with fp32 weight dtype, FSDP may need to optionally clear the autocast cache to avoid GPU OOM
  - this flag defaults to False; clearing automatically is a future TODO
  - also added a verbose flag to make print(fsdp_model) a bit shorter
  - updated the memory test to cover the new code
  - added a couple of useful functions in parallel.py and testing.py
* minor
* address comments
* format
* improve the test
Co-authored-by: Min Xu <min.xu.public@gmail.com>
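A sketch of what such a flag would do in a training step (an assumption about placement, not the FSDP internals): autocast keeps a cache of downcast weight copies, and clearing it between iterations trades a little speed for GPU memory when fp32 weights are repeatedly autocast.

```python
import torch

def training_step(model, batch, optimizer, clear_autocast_cache: bool = False):
    with torch.cuda.amp.autocast():
        loss = model(batch).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if clear_autocast_cache:
        torch.clear_autocast_cache()  # free the low-precision copies cached by autocast
```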
-
- 04 May, 2021 1 commit
-
-
tmarkstrum authored
* dynamic loss scaler
* isort
* black
* flake8
* comments
* added the test to the ci file, added a line to catch the overflow error, fixed some formatting errors
* adding type annotations
* added a todo for adding more test cases for handling NaN gradients
* fix some doc strings and comments, add more todos
* fix two doc strings
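For context, a generic dynamic loss scaling loop looks roughly like the sketch below (an illustration of the standard technique, not the class added in this commit): scale the loss before backward, skip the step and shrink the scale when inf/NaN gradients appear, and grow the scale after a run of clean steps.

```python
import torch

class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, optimizer, params):
        params = list(params)
        overflow = any(
            p.grad is not None and not torch.isfinite(p.grad).all() for p in params
        )
        if overflow:
            self.scale *= self.backoff_factor  # inf/NaN grads: shrink scale, skip step
            self._good_steps = 0
            return
        for p in params:  # unscale gradients before the optimizer sees them
            if p.grad is not None:
                p.grad.div_(self.scale)
        optimizer.step()
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor

# Usage: (loss * scaler.scale).backward(); scaler.step(optimizer, model.parameters())
```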
-
- 03 May, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix + unit test * changelog update
-
- 30 Apr, 2021 1 commit
-
-
msbaines authored
-
- 29 Apr, 2021 2 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* Improving test coverage on SDP * using pytest exception catcher
-
- 28 Apr, 2021 3 commits
-
-
Min Xu authored
* [test] improve BN test coverage
  - added sync_bn on/off cases
  - added conv and linear bias on/off cases
  - clarified, with the test, when BN wrapping is needed when sync_bn is off
* adding a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Mehdi Mirzazadeh authored
* adding auto graph generation for distributed pipeline
* ignore trace.py for mypy for now, since it needs pytorch 1.8
* fixing tests
* simplifying graph api
* remove unused debug utilities
* use inspect to find argument lists
* use sharded linear layer
* flake8
* comment
* polishing
* polishing
-
Min Xu authored
* [feat] save memory by using the bucket buffer only in backward
  - this fixes bug #627
  - added documentation to clarify the buffer's cost and the speed/memory tradeoff
  - added setup/teardown calls so that the buffer is only allocated during the backward pass, freeing memory during forward and stepping for things like activations
  - added a unit test that asserts the memory is in range
  Comparing with DDP:
  1. buffer size scales with the number of FSDP instances, not model size
  2. buffer is only allocated during backward
  3. buffer is used for small tensors only, to reduce overhead
  4. overlapping of compute and reduction is very different
* add PR number to changelog
* filled in memory numbers on 1.9
* addressed comments
* update comments
* fix for 1.6
* add a todo
Co-authored-by: Min Xu <min.xu@acm.org>
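A hedged sketch of the setup/teardown idea (an assumption about the mechanism, not the fairscale bucketer): allocate the reduce-scatter bucket buffer lazily when backward starts and drop it when the pass ends, so the memory is available for activations during forward.

```python
import torch

class BucketBuffer:
    def __init__(self, bucket_mb: int = 25):
        self.numel = bucket_mb * 1024 * 1024 // 2  # number of fp16 elements
        self._buffer = None

    def setup(self, device, dtype=torch.float16):
        # Called when the backward pass begins: allocate the buffer on first use.
        if self._buffer is None:
            self._buffer = torch.empty(self.numel, device=device, dtype=dtype)
        return self._buffer

    def teardown(self):
        # Called after gradients are reduced: release the memory for other uses.
        self._buffer = None
```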
-
- 26 Apr, 2021 1 commit
-
-
Min Xu authored
* [fix] let FSDP handle a model with multiple forward passes and checkpointing
* try CI again
* save
* save
* fixed case with bn
* minor
* add the new file
* minor
* added a test of a single case, runtime is about 50s
* enable all 8 test cases
* cleanup
* cleanup
* skip flatten case with 1.6 and 1.7
* minor
Co-authored-by: Min Xu <min.xu@acm.org>
-
- 23 Apr, 2021 1 commit
-
-
shuyingsunshine21 authored
* relax checking root condition * formatting * add unittest * add unittest to ci test list * isort for import of unittest * format black . * move test to list 1 * add skip no cuda * black and isort
-
- 22 Apr, 2021 1 commit
-
-
Min Xu authored
* [fix] mypy and flaky test
  - CI didn't seem to catch this, or maybe I merged incorrectly yesterday
  - this should fix the mypy error on master
  - also updated a test that seems to be flaky due to a tcp port conflict
* another flaky test, hopefully more determinism helps
* CR
* skip 1.6
* fix
* minor
Co-authored-by: Min Xu <min.xu@acm.org>
-