- 07 May, 2021 1 commit
-
-
msbaines authored
* [feat] experimental.nn.SyncBatchNorm: initial commit

  Fast/simple re-implementation of SyncBatchNorm. When profiling SSL Vision, I was seeing a majority of cycles spent in SyncBatchNorm. With this change, I see a 10% to 20% speedup on the model I was profiling. When running benchmarks/experimental/sync_batchnorm.py on 8 x V100, I get a 6x speedup:

  <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
  Elapsed time is 0.08709120750427246
  Elapsed time is 0.12632274627685547
  Elapsed time is 0.14095258712768555
  Elapsed time is 0.16529417037963867
  Elapsed time is 0.1419970989227295
  Elapsed time is 0.15166854858398438
  Elapsed time is 0.12000870704650879
  Elapsed time is 0.17534875869750977

  <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
  Elapsed time is 2.5087168216705322
  Elapsed time is 2.497001886367798
  Elapsed time is 2.5204885005950928
  Elapsed time is 2.526789903640747
  Elapsed time is 2.5080230236053467
  Elapsed time is 2.524489641189575
  Elapsed time is 2.513214588165283
  Elapsed time is 2.5359973907470703

  <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
  Elapsed time is 0.4126114845275879
  Elapsed time is 0.39051294326782227
  Elapsed time is 0.40685415267944336
  Elapsed time is 0.4159870147705078
  Elapsed time is 0.42383885383605957
  Elapsed time is 0.4080159664154053
  Elapsed time is 0.41202712059020996
  Elapsed time is 0.42400121688842773
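A minimal usage sketch, assuming the experimental module is importable as `fairscale.experimental.nn.SyncBatchNorm` (consistent with the class path in the benchmark output) and that it mirrors `torch.nn.SyncBatchNorm`'s constructor; the model and shapes are illustrative only:

```python
import torch
import torch.nn as nn
from fairscale.experimental.nn import SyncBatchNorm  # assumed import path

# Drop-in replacement for nn.BatchNorm2d / nn.SyncBatchNorm; like the torch
# version, it expects an initialized torch.distributed process group.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    SyncBatchNorm(64),
    nn.ReLU(inplace=True),
).cuda()

x = torch.randn(8, 3, 32, 32, device="cuda")
y = model(x)
```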
-
- 05 May, 2021 3 commits
-
-
Min Xu authored
* [fix] better assert and better test for frozen weights
  - the precise condition should have been to check m.parameters(), not m.params
  - fixes #643
* add changelog
* use an enum instead, which is much cleaner

Co-authored-by: Min Xu <min.xu@acm.org>
-
Benjamin Lefaudeux authored
* increasing the code coverage, which is good practice and surfaces bugs; hopefully getting to 100%
* small bugfix
-
Min Xu authored
* [fix] add clear_autocast_cache flag
  - when training in AMP mode with fp32 weights, FSDP may need to optionally clear the autocast cache to avoid GPU OOM
  - the flag defaults to false; doing it automatically is a future TODO
  - also added a verbose flag to make print(fsdp_model) a bit shorter
  - updated the memory test to cover the new code
  - added a couple of useful functions in parallel.py and testing.py
* minor
* address comments
* format
* improve the test

Co-authored-by: Min Xu <min.xu@acm.org>
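A minimal sketch of how the two flags named in this commit might be used, assuming `torch.distributed` is already initialized and one GPU per rank; the model size and training step are illustrative only:

```python
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

model = torch.nn.Linear(1024, 1024).cuda()
fsdp_model = FSDP(
    model,
    mixed_precision=True,
    clear_autocast_cache=True,  # flag added by this commit; defaults to False
    verbose=True,               # flag added by this commit; controls how much detail print(fsdp_model) shows
)

with torch.cuda.amp.autocast():
    out = fsdp_model(torch.randn(8, 1024, device="cuda"))
out.float().sum().backward()
```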
-
- 04 May, 2021 1 commit
-
-
tmarkstrum authored
* dynamic loss scaler
* isort
* black
* flake8
* comments
* added the test to the CI file, added a line to catch the overflow error, fixed some formatting errors
* adding type annotations
* added a todo for adding more test cases for handling NaN gradients
* fix some docstrings and comments, add more todos
* fix two docstrings
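A hand-written sketch of the dynamic loss scaling pattern this commit implements (the actual class name and import path in fairscale are not given in the commit message, so they are not shown here): scale the loss, skip the optimizer step on overflow, and grow or shrink the scale over time.

```python
import torch

class SimpleDynamicLossScaler:
    """Illustration of dynamic loss scaling; not fairscale's class."""

    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, optimizer, parameters):
        params = [p for p in parameters if p.grad is not None]
        # Overflow check: any inf/NaN gradient means the scaled loss overflowed.
        if any(not torch.isfinite(p.grad).all() for p in params):
            self.scale *= self.backoff_factor  # back off and skip this step
            self._good_steps = 0
            optimizer.zero_grad()
            return
        for p in params:
            p.grad.div_(self.scale)  # unscale before stepping
        optimizer.step()
        optimizer.zero_grad()
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor  # grow after a stretch of good steps

# usage:
#   scaler = SimpleDynamicLossScaler()
#   (loss * scaler.scale).backward()
#   scaler.step(optimizer, model.parameters())
```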
-
- 03 May, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* fix + unit test
* changelog update
-
- 30 Apr, 2021 1 commit
-
-
msbaines authored
-
- 29 Apr, 2021 2 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* Improving test coverage on SDP
* using the pytest exception catcher
-
- 28 Apr, 2021 3 commits
-
-
Min Xu authored
* [test] improve BN test coverage
  - added sync_bn on/off cases
  - added conv and linear bias on/off cases
  - clarified with the test when BN wrapping is needed while sync_bn is off
* adding a comment

Co-authored-by: Min Xu <min.xu@acm.org>
-
Mehdi Mirzazadeh authored
* adding automatic graph generation for the distributed pipeline
* ignore trace.py for mypy for now, since it needs pytorch 1.8
* fixing tests
* simplifying the graph API
* remove unused debug utilities
* use inspect to find argument lists
* use the sharded linear layer
* flake8
* comment
* polishing
* polishing
-
Min Xu authored
* [feat] save memory by using the bucket buffer only in backward
  - this fixes bug #627
  - added documentation to clarify the buffer's cost and the speed/memory tradeoff
  - added setup/teardown calls so that the buffer is only allocated during the backward pass, freeing that memory during forward and optimizer stepping for things like activations
  - added a unit test that asserts the memory usage is in range
  - comparing with DDP:
    1. the buffer size scales with the number of FSDP instances, not model size
    2. the buffer is only allocated during backward
    3. the buffer is used for small tensors only, to reduce overhead
    4. the overlapping of compute and reduction is very different
* add PR number to changelog
* filled in the memory numbers on 1.9
* addressed comments
* update comments
* fix for 1.6
* add a todo

Co-authored-by: Min Xu <min.xu@acm.org>
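A minimal sketch of the speed/memory tradeoff described above, assuming the bucket size is exposed through FSDP's `bucket_cap_mb` argument (an assumption; the commit message does not name the knob) and that `torch.distributed` is already initialized:

```python
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

# Default bucketing: small gradients are batched into one buffer before
# reduce-scatter, which costs some extra memory, now only during backward.
fsdp_default = FSDP(torch.nn.Linear(4096, 4096).cuda(), bucket_cap_mb=25)

# A zero cap disables bucketing and minimizes buffer memory, at the cost of
# more, smaller reduction calls.
fsdp_unbucketed = FSDP(torch.nn.Linear(4096, 4096).cuda(), bucket_cap_mb=0)
```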
-
- 26 Apr, 2021 1 commit
-
-
Min Xu authored
* [fix] let FSDP handle models with multiple forward passes and checkpointing
* try CI again
* save
* save
* fixed the case with BN
* minor
* add the new file
* minor
* added a test of a single case; runtime is about 50s
* enable all 8 test cases
* cleanup
* cleanup
* skip the flatten case with 1.6 and 1.7
* minor

Co-authored-by: Min Xu <min.xu@acm.org>
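A minimal sketch (illustrative module, not the test's model) of the pattern this fix targets: an FSDP-wrapped block whose forward is invoked more than once per iteration, combined with activation checkpointing; `torch.distributed` is assumed to be initialized.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from fairscale.nn import FullyShardedDataParallel as FSDP

class TwoPassModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = FSDP(nn.Linear(128, 128).cuda())  # inner FSDP instance

    def forward(self, x):
        # The same FSDP-wrapped block is used twice, each time under checkpointing.
        h = checkpoint(self.block, x)
        return checkpoint(self.block, h)

model = FSDP(TwoPassModel().cuda())
out = model(torch.randn(4, 128, device="cuda", requires_grad=True))
out.sum().backward()
```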
-
- 23 Apr, 2021 1 commit
-
-
shuyingsunshine21 authored
* relax the root-checking condition
* formatting
* add unittest
* add unittest to the CI test list
* isort for the unittest import
* format with black
* move test to list 1
* add skip when no CUDA
* black and isort
-
- 22 Apr, 2021 2 commits
-
-
Min Xu authored
* [fix] mypy and flaky test
  - CI didn't seem to catch this, or maybe I merged incorrectly yesterday
  - this should fix the mypy error on master
  - also updated a test that seems to be flaky due to a TCP port conflict
* another flaky test; hopefully more determinism helps
* CR
* skip 1.6
* fix
* minor

Co-authored-by: Min Xu <min.xu@acm.org>
-
Benjamin Lefaudeux authored
-
- 19 Apr, 2021 1 commit
-
-
Min Xu authored
* FSDP: fix training with frozen weights
  - an assert is changed to catch this case correctly
  - unit test added (based on Quentin's test code) for this case, comparing DDP and FSDP
  - fixes #610
* added test file to list 1
* use better and simpler code as suggested by Myle
* testing both methods of freezing as well

Co-authored-by: Min Xu <min.xu@acm.org>
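A minimal sketch of freezing part of an FSDP model, under the assumptions that `torch.distributed` is already initialized and that flattening is turned off so frozen and trainable parameters stay separate; the toy trunk/head model is illustrative only:

```python
import torch
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP

trunk = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
head = nn.Linear(32, 10)

# Method 1: freeze via requires_grad before wrapping.
for p in trunk.parameters():
    p.requires_grad = False

model = FSDP(nn.Sequential(trunk, head).cuda(), flatten_parameters=False)
optim = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)

# Method 2, the other variant the test exercises: keep requires_grad=True
# everywhere and instead pass only the head's parameters to the optimizer.
```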
-
- 15 Apr, 2021 1 commit
-
-
anj-s authored
[fix] Revert change that removed the option to run OffloadModel without activation checkpointing. (#608)
* revert the change
* add tests and revert sync shard changes
* add tests
* remove a file checked in by error
* inline var
* fix lint errors
* add checkpoint activation
* fix mypy
* use a bigger model
* modify tests for now
* resolve conflicts

Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
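A minimal sketch of the restored option, assuming OffloadModel's experimental import path and keyword arguments (`num_slices`, `checkpoint_activation`) match fairscale's offload tutorial; the sequential model is illustrative:

```python
import torch
import torch.nn as nn
from fairscale.experimental.nn.offload import OffloadModel  # assumed import path

model = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])

offload_model = OffloadModel(
    model=model,
    device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    num_slices=4,
    checkpoint_activation=False,  # the option this revert makes usable again
)

out = offload_model(torch.randn(16, 256, device="cuda"))
```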
-
- 13 Apr, 2021 3 commits
-
-
Sam Shleifer authored
-
Mehdi Mirzazadeh authored
Replacing the multi-process pipe implementation with a more flexible one. Initial implementation of proposal pytorch/pytorch#55256.
-
Benjamin Lefaudeux authored
* Adding a unit test which checks for multiple FW passes on the same block
* Adding an embedding table, but still no problem to show for it
-
- 08 Apr, 2021 1 commit
-
-
Sam Shleifer authored
-
- 07 Apr, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* Properly handle .train() and .eval() modes
* showing that the unit test works, now fixed
* code review
-
Myle Ott authored
-
- 06 Apr, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 05 Apr, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* making APIs more private
* linting
-
- 04 Apr, 2021 3 commits
-
-
Sam Shleifer authored
-
msbaines authored
This test is flaky for torch >= 1.8.0.
-
Benjamin Lefaudeux authored
-
- 02 Apr, 2021 1 commit
-
-
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0).
Fixes: #548
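For reference, a minimal sketch of the PyTorch collective this relies on, `torch.distributed.all_to_all` over NCCL; it assumes the process group is already initialized with `backend="nccl"` and one GPU per rank:

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()
world_size = dist.get_world_size()

# Each rank sends one chunk to every other rank and receives one from each.
inputs = [torch.full((4,), float(rank), device="cuda") for _ in range(world_size)]
outputs = [torch.empty(4, device="cuda") for _ in range(world_size)]
dist.all_to_all(outputs, inputs)
# outputs[i] now holds the chunk sent by rank i.
```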
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 4 commits
-
-
msbaines authored
-
anj-s authored
* renaming/adding error messages
* address comments
* address comments
* add more comments
* add more comments
-
Min Xu authored
[fix] FSDP: disable single-rank process group for auto_wrap_bn and fix the mixed precision regnet test (#556)
* [fix] disable single-rank process group for auto_wrap_bn
  - beefed up the unit test with a regnet-like model
  - found that the single-rank process group was causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get better assertion output in testing.py
* [test] fix the regnet test for ddp + mixed_precision
  - need the AMP context in FSDP
  - worked around a difference between DDP & FSDP when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data with a wrong iteration count
  - added a TODO for needing a better loss and grad_scaler, and reduced iters so there is no NaN
  - added (disabled) debugging code
* lint
* lint
* add scaler
* lint
* scaler
* add a real loss
* seeding in the ranks
* balance tests
* run the AMP DDP==FSDP test only on CUDA version 11 and up
* add relu inplace and a comment
* make wrap_bn cover more cases in full precision mode
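A hand-rolled sketch of what wrapping BN separately means in practice (auto_wrap_bn's exact signature is not shown here, since it is not given in the commit message): keep BatchNorm layers in their own full-precision FSDP instances on the regular multi-rank process group, while the rest of the model can use mixed precision.

```python
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP

def wrap_bn_separately(module: nn.Module) -> nn.Module:
    """Recursively wrap every BatchNorm layer in its own FSDP instance."""
    for name, child in module.named_children():
        if isinstance(child, nn.modules.batchnorm._BatchNorm):
            # BN stays in fp32 and is not flattened with other parameters.
            setattr(module, name, FSDP(child, mixed_precision=False, flatten_parameters=False))
        else:
            wrap_bn_separately(child)
    return module

model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU()).cuda()
fsdp_model = FSDP(wrap_bn_separately(model), mixed_precision=True)
```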
-
msbaines authored
-
- 30 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* survive the model being moved to device post-construction
* make sure that a unit test would catch a regression
-
- 29 Mar, 2021 1 commit
-
-
msbaines authored
-
- 28 Mar, 2021 1 commit
-
-
msbaines authored
-
- 26 Mar, 2021 1 commit
-
-
Min Xu authored
- added DDP equivalency test
- added rmf, state_dict_norm functions to testing utils
- added more debugging output to objects_are_equal
-
- 25 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* re-activating the unit test
* removing a change that slipped in
-