Commits · 25cebf85667bc42f17a0899bba41efe8777fdd11 · OpenDAS / fairscale

01 Jun, 2021 2 commits

Fix buffer dtype in `FSDP.state_dict()` when using mixed precision (#705) · 25cebf85
Pete authored Jun 01, 2021
```
* add failing test for buffer dtype

* fix buffer dtype issue

* update CHANGELOG

* fix
```
25cebf85

[test] fixing 1.9 nightly install (#706) · 3443a635

Min Xu authored Jun 01, 2021



* [test] fixing 1.9 nightly install

* update cache version so that we don't keep reinstalling
Co-authored-by: Min Xu <min.xu.public@gmail.com>

3443a635

28 May, 2021 2 commits

[fix] using dummy tensor to ensure checkpoint backward pass is called in corner cases (#701) · df7db85c

Min Xu authored May 28, 2021



* [do not merge] testing a corner case

* workaround

* using dummy tensor to fix

* lint

* changelog

* update a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>

df7db85c

[docs] Update README (#702) · 1bcab8dd
anj-s authored May 27, 2021
```
* update installation instructions

* modify README

* fix heading
```
1bcab8dd

27 May, 2021 3 commits
- update workflow diagram (#699) · b84b9146
  anj-s authored May 26, 2021
  
  b84b9146
- [docs] Revamp FairScale documentation (#698) · dcfb7a99
  anj-s authored May 26, 2021
```
* add tutorials

* add new context, modify and delete existing docs

* remove duplicate labels

* modify layout and more nits

* address comments

* fix merge conflicts
```
  dcfb7a99
- [perf] SyncBatchNorm: avoid 2nd set of all_reduce when wrapped by checkpoint_wrapper (#694) · 29aae007
  msbaines authored May 26, 2021
```
This change also ensures that we calculate running_{mean,var} correctly
when wrapped.
```
  29aae007
26 May, 2021 2 commits
- [docs] add MOE to docs (#693) · 3dcc9eff
  msbaines authored May 26, 2021
  
  3dcc9eff
- Update CONTRIBUTING.md · 2c663f5a
  anj-s authored May 26, 2021
  
  2c663f5a
21 May, 2021 1 commit

[refactor] ShardedGradScaler init and super call (#691) · 945b9666

Nicholas Cilfone authored May 21, 2021

Make ShardedGradScaler __init__ mirror GradScaler so super can forward parameters. Without this, one cannot configure a ShardedGradScaler object the way one can with the PyTorch native GradScaler object.
Updated with black linter.
Added stub for GradScaler __init__ which solves mypy issues and removed
ignore comment.
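
The pattern described above (a subclass whose `__init__` mirrors its parent so that `super().__init__` can forward every parameter) can be sketched roughly as follows. `BaseScaler` and `ShardedScaler` are hypothetical stand-ins, not the PyTorch or FairScale classes, and the extra `process_group` argument is an assumption made for illustration.

```python
class BaseScaler:
    """Hypothetical stand-in for a native gradient scaler."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000, enabled=True):
        self.init_scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.enabled = enabled


class ShardedScaler(BaseScaler):
    """Hypothetical subclass: same parameters as the parent, plus one extra."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000, enabled=True,
                 process_group=None):
        # Forward every parent parameter so callers can configure the
        # subclass exactly as they would configure the parent.
        super().__init__(init_scale=init_scale, growth_factor=growth_factor,
                         backoff_factor=backoff_factor,
                         growth_interval=growth_interval, enabled=enabled)
        self.process_group = process_group  # assumed extra argument


scaler = ShardedScaler(init_scale=1024.0, growth_interval=100)
```

With the mirrored signature, non-default settings such as `init_scale` reach the parent instead of being silently ignored.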

945b9666

18 May, 2021 2 commits
- [potential fix] Rename codecov yaml file according to docs (#687) · 8a05ff76
  anj-s authored May 18, 2021
```
* rename codecov yaml file

* remove status checks
```
  8a05ff76
- [chore] 0.3.7 release (#686) · a462df2e
  Min Xu authored May 17, 2021
```
* [chore] 0.3.7 release

* fixed changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  a462df2e
17 May, 2021 2 commits

[fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2

Min Xu authored May 17, 2021



* [fix] auto_wrap: support wrapping based on wrapper_config

- user can use this to avoid assert if auto_wrap is used multiple times on a module
- user can traverse the modules multiple times and assign a wrapper_config
  to the module and then use auto_wrap once to wrap them

fix #649
fix #585

* added changelog

* fix tests

* fix a test

* added an optional assert for collision based on discussions with Quentin

* added config_auto_wrap_policy

* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>
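
The workflow described in this commit (traverse the modules several times, attach a `wrapper_config` to the ones that should be wrapped, then wrap once) might look roughly like the sketch below. `Node`, `Wrapped`, `walk`, and `wrap_tagged` are hypothetical names on a toy tree, not the FairScale API, and the `mixed_precision` key is only an example setting.

```python
class Node:
    """Toy module with a name and child modules (hypothetical)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])


class Wrapped:
    """Toy wrapper that records the config it was created with."""

    def __init__(self, inner, **config):
        self.inner = inner
        self.config = config


def walk(node):
    """Yield a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def wrap_tagged(node):
    """Single wrapping pass: wrap children first, then the node if tagged."""
    node.children = [wrap_tagged(child) for child in node.children]
    config = getattr(node, "wrapper_config", None)
    return Wrapped(node, **config) if config is not None else node


root = Node("root", [Node("encoder", [Node("layer0"), Node("layer1")]),
                     Node("head")])

# First traversal: tag the encoder layers.
for n in walk(root):
    if n.name.startswith("layer"):
        n.wrapper_config = {"mixed_precision": True}

# Second traversal: tag the head with a different config.
for n in walk(root):
    if n.name == "head":
        n.wrapper_config = {"mixed_precision": False}

# Wrap exactly once, after all tagging is done.
root = wrap_tagged(root)
```

Because tagging only sets an attribute, repeated traversals never wrap a module twice; the single final pass does all the wrapping.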

9d2bbcf2

[feat] Save FSDP metadata for offline unflattening + Consolidate checkpoints (#683) · 81c20f72

Quentin Duval authored May 17, 2021



* Save FSDP metadata for offline unflattening

* Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint

* Add a unit test to show how to use the function

* Code review + improvement of the unit tests

* Code review: extract clean_path

* Make meta data and consolidation of checkpoint work for flatten_parameter=False

* Add new unit test file in CI

* Complete changelog and fix mypy issues

* Add support for module buffers in the consolidation of sharded checkpoints

* Better support for module buffers: save them in the meta data

* Refactoring: use a data format for the meta data that is simpler to understand (move from an object-of-arrays to an array-of-objects format)

* Renaming to make code clearer

* Code review: in_temporary_directory rework and typo correction

* Renaming
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
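
To illustrate the idea of offline consolidation, here is a minimal sketch that assumes parameters are flattened into one vector, padded, and split evenly across ranks, with an array-of-objects metadata list recording each parameter's name and shape. The functions `shard_flat` and `consolidate` and the metadata keys are hypothetical; the actual FairScale format and method names may differ.

```python
from math import prod


def shard_flat(flat, world_size):
    """Pad a flat list so it splits evenly, then cut one shard per rank."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len]
            for r in range(world_size)]


def consolidate(shards, metadata):
    """Rebuild per-parameter data from per-rank shards and saved metadata."""
    flat = [x for shard in shards for x in shard]  # concatenate in rank order
    out, offset = {}, 0
    for entry in metadata:  # one dict per parameter
        numel = prod(entry["shape"])
        out[entry["name"]] = {"shape": entry["shape"],
                              "data": flat[offset:offset + numel]}
        offset += numel
    return out  # trailing padding is never read


metadata = [{"name": "weight", "shape": (2, 3)},
            {"name": "bias", "shape": (3,)}]
flat_params = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
shards = shard_flat(flat_params, world_size=2)
full = consolidate(shards, metadata)
```

The data is kept flat alongside its shape here; a real implementation would reshape it into tensors.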

81c20f72

14 May, 2021 4 commits

[perf] nn.SyncBatchNorm: use autograd function to save memory (#680) · d240b748
msbaines authored May 14, 2021

d240b748
[refactor] [fsdp] Modify FSDP API param name to better reflect functionality (#676) · 5be4817d
anj-s authored May 14, 2021
```
* api changes

* fix list

* modify changelog

* modify changelog

* modify changelog

* move function
```
5be4817d

[minor] use dist.group.WORLD for default process group (#681) · bbac5564

Min Xu authored May 14, 2021



* [minor] use dist.group.WORLD for default process group

- this is slightly more efficient than the previous commit
  for get_process_group_cached.

* fix

* better fix

* fixed for pytorch 1.6 and 1.7

* Update fairscale/utils/parallel.py
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Min Xu <min.xu.public@gmail.com>

bbac5564

FSDP: Fix saving and loading checkpoints with use_sharded_state=True (#574) · 468874c8

Shruti Bhosale authored May 14, 2021



* fix saving and loading checkpoints with use_sharded_state=True

* mypy fix

* better fix of the infinite recursion

- we need to specifically call FSDP.state_dict from its local state_dict
- added unit test that fails without the fix and works with the fix
- fixed mypy for the overloaded functions

* make cpu-only fsdp work for state_dict at least
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <m1n@fb.com>

468874c8

13 May, 2021 1 commit

[fix] add and use get_process_group_cached (#678) · bde4bac5

Min Xu authored May 12, 2021

* [fix] add and use get_process_group_cached

- This commit makes FSDP avoid creating too many process groups by default
- Extra process groups cost GPU memory and init time

* add changelog

* lint

* note on speed

* add better assert output

test seems to be flaky:
https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps



* update test reference memory values

- With cached process groups, the memory is reduced as reported by
pytorch as well (due to bucket buffer memory for the reduction buffer)
- The effect on memory is actually more on the SMI memory, which is not
reported by pytorch and checked by this test.

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update CHANGELOG.md

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* improved changelog

* better handling of underscores in the md file
Co-authored-by: Min Xu <min.xu@acm.org>
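
The caching idea behind `get_process_group_cached` can be sketched as a lookup keyed by the rank list, so that each distinct group is created only once. This is not the FairScale implementation: `GroupCache` and `fake_new_group` are hypothetical, and the factory stands in for whatever call actually creates a process group.

```python
class GroupCache:
    """Hypothetical cache: build each process group once, then reuse it."""

    def __init__(self, factory):
        self._factory = factory  # callable that actually creates a group
        self._groups = {}

    def get(self, ranks=None):
        # None stands for "all ranks"; a tuple makes the rank list hashable.
        key = None if ranks is None else tuple(ranks)
        if key not in self._groups:
            self._groups[key] = self._factory(key)
        return self._groups[key]


created = []  # records every real group creation


def fake_new_group(key):
    """Stand-in for an expensive group-creation call."""
    created.append(key)
    return object()


cache = GroupCache(fake_new_group)
a = cache.get([0, 1])
b = cache.get([0, 1])  # served from the cache, no new group
c = cache.get()        # default group, created once
```

Repeated requests for the same ranks return the same object, which is what keeps memory use and init time down.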

bde4bac5

12 May, 2021 2 commits

[chore] Rename and move checkpoint_activations from misc folder. (#654) · 72c6bab2

anj-s authored May 12, 2021

* rename files

* add newly renamed file

* rename and move checkpoint activations related files

* add test files to ci list

* fix lint errors

* modify docs

* add changelog

* retain old path for now

* fix lint errors

* add another import test case

* fix merge conflict

* add missing test file

72c6bab2

[offload] Add pragma directives to ensure we ignore the backward pass functions. (#675) · c141f8db
anj-s authored May 12, 2021
```
* add pragma

* add mypy ignore comments

* fix comment

* add more no cover comments

* add comments
```
c141f8db

11 May, 2021 1 commit

[fix] FSDP forward pass overlap between compute and all-gather (#671) · 8a42a8e3

Min Xu authored May 10, 2021



* [fix] FSDP forward pass overlap between compute and all-gather

- many thanks to @cyanguwa for the report and @QuentinDuval for debugging it
- a new unit test is added to check for this and ensure we detect
  issues with overlapping and cpu/gpu blocking wait calls

* fix

* fix

* fix

* better assertion outputs

* fix format and tune all_gather mb for CI

* more tuning with non_flatten

* undo an accidental change

* tuning all gather mb and del model

* Update + fix overlapping test to use patched all_gather w/ delay (#672)

* fixing get_cycles_per_ms

* add get_smi_memory

* update the docstring
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Myle Ott <myleott@fb.com>

8a42a8e3

10 May, 2021 2 commits

[chore] Updating PR template (#674) · c8d32c30

Min Xu authored May 10, 2021

* [chore] Updating PR template

Add N/A (Not Applicable) options to some of the questions in the PR template

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

c8d32c30

[minor] clarify a comment (#673) · 6c61887d

Min Xu authored May 10, 2021

- we do have a use case of empty params inside an FSDP -- for the
overlapping fsdp unit test, we use it to measure timing of compute
when no params are needed for all_gather
- therefore, I updated the comment to be more correct there.
- fixes #661

6c61887d

08 May, 2021 5 commits
- [fix] nn.moe: softmax should be done in FP32 (#668) · 002aae63
  msbaines authored May 08, 2021
```
Co-authored-by: @myleott
```
  002aae63
- [perf] nn.moe: replace einsum with faster equivalent code (#667) · 29d81c43
  msbaines authored May 08, 2021
```
Co-authored-by: @myleott
```
  29d81c43
- [chore][benchmarks] Add license file headers for all files in fairscale/benchmarks (#670) · a9156260
  anj-s authored May 08, 2021
```
* add license file headers for all files

* fix lint
```
  a9156260
- [test] Force overflow in top2gating test (#664) · 29c01fb1
  Sam Shleifer authored May 08, 2021
  
  29c01fb1
- [chore] Rename and move utils.py from optim/ to utils/ (#669) · 5739930f
  anj-s authored May 07, 2021
```
* rename and move optim/utils.py

* attach the new file
```
  5739930f
07 May, 2021 3 commits

[perf] nn.moe: workaround inefficiency in PyTorch's one_hot (#666) · 99b30a04
msbaines authored May 07, 2021
```
Workaround for https://github.com/pytorch/pytorch/issues/55579

Co-authored-by: @shruti-bh, @myleott
```
99b30a04

[fix]: support pytorch SyncBatchNorm under AMP & checkpointing with FSDP (#659) · 6db68518

Min Xu authored May 07, 2021



* [test]: add a more general test case

- also rebalance the tests a bit

* added missing arg

* balance

* better checking

* balance

* make test smaller and faster

* make ddp results cached and enable sync_bn

* clean up

* fix tests

* changelog

* balance

* fix

* addressing comments
Co-authored-by: Min Xu <min.xu@acm.org>

6db68518

[feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046

msbaines authored May 07, 2021

* [feat] experimental.nn.SyncBatchNorm: initial commit

Fast/simple re-implementation of SyncBatchNorm.

When profiling SSL Vision, I was seeing a majority of cycles spent in
SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
model I was profiling.

When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
I get a 6x speedup:

<class 'torch.nn.modules.batchnorm.BatchNorm2d'>
Elapsed time is  0.08709120750427246
Elapsed time is  0.12632274627685547
Elapsed time is  0.14095258712768555
Elapsed time is  0.16529417037963867
Elapsed time is  0.1419970989227295
Elapsed time is  0.15166854858398438
Elapsed time is  0.12000870704650879
Elapsed time is  0.17534875869750977
<class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
Elapsed time is  2.5087168216705322
Elapsed time is  2.497001886367798
Elapsed time is  2.5204885005950928
Elapsed time is  2.526789903640747
Elapsed time is  2.5080230236053467
Elapsed time is  2.524489641189575
Elapsed time is  2.513214588165283
Elapsed time is  2.5359973907470703
<class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
Elapsed time is  0.4126114845275879
Elapsed time is  0.39051294326782227
Elapsed time is  0.40685415267944336
Elapsed time is  0.4159870147705078
Elapsed time is  0.42383885383605957
Elapsed time is  0.4080159664154053
Elapsed time is  0.41202712059020996
Elapsed time is  0.42400121688842773
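
The roughly 6x figure can be checked from the timings listed above by comparing the mean elapsed time of the PyTorch `SyncBatchNorm` runs with that of the FairScale experimental runs. The variable names are illustrative; the numbers are copied from the log.

```python
from statistics import mean

# Elapsed times copied from the benchmark output above.
torch_sync_bn = [
    2.5087168216705322, 2.497001886367798, 2.5204885005950928,
    2.526789903640747, 2.5080230236053467, 2.524489641189575,
    2.513214588165283, 2.5359973907470703,
]
fairscale_sync_bn = [
    0.4126114845275879, 0.39051294326782227, 0.40685415267944336,
    0.4159870147705078, 0.42383885383605957, 0.4080159664154053,
    0.41202712059020996, 0.42400121688842773,
]

# Ratio of mean elapsed times: how many times faster the new module is.
speedup = mean(torch_sync_bn) / mean(fairscale_sync_bn)
print(f"speedup: {speedup:.1f}x")
```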

f0a40046

05 May, 2021 6 commits

[fix] better assert and better test for frozen weights (#657) · b54eed1b

Min Xu authored May 05, 2021



* [fix] better assert and better test for frozen weights

- the precise condition should have checked m.parameters(), not
  m.params.
- fixes #643

* add changelog

* using an enum is so much better
Co-authored-by: Min Xu <min.xu@acm.org>

b54eed1b

[fix][adascale] Fix infinite loop in docstring (#656) · 1ae77784
anj-s authored May 05, 2021
```
* fix infinite loop in docstring

* fix docstring
```
1ae77784
[draft][chore] SDP : increase code coverage (#653) · 69cbdf5d
Benjamin Lefaudeux authored May 05, 2021
```
* increasing the code coverage, good practice and raising bugs.  hopefully getting to 100%
* small bugfix
```
69cbdf5d
[chore] Rename misc.py to better reflect functionality. (#652) · c65a48f3
anj-s authored May 04, 2021
```
* rename files

* add newly renamed file
```
c65a48f3
add info about PEP8 style guide (#651) · 0ce85af2
anj-s authored May 04, 2021
```
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
```
0ce85af2

[fix] add clear_autocast_cache flag (#650) · 861b5ce2

Min Xu authored May 04, 2021



* [fix] add clear_autocast_cache flag

- when training in AMP mode with FP32 weights, FSDP may need to
  optionally clear the autocast cache to avoid GPU OOM
- this flag defaults to false; doing it automatically is a future TODO
- also added a verbose flag to make print(fsdp_model) a bit shorter
- updated the memory test to cover the new code
- added a couple of useful functions in parallel.py and testing.py

* minor

* address comments

* format

* improve the test
Co-authored-by: Min Xu <min.xu@acm.org>

861b5ce2

04 May, 2021 1 commit

[feat] Adding DynamicLossScaler class for supporting optimizer updates on the CPU (#635) · 14d1f78c

tmarkstrum authored May 03, 2021

* dynamic loss scaler

* isort

* black

* flake8

* comments

* added the test to ci file, added a line to catch the overflow error, fixed some formatting errors

* adding type annotation

* added todo for adding more test cases for handling NaN gradients

* fix some docstrings and comments, add more todos

* fix two doc strings
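
The commit message does not describe the algorithm, so the following is only a sketch of the common dynamic loss scaling scheme: back off the scale when a gradient overflows, and grow it after a run of clean steps. `SimpleDynamicLossScaler` and all of its parameters are hypothetical and are not taken from the FairScale `DynamicLossScaler` class.

```python
import math


class SimpleDynamicLossScaler:
    """Hypothetical scaler: shrink on overflow, grow after clean steps."""

    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            # Inf/NaN gradient: skip the step and reduce the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # Enough clean steps in a row: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


scaler = SimpleDynamicLossScaler(init_scale=8.0, growth_interval=2)
step1 = scaler.update([1.0, 0.25])        # clean step
step2 = scaler.update([0.5])              # second clean step, scale grows
scale_after_growth = scaler.scale
step3 = scaler.update([float("inf")])     # overflow, scale backs off
```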

14d1f78c

03 May, 2021 1 commit
- [fix] SDP: expose module property fix + unit test (#647) · 4e438ba1
  Benjamin Lefaudeux authored May 03, 2021
```
* fix + unit test
* changelog update
```
  4e438ba1