- 09 Mar, 2022 1 commit
-
-
tmarkstrum authored
* [chore] 0.4.6 release * added the third party libs removed by precommit
-
- 08 Mar, 2022 1 commit
-
-
Min Xu authored
* copyright headers * isort and pyproject.toml * precommit and requirement for isort-seed-config * mypy * dummy change * numpy version for pre-commit * fix mypy issue caused by numpy Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 05 Mar, 2022 1 commit
-
-
Dmitry Vinnik authored
* Adding ELI5 video to Fairscale * docs: add GH button in support of Ukraine ## Summary: Our mission at Meta Open Source is to empower communities through open source, and we believe that means building a welcoming and safe environment for all. As part of this work, we are adding this banner in support of Ukraine during this crisis.
-
- 04 Mar, 2022 1 commit
-
-
Vittorio Caggiano authored
-
- 03 Mar, 2022 1 commit
-
-
Min Xu authored
* add an ignore file * [fix] FSDP: handle the lazy_init better - when state_dict and load_state_dict are called, they should not change the lazy_init state. * changelog * longer timeout * Revert "longer timeout" This reverts commit 00cc145fe86210a0972a1e7ba4f37531b9e091eb. * testing * adding the failed test * fix the global to local id * formatting * more complete fix and test * minor fix for an assert * update changelog * remove an extra line * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * Update fairscale/nn/data_parallel/fsdp_optim_utils.py Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com> * addressed review comments Co-authored-by: Min Xu <min.xu.public@gmail.com> Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
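A minimal sketch (not code from this PR) of the pattern the lazy_init fix targets: calling state_dict/load_state_dict on an FSDP-wrapped module before any forward pass should leave FSDP's lazy initialization untouched. It assumes torch.distributed has already been initialized.

```python
# Illustrative sketch only; assumes torch.distributed.init_process_group()
# has already been called on each rank.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(8, 8))

# With the fix above, these calls no longer change the lazy_init state,
# so the first real forward/backward pass still initializes FSDP cleanly.
checkpoint = model.state_dict()
model.load_state_dict(checkpoint)
```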
-
- 02 Mar, 2022 2 commits
-
-
Dmitry Vinnik authored
-
foreveronehundred authored
* [FSDP] Add an arg for FSDP __init__ Add an arg, disable_reshard_on_root, for FSDP __init__ to handle the following issue: https://github.com/facebookresearch/fairscale/issues/878 In some cases (models wrapped by auto_wrap), the parameters of root modules need to be sharded, and reshard_after_forward should not be set to False. "disable_reshard_on_root" lets users choose whether to force reshard_after_forward of root modules to False or not. * Update fully_sharded_data_parallel.py Modified the description of the feature to explain it more clearly. * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Update the comments for disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * Modified the comments Modified the comments of disable_reshard_on_root * Add a new argument for OSS __init__ Add a new argument for OSS __init__ to force OSS to use "_broadcast_object" when rebuilding the sharded optimizer. For more details, please see https://github.com/facebookresearch/fairscale/issues/937 * Remove redundant space Remove redundant space Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
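A minimal usage sketch for the disable_reshard_on_root argument described above (the default value and exact docstring wording are not restated here; see the FSDP documentation). The OSS argument added in the same PR is omitted because its name is not given in the message.

```python
# Sketch, not the PR's code. Assumes torch.distributed is initialized.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 8))

# disable_reshard_on_root controls whether FSDP forces the root module's
# reshard_after_forward to False (the usual speed optimization). Passing
# False lets the root module reshard after forward like inner modules,
# which is what issue #878 needed for some auto_wrap-wrapped models.
fsdp_model = FSDP(model, reshard_after_forward=True, disable_reshard_on_root=False)
```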
-
- 23 Feb, 2022 2 commits
- 22 Feb, 2022 1 commit
-
-
anj-s authored
* add benchmarks for fsdp * fix lint errors * clean up * clean up unused flags * add the benchmarks * remove unused args * fix lint errors * fix lint errors * update command line * add support for multiple devices * try full fp16 mode * try full fp16 mode * lint errors * merge main * lint errors * lint errors * lint error * update intersphinx mapping for numpy * update intersphinx mapping for numpy * skip test * added golden configs * use synthetic benchmarks * fix fn name * fix cuda device id * fix verify * lint fix
-
- 15 Feb, 2022 2 commits
-
-
ruanslv authored
* Update CHANGELOG.md Adding https://github.com/facebookresearch/fairscale/pull/930 to changelog * Update CHANGELOG.md Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
ruanslv authored
* [fix] Add option to wrap root module in auto_wrap * Fix unit-test comment * adding a few more tests to make the expected behavior clear * move changes to wrap policy as suggested * set default to false * revert pre-commit change * revert pre-commit change 2 Co-authored-by: Ruan Silva <ruanrms@fb.com>
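For context, a minimal sketch of the enable_wrap/auto_wrap pattern this change extends; the new root-wrapping option itself is not shown because its keyword does not appear in the message above. Assumes torch.distributed is initialized.

```python
# Sketch of the existing auto_wrap usage; the option added here (also
# wrapping the root module, default false) is omitted since its exact
# parameter name is not quoted in the commit message.
import torch.nn as nn
from fairscale.nn import FullyShardedDataParallel as FSDP
from fairscale.nn import auto_wrap, enable_wrap

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

with enable_wrap(wrapper_cls=FSDP):
    # Inner modules matching the policy get wrapped in FSDP; previously the
    # root module itself was always left unwrapped.
    wrapped = auto_wrap(model)
```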
-
- 14 Feb, 2022 1 commit
-
-
Min Xu authored
* update pytest versions * [test] test related changes - upgrade to newer pytorch versions - added a function to make tests more deterministic on A100 and TF32 - fixed some tests so that they are correctly skipped on a single GPU system * more fixes * formatting overly long lines * format * better test without triggering a warning * fix an optim state bug with newer pytorch - adam optimizer seems to return "step" as a singleton tensor now in the nightly build - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer * improve oss.py - using min_loss for regression checking is a bit more reliable - also increased the num epochs from 10 to 12 * small oss.py fix * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Min Xu <min.xu.public@gmail.com>
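The optim-state fix above concerns newer PyTorch storing Adam's "step" as a singleton tensor; a small generic sketch (my own, not the PR's code) of normalizing such entries when handling an optimizer state dict:

```python
# Generic illustration. Assumption: converting "step" back to a plain number
# is acceptable for the optimizer loading the state, as the message suggests.
import torch

def normalize_adam_step(optim_state_dict: dict) -> dict:
    """Convert 0-dim tensor 'step' entries to plain Python numbers in place."""
    for param_state in optim_state_dict.get("state", {}).values():
        step = param_state.get("step")
        if torch.is_tensor(step):
            param_state["step"] = step.item()
    return optim_state_dict
```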
-
- 11 Feb, 2022 1 commit
-
-
Min Xu authored
* skipping one more test * formatting * minor fix and copyright header * comment Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 08 Feb, 2022 2 commits
-
-
foreveronehundred authored
* [FSDP] Add an arg for FSDP __init__ Add an arg, disable_reshard_on_root, for FSDP __init__ to handle the following issue: https://github.com/facebookresearch/fairscale/issues/878 In some cases (models wrapped by auto_wrap), the parameters of root modules need to be sharded, and reshard_after_forward should not be set to False. "disable_reshard_on_root" lets users choose whether to force reshard_after_forward of root modules to False or not. * Update fully_sharded_data_parallel.py Modified the description of the feature to explain it more clearly. * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Update the comments for disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * Modified the comments Modified the comments of disable_reshard_on_root Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
anj-s authored
* update intersphinx mapping for numpy * update intersphinx mapping for numpy * update pytorch mapping and disable test
-
- 28 Jan, 2022 1 commit
-
-
Min Xu authored
* [feat] add CosFace paper's LMCL to MEVO - added baseline algorithm to the reference kernel - added MEVO version of LMCL - added unit test to verify it is correct with respect to the reference as well as its memory usage * updated changelog Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
- 25 Jan, 2022 2 commits
-
-
Min Xu authored
* [minor] better assert in backward * mypy Co-authored-by: Min Xu <min.xu.public@gmail.com>
-
Min Xu authored
* [fix] reduce unit test memory * set seed in CI * fix random seed function * giving up CI, //sigh
-
- 20 Jan, 2022 1 commit
-
-
Yanli Zhao authored
* Add FairScale FSDP adoption logging * Add FairScale FSDP adoption logging
-
- 18 Jan, 2022 1 commit
-
-
Sam Shleifer authored
-
- 14 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
tmarkstrum authored
* release 0.4.5 * added some content for the release * fixed a format issue.
-
Anupam Bhatnagar authored
-
- 13 Jan, 2022 3 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] first commit * [skip ci] gradient scaler example * [skip ci] adding feed forward toy example * [skip ci] adding types * [skip ci] adding backward hook * [skip ci] update * [skip ci] working feed forward example * [skip ci] working feed forward example * [skip ci] use named_modules instead of named_children * [skip ci] adding new file * [skip ci] clean up * [skip ci] implement unscale function * [skip ci] implement unscale function * [skip ci] removing old file * [skip ci] removing some more old files * [skip ci] making unscale function generic * [skip ci] adding test for vision model * [skip ci] adding identity layer * [skip ci] cleanup files * [skip ci] refactoring * [skip ci] more refactoring * [skip ci] added functionality to update scale * [skip ci] data loader clean up * [skip ci] implemented inf checks and update scale functions * [skip ci] code clean up. added...
-
tmarkstrum authored
* fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com> * added changelog * fixed some commits * added unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size * throw an error instead of rolling back to use default process group for reduce_scatter_process_group * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group" This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9. * added check for None to avoid unit test failure * fixed an error to avoid unit test failures Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 12 Jan, 2022 1 commit
-
-
tmarkstrum authored
[chore] Update the CHANGELOG to add details about the new feature that enables reduce_scatter overlap in backward propagation (#906) * updated the change log * improved the change log
-
- 07 Jan, 2022 1 commit
-
-
tmarkstrum authored
* enable reduce scatter overlap with other operations * fixed unit tests and added docstrings for the new parameters for fsdp * fixed more unit tests * fixed unit tests * avoided the pickle error on process_group_reduce_scatter * removed an unnecessary parameter in unit tests * removed unnecessary prints * fixed the docstring * skipped the test_offload unit test because this unit test failed in the main branch * removed the enable_reduce_scatter_overlap API parameter * added docstring for the default value of the process_group_reduce_scatter parameter * fixed a syntax bug * fixed a bug which caused a unit test failure * removed the all_gather in the ProcessGroupName enum * added more comments * changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
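A hedged sketch of the parameter this commit settles on: process_group_reduce_scatter defaults to ProcessGroupName.reduce_scatter, so FSDP uses a separate group and the backward reduce-scatter can overlap with other work; an explicit group can also be passed. The import path of ProcessGroupName is not shown in the message, so only the explicit-group form appears below (assumes torch.distributed is initialized).

```python
# Sketch only, not the PR's tests. Assumes torch.distributed is initialized.
import torch.distributed as dist
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = nn.Linear(128, 128)

# Per the commit above, the group should match the default group's size;
# otherwise FSDP falls back to the default process group.
rs_group = dist.new_group(ranks=list(range(dist.get_world_size())))
fsdp_model = FSDP(model, process_group_reduce_scatter=rs_group)
```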
-
- 06 Jan, 2022 2 commits
-
-
tmarkstrum authored
-
four4fish authored
* FullyShardedDataParallel: only return full state dict on rank 0 * Add flag and make rank 0 only optional * Add tests * Add docs * address comments * update comments * update torch nightly version * update torchvision number for torch nightly dependency * add changelog * Update CHANGELOG.md * Update CHANGELOG.md
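A minimal sketch of the rank-0-only behavior described above. The flag name is not spelled out in the commit message, so state_dict_on_rank_0_only below is an assumption; check the FSDP docstring. Assumes torch.distributed is initialized.

```python
# Sketch; the constructor flag name here is assumed, not quoted from the commit.
import torch.distributed as dist
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(16, 16), state_dict_on_rank_0_only=True)

# With the flag enabled, only rank 0 gets the full (unsharded) state dict,
# which saves memory on the other ranks for large models.
sd = model.state_dict()
if dist.get_rank() == 0:
    print(f"full state dict has {len(sd)} entries")
```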
-
- 05 Jan, 2022 1 commit
-
-
Paul Johnson authored
* Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py. * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper * Modifications from PR #887 review comments. * Update Changelog
-
- 24 Dec, 2021 1 commit
-
-
Anupam Bhatnagar authored
* [skip ci] update release.md * [skip ci] minor edit
-
- 21 Dec, 2021 5 commits
-
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
-
Anupam Bhatnagar authored
* [skip ci] adding comments to changelog * adding date to changelog * [skip ci] minor edit
-
Anupam Bhatnagar authored
* Finiteness check for all tensors * [skip ci] updating changelog
-
Anupam Bhatnagar authored
* [skip ci] first commit to automate release process * empty commit * fix syntax * fix next_version value * fixing more syntax * remove uses * fix * fixed path in setup.py * trying a basic example * adding branch * change release to name * adding first step * remove push trigger * change order in ON section * modifying manual workflow * adding fairscale release workflow * removing unused workflows * replacing values with secrets * fixing __version__ in __init__.py * cleanup * restoring import statement
-
- 16 Dec, 2021 1 commit
-
-
Freddy Snijder authored
Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning when trainable parameters change (#886) * Added warn_on_trainable_params_changed constructor parameter to allow the user to suppress the warning when trainable parameters change; the default is True and thus the default behavior is unchanged * Added parameter documentation
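A hedged sketch of the new parameter. The commit does not name the wrapper class it was added to, so attaching it to the FSDP constructor here is an assumption.

```python
# Assumption: the parameter lives on FSDP's constructor; the default True
# keeps the existing warning when trainable parameters change.
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(nn.Linear(8, 8), warn_on_trainable_params_changed=False)
# With False, toggling requires_grad on wrapped parameters between passes
# no longer emits the trainable-parameters-changed warning.
```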
-
- 13 Dec, 2021 1 commit
-
-
Min Xu authored
- During eval, we fall back to just the output projection without fusing - added a unit test to ensure the shape is correct
-