Commits · ce1f2cea8be223658c9f94e16a9f226ebd376d30 · OpenDAS / fairscale

07 Apr, 2021 2 commits
- [fix][ShardedDDP] Properly handle .eval() mode (#587) · ce1f2cea
  Benjamin Lefaudeux authored Apr 07, 2021
```
* Properly handle .train() and .eval() modes
* showing that the unit test works, now fixed
* code review
```
  ce1f2cea
- [FSDP] [feat] Add state_dict_device option (#579) · 14abed6e
  Myle Ott authored Apr 07, 2021
  
  14abed6e
06 Apr, 2021 1 commit
- [fix][OSS] two small hotfixes.. repro not obvious for grad_fn (#583) · 121b9db0
  Benjamin Lefaudeux authored Apr 06, 2021
  
  121b9db0
04 Apr, 2021 1 commit
- [FSDP] add no_broadcast_optim_state option (#560) · 1fcbd624
  Sam Shleifer authored Apr 04, 2021
  
  1fcbd624
31 Mar, 2021 2 commits

[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed... · a0458b98

Min Xu authored Mar 31, 2021

[fix] FSDP: disable single rank process group for auto_wrap_bn and fixed mixed precision regnet test (#556)

* [fix] disable single rank process group for auto_wrap_bn

- beefed up unit test with regnet-like model
- found that single-rank process group is causing problem
- disabled it to enable convergence tests on the vissl side
- use `raise e from None` to get a better assertion output
  in testing.py.

* [test] fix regnet test for ddp+mixed_precision

- need AMP context in FSDP
- workaround different between ddp & fsdp when bias=True
- fixed a bug in input data generation that caused different ranks have
  the same data with wrong iteration count.
- added TODO for need a better loss and grad_scaler and reduced
  iters so there is no nan.
- added a (disabled) debugging code

* lint

* lint

* add scaler

* lint

* scaler

* add a real loss

* seeding in the ranks

* blance tests

* run AMP DDP==FSDP test only on cuda version 11 and up

* add relu inplace and comment

* make wrap_bn covers more cases in full precision mode

a0458b98

[chore] add testing of torch 1.9.0 nightly build (#559) · acb9ef00
msbaines authored Mar 31, 2021

acb9ef00

30 Mar, 2021 1 commit

[feat][fix] ShardedDDP deferred init (#558) · daa1bad5

Benjamin Lefaudeux authored Mar 30, 2021

* survive the model being moved to device post-construction
* make sure that a unit test would catch a regression

daa1bad5

26 Mar, 2021 1 commit

[test] FSDP: check with ddp parity with conv + bn (#549) · 0233efca

Min Xu authored Mar 26, 2021

- added DDP equivalency test
- added rmf, state_dict_norm functions to testing utils
- added more debugging output to objects_are_equal

0233efca

25 Mar, 2021 2 commits
- [chore][fix] SDP: yet another unit test improvement + bugfixes (#546) · ece0cbf9
  Benjamin Lefaudeux authored Mar 25, 2021
```
* re-activating unit test
* removing changed that slipped in
```
  ece0cbf9
- [FSDP][feature] optimizer state dict save and load (#537) · 9474d75d
  Sam Shleifer authored Mar 25, 2021
```
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
```
  9474d75d
22 Mar, 2021 1 commit
- [ci][SDP] extending the test matrix which checks for equivalence with DDP (#542) · df493a29
  Benjamin Lefaudeux authored Mar 22, 2021
  
  df493a29
20 Mar, 2021 1 commit

[fix][FSDP] fix weight init when using apply() (fixes #490 and #444) (#543) · fa1b85fb

Myle Ott authored Mar 20, 2021

* Add new test for weight init (fails)
* Set FSDP.compute_device so summon_full_params works before module moves to CUDA
* Override FSDP.apply to enable custom weight init

fa1b85fb

18 Mar, 2021 3 commits

[feat] FSDP: add auto_wrap_bn (#531) · 8b59267b

Min Xu authored Mar 18, 2021

* [feat] FSDP: add auto_wrap_bn

- add an utility function to handle wrapping of BN

* changelog

8b59267b

[feature] FSDP: enable pytorch SyncBN (#527) · 2fc1f6d8

Min Xu authored Mar 17, 2021

* [feature] FSDP: enable pytorch SyncBN

- not fully validated yet but at least not asserting
- this enables VISSL to move forward with its next PR

* add the test file

* changelog and lint

* addressed comment

2fc1f6d8

[refactor] removing duplicated tests (#529) · 98223763
Benjamin Lefaudeux authored Mar 17, 2021

98223763

17 Mar, 2021 1 commit
- [fix][SDP] Lightning-compat: deactivating buckets for a single rank, not useful (#514) · d3bfcbf5
  Benjamin Lefaudeux authored Mar 17, 2021
```
* Deactivating buckets for a single rank, not crashing but not useful
```
  d3bfcbf5
12 Mar, 2021 1 commit

[fix] FSDP: multi-pass autograd graph and mixed precision (#513) · 82986ca0

Min Xu authored Mar 12, 2021



* FSDP: multi-pass autograd graph and mixed precision

- added BACKWARD_PRE/POST checking
- better assert_state
- fixed issue of backward hook misfiring

* fix

* cleanup

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>
Co-authored-by: Myle Ott <myleott@fb.com>

82986ca0

11 Mar, 2021 1 commit

[fix][OSS] Adding a hard sync stream barrier before broadcast (#512) · c9fdf506

Benjamin Lefaudeux authored Mar 11, 2021

* Adding a hard sync barrier before the broadcast, mostly useful for Gloo actually, NCCL is synced behind the scene
* adding a proper unit test
* adding a unit test for https://github.com/facebookresearch/fairscale/pull/510

c9fdf506

09 Mar, 2021 2 commits
- [perf] Further improve performance for FSDP.no_sync (#502) · 0cbf3bab
  Myle Ott authored Mar 09, 2021
  
  0cbf3bab
- [fix] FSDP: fix MoE corner case (fixes #467) (#501) · 05ce7971
  Myle Ott authored Mar 08, 2021
  
  05ce7971
08 Mar, 2021 1 commit

[fix]: handle inputs with containers in mixed precision (#486) · 2e9a14e7

Min Xu authored Mar 08, 2021

* [fix]: handle inputs with containers

- this is an issue surfaces by vissl as well
- fix seems to be super simple
- also cleaned up two tests with respect to multiple such tests
  running back to back (they don't do that presently)

* cleanup

* fix

* lint

2e9a14e7

06 Mar, 2021 1 commit
- [perf] FSDP: speed up no_sync and test communication volume (#470) · 1204c7cf
  Myle Ott authored Mar 06, 2021
  
  1204c7cf
05 Mar, 2021 1 commit
- [perf][minor] cache the rank lookups, small shardedddp perf fix (#474) · 131a5356
  Benjamin Lefaudeux authored Mar 05, 2021
```
* [perf][minor] cache the rank lookups, small shardedddp perf fix
* tiny improvement, code quality
```
  131a5356
04 Mar, 2021 1 commit
- [feat] add buffer_dtype kwarg for more control of batchnorm (#458) · b36e01d5
  Sam Shleifer authored Mar 04, 2021
  
  b36e01d5
03 Mar, 2021 1 commit
- [docs] minor doc update (#459) · 428110b8
  Min Xu authored Mar 02, 2021
  
  428110b8
02 Mar, 2021 1 commit
- [fix] Make state_dict all-gather FP32 params (#451) · d2924670
  Myle Ott authored Mar 02, 2021
  
  d2924670
01 Mar, 2021 2 commits

[chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7

Min Xu authored Mar 01, 2021

* [chores]: CI py39 on GPU and more efficiency

* add test list files

* fix

* add test list files

* split benchmark run into 2 runs

* fix 1.8 version and balance benchmarks

* fix

* fix

* fix

* fix

* recording tests

* py39 install fix

* test again

* move tests

* reorg tests

* skip tests for torch 1.8 due to an upstream bug

* removed __init__.py from tests since it confuses pytest

* Revert "removed __init__.py from tests since it confuses pytest"

This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.

* don't include __init__ in file list

* notes on __init__.py and added missing ones

* fixed mypy in a test file

* balance test runtime

* better pip install

* balance more

* pip fix

* balance

* balance more, all test should finish within 20m now

* minor license update

* trying cu102

* more doc and addressed Ben's comments

* debugging

* debugging...

5eb6b8c7

[test] FSDP: add the failing test for #421 (#453) · 5ecac15a

Min Xu authored Mar 01, 2021



* [test] FSDP: add the failing test for #421

* skip on 1.5

* better skipping

* Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

5ecac15a

27 Feb, 2021 1 commit

[fix] FSDP: fix the corner case of all params are in the children (#441) · b75a5e26

Min Xu authored Feb 26, 2021

* [fix] FSDP corner case of all params at in the children

* lint

* fix

* tradeoff

* fix doc build

* review comments

b75a5e26

26 Feb, 2021 3 commits
- [fix] fix FSDP state_dict/load_state_dict for nested wrapped instances (#440) · b6dc98cf
  Myle Ott authored Feb 26, 2021
  
  b6dc98cf
- [fix] Fix nested FlattenParamsWrapper state_dict/load_state_dict (#434) · 506d6209
  Myle Ott authored Feb 26, 2021
  
  506d6209
- [feat]: add summon_full_params context mgr (#433) · 77f92b38
  Min Xu authored Feb 25, 2021
```
* [feat]: add summon_full_params context mgr

* fix

* fix

* addressed comments

* fixed the state_dict copy

* lint
```
  77f92b38
25 Feb, 2021 1 commit
- [ShardedDDP][Minor] Backport a bucket flush fix from FSDP, may help a few existing users (#435) · 7ee228bf
  Benjamin Lefaudeux authored Feb 25, 2021
```
* bring back a fix from FSDP, may help a few existing users
```
  7ee228bf
24 Feb, 2021 1 commit
- [fix]: Fix non-float buffers in FSDP (#427) · 9e0df348
  Myle Ott authored Feb 23, 2021
  
  9e0df348
23 Feb, 2021 2 commits

[perf][ShardedDDP] fp16 gradient reduce (#411) · d52d2186

Benjamin Lefaudeux authored Feb 22, 2021

* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release

d52d2186

Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e

Myle Ott authored Feb 22, 2021

Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336

) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.

Compared to PyTorch DDP:
* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
    * all-gather parameters at start of forward pass and start of backward pass
    * reduce-scatter grads at end of backward pass
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

15512d9e

19 Feb, 2021 1 commit

[feature] Unit test with and without buckets for all ShardedDDP unit tests (#400) · 175fdeb0

Benjamin Lefaudeux authored Feb 19, 2021

* test with and without buckets for all the shardedDDP unit tests
* parametrize all the things
* refactoring, adding even more  combinations at times
* handle hosts not having cuda

175fdeb0

18 Feb, 2021 2 commits
- [feat][ShardedDDP] Support multiple groups (#394) · 205af8c2
  Benjamin Lefaudeux authored Feb 18, 2021
```
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
```
  205af8c2
- [fix][minor] ShardedDDP train/eval modes (#393) · ef7146d5
  Benjamin Lefaudeux authored Feb 18, 2021
```
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
```
  ef7146d5
17 Feb, 2021 1 commit
- [feat][ShardedDDP] manual reduce option (#389) · 47042917
  Benjamin Lefaudeux authored Feb 16, 2021
```
* initial implementation, with unit test and assert
* added changelog and better debug string
```
  47042917