Commits · 5e64d6a71d19dfce689a5d97a652e2cfaf29266d · OpenDAS / fairscale

04 Mar, 2021 5 commits

[feat]: checkpoint and normalization (#457) · 5e64d6a7

Min Xu authored Mar 04, 2021

* [feat]: checkpoint and normalization

- added special handling of BN for track_running_stats and checkpointing
- we test BN/LN and checkpointing
- we test them with mixed precision
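
A rough sketch of why BatchNorm with `track_running_stats` may need special handling under checkpointing. The assumption here, which the commit message does not state, is that activation checkpointing runs a segment's forward pass a second time during backward, so a naive BN layer would update its running statistics twice per step. The helper names are hypothetical, and the update rule is the usual exponential moving average with momentum 0.1.

```python
def update_running_mean(running_mean, batch_mean, momentum=0.1):
    """One exponential-moving-average update, as BatchNorm does per forward pass."""
    return (1 - momentum) * running_mean + momentum * batch_mean


def run_step(running_mean, batch_mean, forward_passes):
    """Apply the running-stat update once per forward pass in a training step."""
    for _ in range(forward_passes):
        running_mean = update_running_mean(running_mean, batch_mean)
    return running_mean


# Without checkpointing: one forward pass per step.
plain = run_step(0.0, 1.0, forward_passes=1)
# Naive checkpointing: the forward pass is replayed during backward.
checkpointed = run_step(0.0, 1.0, forward_passes=2)

print(plain, checkpointed)
```

The two values differ after a single step, so the statistics would drift from the non-checkpointed run unless one of the two passes skips the update.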

5e64d6a7

[feat] add buffer_dtype kwarg for more control of batchnorm (#458) · b36e01d5
Sam Shleifer authored Mar 04, 2021

b36e01d5
Fix ampnet unit tests (#466) · 103d33c1
Siddharth Goyal authored Mar 04, 2021
```
* Fix ampnet unit test by adding delegate object

* Remove comments
```
103d33c1

[test] AdaScale & SDP/FSDP (#468) · efed9cee

Min Xu authored Mar 04, 2021

- cover them in terms of code path only
- numerically, AdaScale is different on SDP/FSDP than DDP, mainly
  due to partial view of the gradients.
- this doesn't mean it is definitely not useful but it is yet to
  be validated.
- not going to spend too much time until we have a real use case.

efed9cee

[chore] move a test script and a CI test improvement (#464) · eeabc6f1
Min Xu authored Mar 03, 2021
```
* [chore] move a test script

* add a shortcut for installing

* more skipping

* keep apt-get part
```
eeabc6f1

03 Mar, 2021 2 commits
- [refactor] Use logging in place of print statements, remove unused functions... · 7a3199b1
  anj-s authored Mar 02, 2021
```
[refactor] Use logging in place of print statements, remove unused functions and other minor refactoring changes. (#461)

* fix pipe logging and other cleanups

* more log/debug changes
```
  7a3199b1
- [docs] minor doc update (#459) · 428110b8
  Min Xu authored Mar 02, 2021
  
  428110b8
02 Mar, 2021 2 commits

[fix] Make state_dict all-gather FP32 params (#451) · d2924670
Myle Ott authored Mar 02, 2021

d2924670

[feat] Add context manager to FSDP for easier child module wrapping (#446) · f3359550

Sean Naren authored Mar 02, 2021

This adds a context manager that helps wrap child modules with shared default arguments.
Usage:
```
import torch
from fairscale.nn.misc import enable_wrap, wrap

# handful_of_important_params: a dict of FSDP keyword arguments shared by every wrapped child
with enable_wrap(**handful_of_important_params):
    layer_1 = wrap(torch.nn.Linear(5, 5))
    layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True)  # Override parameters if you'd like

# Outside the context manager, wrap() is a no-op and returns the plain Linear layer
layer_1 = wrap(torch.nn.Linear(5, 5))
```
Outside the FSDP wrapping context, `wrap` is a no-op and returns the module unchanged. This makes it easier to annotate layers without repeating the same parameters on every call.

f3359550

01 Mar, 2021 2 commits

[chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7

Min Xu authored Mar 01, 2021

* [chores]: CI py39 on GPU and more efficiency

* add test list files

* fix

* add test list files

* split benchmark run into 2 runs

* fix 1.8 version and balance benchmarks

* fix

* fix

* fix

* fix

* recording tests

* py39 install fix

* test again

* move tests

* reorg tests

* skip tests for torch 1.8 due to an upstream bug

* removed __init__.py from tests since it confuses pytest

* Revert "removed __init__.py from tests since it confuses pytest"

This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.

* don't include __init__ in file list

* notes on __init__.py and added missing ones

* fixed mypy in a test file

* balance test runtime

* better pip install

* balance more

* pip fix

* balance

* balance more, all tests should finish within 20m now

* minor license update

* trying cu102

* more doc and addressed Ben's comments

* debugging

* debugging...

5eb6b8c7

[test] FSDP: add the failing test for #421 (#453) · 5ecac15a

Min Xu authored Mar 01, 2021



* [test] FSDP: add the failing test for #421

* skip on 1.5

* better skipping

* Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

5ecac15a

27 Feb, 2021 1 commit

[fix] FSDP: fix the corner case of all params are in the children (#441) · b75a5e26

Min Xu authored Feb 26, 2021

* [fix] FSDP corner case of all params are in the children

* lint

* fix

* tradeoff

* fix doc build

* review comments

b75a5e26

26 Feb, 2021 4 commits

[fix] fix FSDP state_dict/load_state_dict for nested wrapped instances (#440) · b6dc98cf
Myle Ott authored Feb 26, 2021

b6dc98cf
[fix] Fix nested FlattenParamsWrapper state_dict/load_state_dict (#434) · 506d6209
Myle Ott authored Feb 26, 2021

506d6209

[feat]: add summon_full_params context mgr (#433) · 77f92b38

Min Xu authored Feb 25, 2021

* [feat]: add summon_full_params context mgr

* fix

* fix

* addressed comments

* fixed the state_dict copy

* lint

77f92b38

[feature] Add support for OffloadModel to enable training large models on 1 GPU. (#432) · f7813d6d

anj-s authored Feb 25, 2021



* clean start

* removing per layer split strategy, probably not that useful indeed

* initial transformer benchmark

* hack, enable testing ViT + offload, python3 benchmarks/oss.py  --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224

* proper cuda streams and device, something off in terms of memory consumption

* minor, stashing

* unit test fix

* removing all the distributed parts

* simpler test, needs debugging

* working OOP, running a model which does not fit on the gpu memory

* spring cleaning

* removing the ill-advised optimizer bits, better keep that orthogonal

* [offload] Add support for activation offloading + other changes (#367)

* initial fwd/bwd commit

* checkpoint work

* modify shard loop

* activation offloading and test to start with

* fix lint errors

* update comments

* fix lint

* remove unused var

* remove commented out lines

* modify name

* remove break

* remove profiler comments

* avoid saving inputs

* fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* [offload] Add support for fp16 training (#374)

* initial fwd/bwd commit

* checkpoint work

* modify shard loop

* activation offloading and test to start with

* fix lint errors

* update comments

* fix lint

* remove unused var

* remove commented out lines

* modify name

* remove break

* remove profiler comments

* add support for fp16

* add unit tests

* fix lint errors

* fix test failure
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* [offload] Add support for activation checkpointing for all layers. (#381)

* initial fwd/bwd commit

* checkpoint work

* modify shard loop

* activation offloading and test to start with

* fix lint errors

* update comments

* fix lint

* remove unused var

* remove commented out lines

* modify name

* remove break

* remove profiler comments

* add support for fp16

* add unit tests

* fix lint errors

* fix test failure

* cp work, incorrect output dimensions still need to be fixed

* fixed activation outputs

* intermediate cp of work

* add tests

* fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

* add support for microbatches

* revert benchmark config changes

* add parametrization

* fix lint errors and tests

* skip test for 1.5

* fix lint errors

* skip test if there are no GPUs

* fix lint errors

* fix lint errors

* move experimental to the fairscale repo

* lint error fixes

* modify test imports

* lint error fixes

* move offload files to the experimental directory

* move tests and benchmarks to their folder

* fix mypy errors

* cp intermediate working benchmarks

* more changes

* split benchmark configs

* remove print statements

* fix lint errors

* remove unused print

* stress testing

* remove unused file

* change param name

* lint fixes

* move file to the right folder

* offload_experimental

* add doc string

* add error message
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@protonmail.com>
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

f7813d6d

25 Feb, 2021 2 commits
- [ShardedDDP][Minor] Backport a bucket flush fix from FSDP, may help a few existing users (#435) · 7ee228bf
  Benjamin Lefaudeux authored Feb 25, 2021
```
* bring back a fix from FSDP, may help a few existing users
```
  7ee228bf
- [test] checkpoint: multiple input and output model test (#425) · 2478a9ad
  Min Xu authored Feb 25, 2021
  
  2478a9ad
24 Feb, 2021 2 commits

[refactor] Modify folder locations for tests/ to mirror source code tree. (#419) · 3b0717eb

anj-s authored Feb 24, 2021



* refactor experimental file locations

* refactor fix

* disable test temporarily

* lint error fix

* make the change in the right file

* fix lint errors

* skip failing tests
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>

3b0717eb

[fix]: Fix non-float buffers in FSDP (#427) · 9e0df348
Myle Ott authored Feb 23, 2021

9e0df348

23 Feb, 2021 4 commits

[test]: add peak mem in checkpoint test (#415) · 4b5b4d3d

Min Xu authored Feb 23, 2021

* [test]: add peak mem in checkpoint test

* more debugging

* new test

* more fix

* better collection of debug in case of future failures

* update the comment

* typo

* comment

* clarify

* better wording

4b5b4d3d

[perf][ShardedDDP] fp16 gradient reduce (#411) · d52d2186

Benjamin Lefaudeux authored Feb 22, 2021

* POC, testing against the DDP comm hook when available
* docs, adding a reference to DDP's compress hook
* updating changelog, prep for v0.1.8 release
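
The idea behind reducing gradients in fp16 can be sketched with the standard library alone. This is an illustration under assumptions, not the ShardedDDP implementation: it mimics a compress-style hook that casts each rank's gradient to half precision, pre-divides by the world size, sums across ranks in half precision, and returns the mean. The function names and sample gradients are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_mean_reduce(per_rank_grads):
    """Average gradients across ranks, doing the reduction in fp16."""
    world_size = len(per_rank_grads)
    # Compress: cast to fp16 and pre-divide so the later sum is the mean.
    compressed = [
        [to_fp16(to_fp16(g) / world_size) for g in grads]
        for grads in per_rank_grads
    ]
    # Simulated all-reduce: element-wise sum, rounded to fp16 after each add.
    reduced = []
    for elems in zip(*compressed):
        acc = 0.0
        for e in elems:
            acc = to_fp16(acc + e)
        reduced.append(acc)
    return reduced


# Two ranks, two gradient elements each; values chosen to be exact in fp16.
grads = [[0.5, -1.25], [1.5, 0.25]]
mean = fp16_mean_reduce(grads)

# Each element travels as 2 bytes instead of 4.
bytes_fp16, bytes_fp32 = struct.calcsize("<e"), struct.calcsize("<f")
```

The payload per element halves, at the cost of rounding: a value such as 0.1 is not exactly representable in fp16, which is why parity with the full-precision path needs testing.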

d52d2186

[bug]: not all CUDA memory is freed when model is deleted (#412) · e3035933

Min Xu authored Feb 22, 2021

* [bug]: not all CUDA memory is freed when model is deleted

* fixed memory leak

- without this, peak memory will be high when more than one model
  is trained (i.e. the first model leaves stuff around, pushing up the
  peak memory when the second model runs)

* addressed comments

* fix

* changelog

e3035933

Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e

Myle Ott authored Feb 22, 2021

Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.

Compared to PyTorch DDP:
* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
    * all-gather parameters at start of forward pass and start of backward pass
    * reduce-scatter grads at end of backward pass
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
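
The communication comparison above can be checked with a little arithmetic. The sketch below assumes ring-style collectives, where an all-gather or a reduce-scatter each moves roughly `(N - 1) / N` of the parameter bytes per rank, and where DDP's all-reduce counts as one reduce-scatter plus one all-gather. The function name and the example sizes are hypothetical.

```python
def per_rank_traffic(param_bytes, world_size, collectives):
    """Approximate bytes sent per rank for a number of ring collectives."""
    return collectives * param_bytes * (world_size - 1) / world_size


PARAM_BYTES = 1_000_000_000  # illustrative model size
WORLD_SIZE = 8               # illustrative number of data parallel GPUs

# DDP: one all-reduce = reduce-scatter + all-gather.
ddp = per_rank_traffic(PARAM_BYTES, WORLD_SIZE, collectives=2)
# reshard_after_forward=False: all-gather (forward) + reduce-scatter (backward).
fsdp_keep = per_rank_traffic(PARAM_BYTES, WORLD_SIZE, collectives=2)
# reshard_after_forward=True: all-gather (forward) + all-gather (backward)
# + reduce-scatter (end of backward).
fsdp_reshard = per_rank_traffic(PARAM_BYTES, WORLD_SIZE, collectives=3)

ratio = fsdp_reshard / ddp
print(f"reshard_after_forward=True costs {ratio:.2f}x the DDP traffic")
```

Under these assumptions the extra all-gather in the backward pass accounts for the 50% increase, and the ratio does not depend on the model size or the number of GPUs.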

15512d9e

22 Feb, 2021 1 commit
- [fix][OSS] adding an assert for empty shards + corresponding unit test (#406) · 279b8024
  Benjamin Lefaudeux authored Feb 22, 2021
```
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
```
  279b8024
19 Feb, 2021 2 commits
- [feature] Unit test with and without buckets for all ShardedDDP unit tests (#400) · 175fdeb0
  Benjamin Lefaudeux authored Feb 19, 2021
```
* test with and without buckets for all the shardedDDP unit tests
* parametrize all the things
* refactoring, adding even more  combinations at times
* handle hosts not having cuda
```
  175fdeb0
- [bug]: fix a bug on custom smoothing factor (#401) · 4396ef4a
  Min Xu authored Feb 18, 2021
  
  4396ef4a
18 Feb, 2021 2 commits
- [feat][ShardedDDP] Support multiple groups (#394) · 205af8c2
  Benjamin Lefaudeux authored Feb 18, 2021
```
* Adding multiple groups support to ShardedDDP + unit test
* adding gloo to the backends tested for multiple groups
```
  205af8c2
- [fix][minor] ShardedDDP train/eval modes (#393) · ef7146d5
  Benjamin Lefaudeux authored Feb 18, 2021
```
* [fix] ShardedDDP train/eval modes
* Update CHANGELOG.md
```
  ef7146d5
17 Feb, 2021 1 commit
- [feat][ShardedDDP] manual reduce option (#389) · 47042917
  Benjamin Lefaudeux authored Feb 16, 2021
```
* initial implementation, with unit test and assert
* added changelog and better debug string
```
  47042917
14 Feb, 2021 1 commit

[fix] OSS dict load/save fix - better fix than 383 and unit test (#386) · 54bd62d3

Benjamin Lefaudeux authored Feb 13, 2021

* WIP, needs to be fixed !

* should be a fix, many thanks Weiyi Zheng

* slightly better unit test, sorting the states on the way out

* reproducing the issue from Weiyi in a unit test, and finally properly fixing

* fixing unit test on pytorch1.5 - original loss diff 26.404895782470703 - 26.404342651367188

54bd62d3

12 Feb, 2021 1 commit

[feature-fix-refactor][ShardedDDP] Make it possible to change trainability graph on the fly (#369) · 13445c55

Benjamin Lefaudeux authored Feb 11, 2021

* Better unit testing
* Make it possible to refresh the DDP assumptions when the model has changed. Make it optional so that you save some time
* Enabling accumulation tests

13445c55

10 Feb, 2021 1 commit

Add fairscale.nn.misc.checkpoint_activations (#376) · c963a72a

Myle Ott authored Feb 10, 2021



* Add fairscale.utils.containers
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Add fairscale.nn.misc.checkpoint_activations
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

c963a72a

09 Feb, 2021 1 commit
- [refactor] remove multiprocess dependency on async (#373) · b1b9e0f8
  msbaines authored Feb 08, 2021
  
  b1b9e0f8
05 Feb, 2021 1 commit
- [fix] repro+fix (#365) · 8778fa66
  Benjamin Lefaudeux authored Feb 05, 2021
```
fix a broken earlier commit, only worked for the first step
```
  8778fa66
04 Feb, 2021 4 commits
- [refactor] multiprocess_pipe: remove pipelined_backward (#362) · 42e44149
  msbaines authored Feb 04, 2021
  
  42e44149
- [feat] ShardedDDP : Adding a proper DDP parity / AMP unit test, overdue (#361) · 5c3ff9bd
  Benjamin Lefaudeux authored Feb 04, 2021
```
* Adding a proper ddp parity / AMP unit test, overdue
* catch non-AMP pytorch
```
  5c3ff9bd
- [refactor] multiprocess_pipe: focus on LazyModule usage (#360) · e3a20fef
  msbaines authored Feb 03, 2021
  
  e3a20fef
- [refactor] multiprocess_pipe: cleanup __init__ (#357) · 39675773
  msbaines authored Feb 03, 2021
  
  39675773
03 Feb, 2021 1 commit
- [chore] disheartening switch off of a OSS cpu test (#356) · 011c0c41
  Benjamin Lefaudeux authored Feb 03, 2021
```
* precise skip, only if agent has only cpu
```
  011c0c41