Commits · ad92220c9c26bca88b9daacd6841855c2dcf2def · OpenDAS / fairscale

20 Oct, 2021 1 commit

[feat] layer memory tracking (#808) · ad92220c

Quentin Duval authored Oct 20, 2021



* [feat] layer memory tracking

* [feat] layer memory tracking (add tests in CI)

* [feat] layer memory tracking: doc typos

* [feat] layer memory tracking: mypy fixes

* [feat] layer memory tracking: fixes for FSDP all gather tracking on pytorch 1.9 and above

* [feat] layer memory tracking: lint

* [feat] layer memory tracking: mypy
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>

ad92220c

13 Sep, 2021 1 commit
- [OSS] Fixing the fp16 broadcast and catching this case in the unit test (#795) · 180ab8c8
  Benjamin Lefaudeux authored Sep 13, 2021
  
  180ab8c8
12 Sep, 2021 1 commit

[fix] FSDP intra-backwards gradient accumulation. (#784) · 4fa2ab9b

Darryl Barnhart authored Sep 12, 2021

* [fix] FSDP intra-backwards gradient accumulation.

Ensure gradient reduction accumulates into the unsharded gradient tensor
within a backwards pass. This matters when an FSDP module is called
multiple times within a forward pass, and reduction is _not_ deferred
using activation checkpoint forward counters, bucketing or some other
mechanism.

Closes #780

* [refactor] Remove forward counters. Comments.

Removed forward counters from the activation checkpointing utility, now
that FSDP does not require them for correct operation. Add more detailed
comment about memory usage behaviour with gradient reduction.

* [refactor] Delete deprecated forward counter usage.

* [refactor] Add state assertion at end of pre-backward hook.
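
As a rough illustration of the first item in this commit, the toy model below shows why each gradient reduction within one backward pass has to add into the stored gradient rather than replace it when a module is called more than once in forward. It uses plain Python lists instead of tensors, and `ToyGrad` and its methods are hypothetical names, not fairscale's API.

```python
from typing import List, Optional


class ToyGrad:
    """Hypothetical stand-in for a parameter's stored gradient (not fairscale code)."""

    def __init__(self) -> None:
        self.grad: Optional[List[float]] = None

    def reduce_overwrite(self, new_grad: List[float]) -> None:
        # Problematic behaviour: a later reduction replaces the earlier result.
        self.grad = list(new_grad)

    def reduce_accumulate(self, new_grad: List[float]) -> None:
        # Intended behaviour: later reductions add into the existing gradient.
        if self.grad is None:
            self.grad = list(new_grad)
        else:
            self.grad = [g + n for g, n in zip(self.grad, new_grad)]


# The same module is used twice in forward, so one backward pass produces
# two gradient contributions for the same parameter.
contributions = [[1.0, 2.0], [0.5, 0.5]]

overwritten = ToyGrad()
accumulated = ToyGrad()
for contribution in contributions:
    overwritten.reduce_overwrite(contribution)
    accumulated.reduce_accumulate(contribution)

print(overwritten.grad)  # only the last contribution survives
print(accumulated.grad)  # element-wise sum of both contributions
```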

4fa2ab9b

11 Sep, 2021 1 commit

[feat] set requires_grad of output tensors of checkpointed modules properly (#787) · 482944d9

Alex Xiao authored Sep 10, 2021



Before this commit, output tensors of checkpointed modules always
require grad, even if they shouldn't. This commit makes it so that
the outputs of checkpointed modules only require grad if either
the input requires grad or if the parameters require grad.

To achieve this, this commit also adds a new _unflattened_param_views
attribute to modules being flattened. This allows the checkpointing
to still access the parameters and check if gradients need to be
computed.
Co-authored-by: Alex Xiao <axiao@fb.com>
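
The rule described in this commit message can be sketched without PyTorch. `ToyTensor` and `output_requires_grad` below are hypothetical names used only for illustration; the actual implementation reads the module's parameters through the `_unflattened_param_views` attribute mentioned above, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor; it only tracks requires_grad."""

    requires_grad: bool = False


def output_requires_grad(inputs: Iterable[ToyTensor], params: Iterable[ToyTensor]) -> bool:
    # Outputs of a checkpointed module need grad only if some input
    # or some parameter of that module needs grad.
    return any(t.requires_grad for t in inputs) or any(p.requires_grad for p in params)


frozen_params = [ToyTensor(False), ToyTensor(False)]
trainable_params = [ToyTensor(True), ToyTensor(False)]
plain_input = [ToyTensor(False)]
grad_input = [ToyTensor(True)]

print(output_requires_grad(plain_input, frozen_params))     # nothing needs grad
print(output_requires_grad(grad_input, frozen_params))      # an input needs grad
print(output_requires_grad(plain_input, trainable_params))  # a parameter needs grad
```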

482944d9

10 Sep, 2021 1 commit
- capture default device when refreshing the params (#786) · e1f36346
  Benjamin Lefaudeux authored Sep 09, 2021
  
  e1f36346
07 Sep, 2021 1 commit

[test] Added disable_checkpointing unit test (#779) · e00dfd95

Achal Dixit authored Sep 08, 2021

* [test] Added disable_checkpointing unit test

* [test] Added disable_checkpointing unit test (Clean-up)

* [test] Added disable_checkpointing unit test (Clean-up)

e00dfd95

06 Sep, 2021 1 commit

[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup;... · 3ecf76f4

Min Xu authored Sep 05, 2021


[cleanup] CI test updates; mypy cleanup; partial broadcast_object cleanup; pre-commit documentation (#744)

* changelog; mypy; oss cleanup

* more broadcast_object cleanup in FSDP

* one more mypy fix

* retire pytorch 1.6 from circleci, add new nightly, add 1.8 LTS and 1.9 stable release

* update torch version for LTS

* minor fixes

* update cache key

* trying newer gpu VMs

* bump the cache

* update to gpu.medium, which should be 2 GPUs

* update nightly version

* add pre-commit instruction

* fixed CHANGELOG after merging

* updated to newer nightly

* retained the older broadcast function for older GPUs for oss.py

* fixed a bug

* added a comment

* fixing a test for pytorch 1.10

* testing a fix

* Update fairscale/optim/oss.py

* Update CONTRIBUTING.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>

3ecf76f4

12 Aug, 2021 2 commits

[fix] Add an additional assert for checking if the params of a module requires_grad=True (#761) · 73f73120
anj-s authored Aug 11, 2021
```
* add additional assert for checking if the requires_grad field is set.

* fix lint errors

* add unit tests and address comments
```
73f73120

[FSDP][feature] Support returning the original parameter names after a model... · a825348d

anj-s authored Aug 11, 2021

[FSDP][feature] Support returning the original parameter names after a model has been wrapped with FSDP (#755)

* checkpoint work

* fix lint issues

* remove debug statement

* remove print

* fix lint errors

* fix lint errors

* fix lint errors

* add comments and fix lint errors

* modified comments and tests

a825348d

31 Jul, 2021 1 commit

FSDP: supporting gradient accumulation without no_sync context manager to save GPU memory (#752) · cd0f0b88

Myle Ott authored Jul 31, 2021



* Add test (broken) for gradient accumulation without no_sync context manager

* changelog

* no_sync to grad_acc renaming for tests

* clean up tmp files

* support grad acc without no_sync

* minor

* update changelog

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

Better assertion from Sam.
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

cd0f0b88

30 Jul, 2021 1 commit

[FSDP] Move final backward callback queueing to pre-backward hook of root instance (#753) · ba7df621

Yanli Zhao authored Jul 30, 2021

Move final backward callback to pre-backward hook of root FSDP instance

Summary:

Move final backward callback to pre-backward hook of root FSDP instance,
so that it is always attached to the outermost backward call and fired
after all backward calls are completed.

Also added flags to check that the final backward callback is fired whenever
it is required.

If the root FSDP instance is checkpointed and called multiple times in forward,
a checkpoint counter is used to make sure the final backward callback is also
queued inside the last inner backward call.
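
A minimal sketch of the queueing idea in this summary, assuming a simple list as a stand-in for the autograd callback queue. `ToyRoot` and all of its members are hypothetical; the sketch only shows the "queue once from the root's pre-backward hook, fire after all backward work" pattern and does not model the checkpoint counter case.

```python
from typing import Callable, List


class ToyRoot:
    """Hypothetical root wrapper; names are illustrative, not fairscale's."""

    def __init__(self) -> None:
        self.callback_queued = False
        self.queue: List[Callable[[], None]] = []  # stand-in for the autograd callback queue
        self.log: List[str] = []

    def pre_backward_hook(self) -> None:
        # Queue the final callback the first time backward reaches the root,
        # so it is attached to the outermost backward call.
        if not self.callback_queued:
            self.queue.append(self.final_backward_callback)
            self.callback_queued = True
        self.log.append("pre_backward")

    def final_backward_callback(self) -> None:
        self.log.append("final_callback")
        self.callback_queued = False  # reset for the next iteration

    def run_backward(self, num_root_calls: int) -> None:
        # The root may be called several times in forward, so its
        # pre-backward hook may fire several times in one backward pass.
        for _ in range(num_root_calls):
            self.pre_backward_hook()
        # Queued callbacks run once all backward work is done.
        while self.queue:
            self.queue.pop(0)()


root = ToyRoot()
root.run_backward(num_root_calls=2)
print(root.log)
```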

Test Plan: unit tests

* reformat

* nits and unit tests

* address some comments

* replace m with self

* reformat

* nits

* remove the fired flag

* assert state on root only

* comments

* comments

ba7df621

27 Jul, 2021 1 commit
- [fix] OSS fp16 broadcast typo (#751) · b46dcfaf
  Benjamin Lefaudeux authored Jul 27, 2021
  
  b46dcfaf
26 Jul, 2021 1 commit

[feat]: prepare FSDP to handle multiple flatten params and fixed metadata saving for MoE (#746) · 83b0b49e

Min Xu authored Jul 26, 2021



* [feat] FSDP: supporting multiple flatten parameter groups

- step 3: make FSDP use FlattenParamModule unconditionally

* fixing the auto_wrap tests

* minor

* rewrite local_metadata_dict

- updated FPW so that custom flat param name is also supported

* bug fix

* mypy

* rewrote consolidate_shard_weights

- test_consolidate passes

* comments

* fixing pickling

* Fix shared params and MoE logic (#749)

* add strict kwarg to support fairseq:gshard MoE saving logic

* Test fairseq style shard

* style

* formatting and address comments

* added changelog

* fixing a test after padding renaming
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

83b0b49e

19 Jul, 2021 1 commit

FSDP use _allgather_base and _reduce_scatter_base (#729) · 8bca4f87

liangluofb authored Jul 19, 2021



* Update fully_sharded_data_parallel.py

update fully_sharded_data_parallel to use _allgather_base

* Update reduce_scatter_bucketer.py

Use reduce_scatter_base

* Update fully_sharded_data_parallel.py

nonblocking gradient cpu copy, and nonblocking param rebuilds

* Update reduce_scatter_bucketer.py

lints

* Update fully_sharded_data_parallel.py

* Update reduce_scatter_bucketer.py

* Update reduce_scatter_bucketer.py

* lints

* linter, test fix

* linter

* LINTER (git add fairscale/utils/reduce_scatter_bucketer.py)

* LINTER (git add tests/nn/data_parallel/test_fsdp_overlap.py)

* Update test_fsdp_overlap.py

* Update fairscale/utils/reduce_scatter_bucketer.py
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Update reduce_scatter_bucketer.py

* isort
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-185.ec2.internal>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-77-164.ec2.internal>

8bca4f87

07 Jul, 2021 1 commit

Future proof storage size test (#735) · 8d82db43

Edward Z. Yang authored Jul 06, 2021

See https://github.com/pytorch/pytorch/pull/59671/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

8d82db43

28 Jun, 2021 2 commits

Make sure requires_grad of FlatParameter to be consistent with requires_grad... · 91c7dd05

Yanli Zhao authored Jun 28, 2021

Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters (#721)

* Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters

* Make sure requires_grad of FlatParameter to be consistent with requires_grad of original parameters

91c7dd05

fixing bug in setting dependencies in partition handler (#723) · 681606f0
Mehdi Mirzazadeh authored Jun 28, 2021
```
* fixing bug in setting dependencies in partition handler

* modifying unit test to need the fix

* black
```
681606f0

26 Jun, 2021 1 commit
- Fix pytorch version check (#716) · bc1e60e0
  Pavel Belevich authored Jun 25, 2021
  
  bc1e60e0
25 Jun, 2021 2 commits
- checking number parameters in distributed pipeline test (#728) · 4a63034e
  Mehdi Mirzazadeh authored Jun 25, 2021
  
  4a63034e
- Preparing pipeline for newer versions of pytorch (#726) · bcd4748d
  Mehdi Mirzazadeh authored Jun 25, 2021
```
* Preparing pipeline for newer versions of pytorch

* updated error message
```
  bcd4748d
22 Jun, 2021 1 commit

Update torch to 1.9.0 release (#717) · 1cc4c837

Pavel Belevich authored Jun 21, 2021

* Update torch to 1.9.0.dev20210614+cu102

* Update config.yml

* Update config.yml

* Update setup.py

* Update config.yml

* Update config.yml

* Update config.yml

* Update config.yml

1cc4c837

21 Jun, 2021 1 commit

[feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3

Min Xu authored Jun 21, 2021



* [feat] FSDP: supporting multiple flatten parameter groups

- step 2: extending FPW to support multiple flat params groups
- FSDP still only uses one group
- unit test exercises the new code paths
- updated the changelog

* first cut, mypy passed

* test_flatten_params_wrapper.py::TestFlattenParams tests pass

* added two more test cases and fixed a case in the code

* fixed one bug with param_path_infos

* fixed two more tests with hardcoded flat_param names

* Update CHANGELOG.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>

ab71efb3

11 Jun, 2021 2 commits

[Offload][feature] Add auto shard functionality to remove requirement of... · cbeda830

anj-s authored Jun 10, 2021

[Offload][feature] Add auto shard functionality to remove requirement of nn.Sequential models. (#695)

* auto wrap functionality

* lint and doc strings

* fix lint errors

* lint errors and version skips

* remove mypy checking and add conditional import

* another math.prod instance

* another import fix

* address comments

* lint errors

* address comments

* fix lint errors

* add placeholder nodes to tracker list

cbeda830

Use original forward pass directly when in eval mode from within checkpoint wrapper (#709) · 370b8483

Pete authored Jun 10, 2021

* add failing test

* add fix

* use 'torch.is_grad_enabled()' instead of 'module.training'

* Revert "add failing test"

This reverts commit 1c34242208f9b2c5fa6c8f181434c2be6d7cdbc0.

* add simple test

* improve test

* add check for fwd_counter

* revert typing/format changes

* move to new test file

* CHANGELOG

* remove old test

* fix import order

* fix test to be compat with torch 1.6.0

* clean up

* comments

* isort 🤦

370b8483

08 Jun, 2021 1 commit

[feat] supporting multiple flatten parameter groups (step 1 and step 1.5) (#708) · d60fc284

Min Xu authored Jun 08, 2021



* refactoring FlattenParamWrapper

- use a FlatParameter class to encapsulate the logic of
  flattening and expanding into views.
- this will make it easier to have multiple groups of flatten
  parameters

* fixed testing context issues for both temp files and temp dirs

* fixing test_fsdp_metadata

* fix pickling of FlatParameter

* fixed test_fsdp_optimizer_utils.py

* minor

* fix assert

* lint

* remove nesting from the test

* step 1.5: remove the code related to unnecessary nesting support in FPW

* Update fairscale/nn/misc/flatten_params_wrapper.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* address comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
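
The "flattening and expanding into views" idea from the first item of this commit can be sketched with the standard library alone. The version below uses `array` and `memoryview` in place of tensors, and the function names are hypothetical, so it only illustrates the layout and is not the `FlatParameter` class itself.

```python
from array import array
from typing import Dict, List, Tuple


def flatten(params: Dict[str, List[float]]) -> Tuple[array, Dict[str, Tuple[int, int]]]:
    """Concatenate all parameters into one flat buffer and record each slice."""
    flat = array("d")
    offsets: Dict[str, Tuple[int, int]] = {}
    for name, values in params.items():
        start = len(flat)
        flat.extend(values)
        offsets[name] = (start, len(flat))
    return flat, offsets


def views(flat: array, offsets: Dict[str, Tuple[int, int]]) -> Dict[str, memoryview]:
    """Expand the flat buffer back into per-parameter views (no copies)."""
    buffer = memoryview(flat)
    return {name: buffer[start:end] for name, (start, end) in offsets.items()}


params = {"layer.weight": [1.0, 2.0, 3.0, 4.0], "layer.bias": [0.1, 0.2]}
flat, offsets = flatten(params)
param_views = views(flat, offsets)

# Updating the flat buffer is visible through the views, since they share storage.
flat[4] = 9.0
print(list(param_views["layer.weight"]))
print(list(param_views["layer.bias"]))
```

Keeping several such flat buffers, one per group of parameters, is the direction the "multiple groups" bullet points at; the sketch above covers a single group only.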

d60fc284

01 Jun, 2021 1 commit
- Fix buffer dtype in `FSDP.state_dict()` when using mixed precision (#705) · 25cebf85
  Pete authored Jun 01, 2021
```
* add failing test for buffer dtype

* fix buffer dtype issue

* update CHANGELOG

* fix
```
  25cebf85
28 May, 2021 1 commit

[fix] using dummy tensor to ensure checkpoint backward pass is called in corner cases (#701) · df7db85c

Min Xu authored May 28, 2021



* [do not merge] testing a corner case

* workaround

* using dummy tensor to fix

* lint

* changelog

* update a comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>

df7db85c

27 May, 2021 1 commit
- [perf] SyncBatchNorm: avoid 2nd set of all_reduce when wrapped by checkpoint_wrapper (#694) · 29aae007
  msbaines authored May 26, 2021
```
This change also ensures that we calculate running_{mean,var} correctly
when wrapped.
```
  29aae007
17 May, 2021 2 commits

[fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2

Min Xu authored May 17, 2021



* [fix] auto_wrap: support wrapping based on wrapper_config

- user can use this to avoid assert if auto_wrap is used multiple times on a module
- user can traverse the modules multiple times and assign a wrapper_config
  to the module and then use auto_wrap once to wrap them

fix #649
fix #585

* added changelog

* fix tests

* fix a test

* added an optional assert for collision based on discussions with Quentin

* added config_auto_wrap_policy

* lint
Co-authored-by: Min Xu <min.xu.public@gmail.com>
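
The "assign a wrapper_config while traversing, then wrap once" workflow from the first item of this commit can be sketched as follows. `ToyModule`, `ToyWrapper`, `tag` and `wrap_once` are hypothetical helpers, and the `mixed_precision` key is only an example config entry; the real `auto_wrap` and `config_auto_wrap_policy` are not reproduced here.

```python
from typing import Any, Dict, List, Optional, Set


class ToyModule:
    """Hypothetical module tree node (stand-in for an nn.Module)."""

    def __init__(self, name: str, children: Optional[List[Any]] = None) -> None:
        self.name = name
        self.children: List[Any] = children or []
        self.wrapper_config: Optional[Dict[str, Any]] = None


class ToyWrapper:
    """Hypothetical wrapper that records the config it was created with."""

    def __init__(self, module: ToyModule, **config: Any) -> None:
        self.module = module
        self.config = config


def tag(module: ToyModule, names: Set[str], config: Dict[str, Any]) -> None:
    """Traversal pass: attach a config to selected modules without wrapping anything."""
    if module.name in names:
        module.wrapper_config = config
    for child in module.children:
        tag(child, names, config)


def wrap_once(module: ToyModule) -> Any:
    """Single wrapping pass: wrap children first, then the module itself if tagged."""
    module.children = [wrap_once(child) for child in module.children]
    if module.wrapper_config is not None:
        return ToyWrapper(module, **module.wrapper_config)
    return module


model = ToyModule("root", [ToyModule("encoder"), ToyModule("decoder")])
# Two independent traversals assign different configs...
tag(model, {"encoder"}, {"mixed_precision": True})
tag(model, {"decoder"}, {"mixed_precision": False})
# ...and a single wrapping pass applies them.
wrapped = wrap_once(model)
print([type(child).__name__ for child in wrapped.children])
```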

9d2bbcf2

[feat] Save FSDP metadata for offline unflattening + Consolidate checkpoints (#683) · 81c20f72

Quentin Duval authored May 17, 2021



* Save FSDP metadata for offline unflattening

* Complete the meta-data saving method with all the information needed to reconstruct a checkpoint offline, and implement the method that reconstructs a consolidated checkpoint from a sharded checkpoint

* Add a unit test to show how to use the function

* Code review + improvement of the unit tests

* Code review: extract clean_path

* Make meta data and consolidation of checkpoint work for flatten_parameter=False

* Add new unit test file in CI

* Complete changelog and fix mypy issues

* Add support for module buffers in the consolidation of sharded checkpoints

* Better support for module buffers: save them in the meta data

* Refactoring: use a data format for the meta data that is simpler to understand (move from an object-of-arrays to an array-of-objects format)

* Renaming to make code clearer

* Code review: in_temporary_directory rework and typo correction

* Renaming
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: QuentinDuval <QuentinDuval@users.noreply.github.com>
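
To make the offline consolidation idea concrete, here is a small sketch under assumed conventions: each rank holds an equal-sized slice of one flat parameter, the last slice is padded, and the metadata is an array of objects with a `name` and a `numel`. The layout, the field names and the `consolidate` helper are all hypothetical; the source does not show fairscale's actual metadata format.

```python
from typing import Any, Dict, List


def consolidate(
    shards: List[List[float]], metadata: List[Dict[str, Any]], padding: int
) -> Dict[str, List[float]]:
    """Rebuild per-parameter values from per-rank shards of one flat parameter."""
    # 1. Concatenate the shards in rank order.
    flat = [value for shard in shards for value in shard]
    # 2. Drop the padding that made the flat parameter evenly divisible.
    if padding:
        flat = flat[:-padding]
    # 3. Split the flat values back into named parameters using the metadata.
    consolidated: Dict[str, List[float]] = {}
    offset = 0
    for entry in metadata:
        numel = entry["numel"]
        consolidated[entry["name"]] = flat[offset:offset + numel]
        offset += numel
    assert offset == len(flat), "metadata does not cover the whole flat parameter"
    return consolidated


# Five real values sharded over two ranks, so one padding element is needed.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
metadata = [{"name": "weight", "numel": 3}, {"name": "bias", "numel": 2}]
state = consolidate(shards, metadata, padding=1)
print(state)
```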

81c20f72

14 May, 2021 2 commits

[perf] nn.SyncBatchNorm: use autograd function to save memory (#680) · d240b748
msbaines authored May 14, 2021

d240b748

FSDP: Fix saving and loading checkpoints with use_sharded_state=True (#574) · 468874c8

Shruti Bhosale authored May 14, 2021



* fix saving and loading checkpoints with use_sharded_state=True

* mypy fix

* better fix of the infinite recursion

- we need to specifically call FSDP.state_dict from its local state_dict
- added unit test that fails without the fix and works with the fix
- fixed mypy for the overloaded functions

* make cpu-only fsdp work for state_dict at least
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Min Xu <m1n@fb.com>

468874c8

13 May, 2021 1 commit

[fix] add and use get_process_group_cached (#678) · bde4bac5

Min Xu authored May 12, 2021

* [fix] add and use get_process_group_cached

- This commit makes FSDP avoid creating too many process groups by default
- Extra process groups are bad for GPU memory and init time

* add changelog

* lint

* note on speed

* add better assert output

test seems to be flaky:
https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps



* update test reference memory values

- With cached process groups, the memory is reduced as reported by
pytorch as well (due to bucket buffer memory for the reduction buffer)
- The effect on memory is actually more on the SMI memory, which is not
reported by pytorch and checked by this test.

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update CHANGELOG.md

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* improved changelog

* better handling of underscores in the md file
Co-authored-by: Min Xu <min.xu@acm.org>
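
The caching idea behind this commit can be sketched as a lookup keyed by the participating ranks, so that repeated requests reuse one group instead of creating a new one each time. `ToyProcessGroup` and `get_group_cached` are hypothetical stand-ins; the signature and behaviour of the real `get_process_group_cached` are not shown in the commit message and are not assumed here.

```python
from typing import Dict, Tuple


class ToyProcessGroup:
    """Hypothetical stand-in for a communication process group."""

    def __init__(self, ranks: Tuple[int, ...]) -> None:
        self.ranks = ranks


_group_cache: Dict[Tuple[int, ...], ToyProcessGroup] = {}
groups_created = 0  # counts how many groups were actually constructed


def get_group_cached(ranks: Tuple[int, ...]) -> ToyProcessGroup:
    """Return the cached group for these ranks, creating it only on first use."""
    global groups_created
    key = tuple(ranks)
    if key not in _group_cache:
        _group_cache[key] = ToyProcessGroup(key)
        groups_created += 1
    return _group_cache[key]


# Several wrapped modules ask for a group over the same ranks.
first = get_group_cached((0, 1))
second = get_group_cached((0, 1))
other = get_group_cached((2, 3))

print(first is second)   # the same object is reused
print(groups_created)    # one group per distinct set of ranks
```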

bde4bac5

12 May, 2021 1 commit

[chore] Rename and move checkpoint_activations from misc folder. (#654) · 72c6bab2

anj-s authored May 12, 2021

* rename files

* add newly renamed file

* rename and move checkpoint activations related files

* add test files to ci list

* fix lint errors

* modify docs

* add changelog

* retain old path for now

* fix lint errors

* add another import test case

* fix merge conflict

* add missing test file

72c6bab2

11 May, 2021 1 commit

[fix] FSDP forward pass overlap between compute and all-gather (#671) · 8a42a8e3

Min Xu authored May 10, 2021



* [fix] FSDP forward pass overlap between compute and all-gather

- many thanks to @cyanguwa for the report and @QuentinDuval for debugging it
- a new unit test is added to check for this and ensure we detect
  issues with overlapping and cpu/gpu blocking wait calls

* fix

* fix

* fix

* better assertion outputs

* fix format and tune all_gather mb for CI

* more tuning with non_flatten

* undo an accidental change

* tuning all gather mb and del model

* Update + fix overlapping test to use patched all_gather w/ delay (#672)

* fixing get_cycles_per_ms

* add get_smi_memory

* update the docstring
Co-authored-by: Min Xu <min.xu@acm.org>
Co-authored-by: Myle Ott <myleott@fb.com>

8a42a8e3

08 May, 2021 2 commits
- [test] Force overflow in top2gating test (#664) · 29c01fb1
  Sam Shleifer authored May 08, 2021
  
  29c01fb1
- [chore] Rename and move utils.py from optim/ to utils/ (#669) · 5739930f
  anj-s authored May 07, 2021
```
* rename and move optim/utils.py

* attach the new file
```
  5739930f
07 May, 2021 2 commits

[fix]: support pytorch SyncBatchNorm under AMP & checkpointing with FSDP (#659) · 6db68518

Min Xu authored May 07, 2021



* [test]: add a more general test case

- also rebalance the tests a bit

* added missing arg

* balance

* better checking

* balance

* make test smaller and faster

* make ddp results cached and enable sync_bn

* clean up

* fix tests

* changelog

* balance

* fix

* addressing comments
Co-authored-by: Min Xu <min.xu@acm.org>

6db68518

[feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046

msbaines authored May 07, 2021

* [feat] experimental.nn.SyncBatchNorm: initial commit

Fast/simple re-implementation of SyncBatchNorm.

When profiling SSL Vision, I was seeing a majority of cycles spent in
SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
model I was profiling.

When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
I get a 6x speedup:

<class 'torch.nn.modules.batchnorm.BatchNorm2d'>
Elapsed time is  0.08709120750427246
Elapsed time is  0.12632274627685547
Elapsed time is  0.14095258712768555
Elapsed time is  0.16529417037963867
Elapsed time is  0.1419970989227295
Elapsed time is  0.15166854858398438
Elapsed time is  0.12000870704650879
Elapsed time is  0.17534875869750977
<class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
Elapsed time is  2.5087168216705322
Elapsed time is  2.497001886367798
Elapsed time is  2.5204885005950928
Elapsed time is  2.526789903640747
Elapsed time is  2.5080230236053467
Elapsed time is  2.524489641189575
Elapsed time is  2.513214588165283
Elapsed time is  2.5359973907470703
<class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
Elapsed time is  0.4126114845275879
Elapsed time is  0.39051294326782227
Elapsed time is  0.40685415267944336
Elapsed time is  0.4159870147705078
Elapsed time is  0.42383885383605957
Elapsed time is  0.4080159664154053
Elapsed time is  0.41202712059020996
Elapsed time is  0.42400121688842773
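
The "6x speedup" figure can be checked from the timings listed above. The snippet below simply averages the eight values reported for `torch.nn.SyncBatchNorm` and for the fairscale implementation and takes their ratio; the variable names are mine.

```python
from statistics import mean

# Elapsed times copied from the benchmark output above.
torch_sync_bn = [
    2.5087168216705322, 2.497001886367798, 2.5204885005950928, 2.526789903640747,
    2.5080230236053467, 2.524489641189575, 2.513214588165283, 2.5359973907470703,
]
fairscale_sync_bn = [
    0.4126114845275879, 0.39051294326782227, 0.40685415267944336, 0.4159870147705078,
    0.42383885383605957, 0.4080159664154053, 0.41202712059020996, 0.42400121688842773,
]

speedup = mean(torch_sync_bn) / mean(fairscale_sync_bn)
print(f"mean torch SyncBatchNorm:     {mean(torch_sync_bn):.3f}")
print(f"mean fairscale SyncBatchNorm: {mean(fairscale_sync_bn):.3f}")
print(f"speedup: {speedup:.2f}x")
```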

f0a40046

05 May, 2021 1 commit

[fix] better assert and better test for frozen weights (#657) · b54eed1b

Min Xu authored May 05, 2021



* [fix] better assert and better test for frozen weights

- the precise condition should have been to check m.parameters(), not
  m.params.
- fixes #643

* add changelog

* using an enum is so much better
Co-authored-by: Min Xu <min.xu@acm.org>

b54eed1b