- 08 Mar, 2021 3 commits
Myle Ott authored
Sam Shleifer authored
* Document FSDP tips and tricks in a separate file
Min Xu authored
* [fix]: handle inputs with containers - this is an issue surfaced by vissl as well; the fix seems to be super simple (a hedged sketch of the traversal follows below). Also cleaned up two tests with respect to multiple such tests running back to back (they don't do that presently).
* cleanup
* fix
* lint
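For context, here is a minimal sketch of the kind of container traversal such a fix needs; the `apply_to_tensors` helper below is illustrative, not the actual fairscale change:

```
import torch

# Hypothetical sketch of container-aware input handling (not fairscale's
# actual implementation): recursively apply `fn` to every tensor found in
# nested dict/list/tuple inputs, leaving non-tensor values untouched.
def apply_to_tensors(fn, obj):
    if torch.is_tensor(obj):
        return fn(obj)
    if isinstance(obj, dict):
        return {k: apply_to_tensors(fn, v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(apply_to_tensors(fn, v) for v in obj)
    return obj

# Example: cast every tensor inside a container-valued input to FP16.
batch = {"tokens": torch.randn(2, 3), "labels": [torch.randn(2)], "id": 7}
half_batch = apply_to_tensors(lambda t: t.half(), batch)
```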
- 06 Mar, 2021 1 commit
Myle Ott authored
- 04 Mar, 2021 1 commit
Sam Shleifer authored
- 02 Mar, 2021 2 commits
Myle Ott authored
Sean Naren authored
This adds a context manager that assists in making child modules with similar defaults. Usage:

```
import torch
from fairscale.nn.misc import enable_wrap, wrap

with enable_wrap(**handleful_of_important_params):
    layer_1 = wrap(torch.nn.Linear(5, 5))
    layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True)  # Override parameters if you'd like

# without the context manager, creates a plain Linear layer
layer_1 = wrap(torch.nn.Linear(5, 5))
```

If not within the FSDP context, `wrap` is a no-op. This makes it easier to annotate layers without having to copy any changes in parameters.
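For intuition, here is a minimal sketch of how such a context-sensitive `wrap` could work, using a thread-local flag; the names and mechanism are assumptions for illustration, not fairscale's actual implementation:

```
import contextlib
import threading

import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Thread-local state recording whether we are inside enable_wrap(), plus
# the default kwargs passed to it (illustrative, not fairscale's code).
_context = threading.local()

@contextlib.contextmanager
def enable_wrap(**defaults):
    _context.defaults = defaults
    try:
        yield
    finally:
        _context.defaults = None

def wrap(module: torch.nn.Module, **overrides) -> torch.nn.Module:
    defaults = getattr(_context, "defaults", None)
    if defaults is None:
        # Outside the context: no-op, return the module unchanged.
        return module
    # Inside the context: wrap with FSDP, letting call-site kwargs win.
    return FSDP(module, **{**defaults, **overrides})
```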
- 01 Mar, 2021 1 commit
Sean Naren authored
- 27 Feb, 2021 1 commit
Min Xu authored
* [fix] FSDP corner case where all params are in the children (illustrated in the sketch below)
* lint
* fix
* tradeoff
* fix doc build
* review comments
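A small sketch of the corner case, assuming an initialized `torch.distributed` process group; the module names here are hypothetical:

```
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Corner case: the root module owns no parameters directly; every parameter
# lives inside already-wrapped children, so the outer FSDP instance has no
# flat parameter of its own. (Assumes torch.distributed is initialized.)
class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = FSDP(nn.Linear(5, 5))  # all params are in the child

    def forward(self, x):
        return self.inner(x)

model = FSDP(Outer())  # outer wrapper manages no parameters of its own
```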
- 26 Feb, 2021 2 commits
- 25 Feb, 2021 1 commit
Myle Ott authored
- 24 Feb, 2021 1 commit
Myle Ott authored
- 23 Feb, 2021 2 commits
Min Xu authored
Myle Ott authored
Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper; a usage sketch follows below.

Compared to PyTorch DDP:
* FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
* FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
* FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
  * all-gather parameters at the start of the forward pass and the start of the backward pass
  * reduce-scatter grads at the end of the backward pass

Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
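As a usage sketch (assuming `torch.distributed.init_process_group` has already been called and a CUDA device is available; the data loader is a placeholder), FSDP can replace DDP at the wrapping step:

```
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed.init_process_group(...) was already called.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 5),
).cuda()

# Drop-in replacement for torch.nn.parallel.DistributedDataParallel.
# reshard_after_forward=True trades ~50% extra communication (ZeRO-3-like)
# for lower peak memory; False matches DDP's communication cost (ZeRO-2-like).
model = FSDP(model, reshard_after_forward=True)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for inputs, targets in loader:  # `loader` is an assumed CUDA data loader
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```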