- 09 Mar, 2021 9 commits
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* seemingly fix flakiness for gloo by checking all comms handles
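For context, a minimal sketch of the pattern this fix alludes to, waiting on every outstanding communication handle rather than a single one (illustrative only, not the actual patch; `reduce_buckets` is a made-up name):
```
import torch.distributed as dist

def reduce_buckets(buckets):
    # Launch one async all_reduce per bucket, keeping every work handle.
    handles = [dist.all_reduce(b, async_op=True) for b in buckets]
    # Wait on all handles, not just the last one: with the gloo backend,
    # completion order is not guaranteed, so checking a single handle can
    # race with still-inflight communications.
    for handle in handles:
        handle.wait()
```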
-
anj-s authored
* small fix, remove unused flags
* remove unused flag
* add back max_batch flag
* adding back lazy_construction
* adding back lazy_construction
* add missing device arg
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Min Xu authored
* [chore] 0.3.1 release
  - mainly because vissl needs the new version
  - added a doc on release steps
* Update CHANGELOG.md
* review comments
Co-authored-by: anj-s <32556631+anj-s@users.noreply.github.com>
-
Myle Ott authored
-
brett koonce authored
-
Myle Ott authored
-
Benjamin Lefaudeux authored
-
Sam Shleifer authored
-
- 08 Mar, 2021 5 commits
-
Myle Ott authored
-
Sam Shleifer authored
* Document FSDP tips and tricks in a separate file
-
Benjamin Lefaudeux authored
-
Sean Naren authored
* Fix packed sequence apply
* Update fairscale/utils/containers.py
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
Min Xu authored
* [fix]: handle inputs with containers
  - this is an issue surfaced by vissl as well
  - the fix seems to be super simple (sketched below)
  - also cleaned up two tests with respect to multiple such tests running back to back (they don't do that presently)
* cleanup
* fix
* lint
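The gist of the fix is mapping a function over tensors hidden inside nested inputs. A rough sketch of that idea (hedged; the real helper lives in fairscale/utils/containers.py and covers more cases, e.g. PackedSequence):
```
import torch

def apply_to_tensors(fn, obj):
    # Recursively apply `fn` to every tensor nested inside containers.
    if torch.is_tensor(obj):
        return fn(obj)
    if isinstance(obj, dict):
        return {k: apply_to_tensors(fn, v) for k, v in obj.items()}
    if isinstance(obj, tuple):
        return tuple(apply_to_tensors(fn, v) for v in obj)
    if isinstance(obj, list):
        return [apply_to_tensors(fn, v) for v in obj]
    return obj  # non-tensor leaves pass through unchanged
```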
-
- 06 Mar, 2021 1 commit
-
Myle Ott authored
-
- 05 Mar, 2021 7 commits
-
vfdev authored
Update the example, as the style arg was removed in https://github.com/facebookresearch/fairscale/pull/345
-
Benjamin Lefaudeux authored
:(
-
Benjamin Lefaudeux authored
-
Min Xu authored
* [refactor] enhance wrap and auto_wrap
  Two things were done in this PR:
  1. We don't need to import FSDP in wrap.py since the wrapper class type is stored in the context now.
  2. We can use an `auto_wrap_policy` function to customize the wrapping policy for auto_wrap, including module size, a blacklist, and an exclude list (sketched below).
  The auto_wrap function got simplified a bit as a minor side effect.
* Update fairscale/nn/wrap/auto_wrap.py
* addressed comments
* addressed more comments
Co-authored-by: Sean Naren <sean@grid.ai>
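To illustrate the second point, a sketch of what a custom `auto_wrap_policy` might look like; the (module, recurse, unwrapped_params) signature is inferred from this description, and the name and threshold are arbitrary:
```
import torch.nn as nn

def size_based_policy(module: nn.Module, recurse: bool, unwrapped_params: int) -> bool:
    # Hypothetical size-based policy: keep recursing while searching, and
    # wrap any module still holding at least `min_num_params` parameters
    # that are not already covered by a wrapped child.
    min_num_params = int(1e8)
    if recurse:
        return True
    return unwrapped_params >= min_num_params
```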
-
Benjamin Lefaudeux authored
* [perf][minor] cache the rank lookups, small ShardedDDP perf fix
* tiny improvement, code quality
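The caching idea reduced to a toy (the assignment rule and name here are hypothetical; the real fix memoizes ShardedDDP's internal rank lookups):
```
import functools

@functools.lru_cache(maxsize=None)
def owner_rank(param_index: int, world_size: int) -> int:
    # Hypothetical round-robin shard assignment. The cache means the lookup
    # runs once per parameter rather than once per optimizer step, which is
    # what matters in a hot path.
    return param_index % world_size
```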
-
Benjamin Lefaudeux authored
* change empty shard handling for OSS, do not rely on asserts
* code review
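One reading of "do not rely on asserts", with made-up names: treat an empty shard as a valid state and branch on it, since asserts both crash legitimate runs and vanish under `python -O`:
```
def step_local_shard(shard, update_fn):
    # An empty shard is legitimate, e.g. when there are more ranks than
    # parameter groups, so handle it explicitly instead of asserting.
    if not shard:
        return  # this rank owns no parameters; nothing to update
    for param in shard:
        update_fn(param)
```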
-
Min Xu authored
* [hotfix]: fix a bug in CI command
* debug
* debug
* bump cache ver
* fix
* eq
* check
* bump
* addressed comment
-
- 04 Mar, 2021 6 commits
-
Min Xu authored
* [feat]: checkpoint and normalization
  - added special handling of BN for track_running_stats and checkpointing (see the sketch below)
  - we test BN/LN and checkpointing
  - we test them with mixed precision
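The interaction under test, in a hedged sketch: activation checkpointing re-runs the forward pass during backward, which would update BatchNorm's running stats twice without special handling (the import path is a guess for this era of the codebase):
```
import torch
import torch.nn as nn
from fairscale.nn.misc import checkpoint_wrapper  # path varies by version

block = checkpoint_wrapper(nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8),  # track_running_stats=True by default
    nn.ReLU(),
))

out = block(torch.randn(4, 8))
out.sum().backward()  # forward is recomputed here; the special handling
                      # keeps BN running stats from advancing twice
```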
-
Sam Shleifer authored
-
Siddharth Goyal authored
* Fix ampnet unit test by adding a delegate object
* Remove comments
-
Min Xu authored
- cover them in terms of code path only
- numerically, AdaScale behaves differently on SDP/FSDP than on DDP, mainly due to each rank's partial view of the gradients (see the sketch below)
- this doesn't mean it is definitely not useful, but it has yet to be validated
- not going to spend too much time on this until we have a real use case
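For reference, the basic AdaScale usage these code paths exercise (a single-process sketch; constructor arguments vary by version):
```
import torch
from fairscale.optim import AdaScale

model = torch.nn.Linear(10, 10)
optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1))

loss = model(torch.randn(4, 10)).sum()
loss.backward()
# Under DDP, AdaScale observes full gradients; under SDP/FSDP each rank
# sees only its shard, hence the numerical differences noted above.
optim.step()
```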
-
Min Xu authored
* [chore] move a test script
* add a shortcut for installing
* more skipping
* keep apt-get part
-
Benjamin Lefaudeux authored
-
- 03 Mar, 2021 3 commits
- 02 Mar, 2021 2 commits
-
Myle Ott authored
-
Sean Naren authored
This adds a context manager that assists in making child modules with similar defaults. Usage:
```
from fairscale.nn.misc import enable_wrap, wrap

with enable_wrap(**handful_of_important_params):
    layer_1 = wrap(torch.nn.Linear(5, 5))
    # Override parameters if you'd like
    layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True)

# without the context manager, this creates a plain Linear layer
layer_1 = wrap(torch.nn.Linear(5, 5))
```
If not within the FSDP context, wrap() is a no-op. This makes it easier to annotate layers without having to copy parameter changes around.
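To make the FSDP case concrete, a hedged sketch; the `wrapper_cls` argument is assumed from the wrap/auto_wrap refactor above and may differ by version:
```
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
from fairscale.nn.misc import enable_wrap, wrap

with enable_wrap(wrapper_cls=FSDP, flatten_parameters=True):
    layer = wrap(torch.nn.Linear(5, 5))  # -> FSDP(Linear) sharing the defaults
```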
-
- 01 Mar, 2021 3 commits
-
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency
* add test list files
* fix
* add test list files
* split benchmark run into 2 runs
* fix 1.8 version and balance benchmarks
* fix
* fix
* fix
* fix
* recording tests
* py39 install fix
* test again
* move tests
* reorg tests
* skip tests for torch 1.8 due to an upstream bug
* removed __init__.py from tests since it confuses pytest
* Revert "removed __init__.py from tests since it confuses pytest"
  This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
* don't include __init__ in file list
* notes on __init__.py and added missing ones
* fixed mypy in a test file
* balance test runtime
* better pip install
* balance more
* pip fix
* balance
* balance more, all tests should finish within 20m now
* minor license update
* trying cu102
* more doc and addressed Ben's comments
* debugging
* debugging...
-
Min Xu authored
* [test] FSDP: add the failing test for #421
* skip on 1.5
* better skipping
* Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sean Naren authored
-
- 27 Feb, 2021 3 commits
-
vfdev authored
-
Min Xu authored
* [fix] FSDP corner case when all params are in the children
* lint
* fix
* tradeoff
* fix doc build
* review comments
-
Vittorio Caggiano authored
-
- 26 Feb, 2021 1 commit
-
Myle Ott authored
-