- 02 Apr, 2021 3 commits
-
-
Anjali Sridhar authored
-
Anjali Sridhar authored
-
anj-s authored
* add record_function support
* add more record_function cutpoints
* fix lint errors
* make string ids more specific
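For context, a minimal sketch of the kind of profiling cutpoint these commits refer to, using PyTorch's `record_function` context manager; the label string and model below are illustrative, not fairscale's actual ids:

```python
import torch
from torch.autograd.profiler import profile, record_function

model = torch.nn.Linear(8, 8)
x = torch.randn(4, 8)

with profile() as prof:
    # A specific label string makes this region easy to find in the profiler trace.
    with record_function("example::forward_pass"):
        y = model(x)

print(prof.key_averages().table(sort_by="cpu_time_total"))
```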
-
- 01 Apr, 2021 1 commit
-
-
msbaines authored
-
- 31 Mar, 2021 5 commits
-
-
Siddharth Goyal authored
-
msbaines authored
-
anj-s authored
* renaming/adding error messages
* address comments
* add more comments
-
Min Xu authored
[fix] FSDP: disable single-rank process group for auto_wrap_bn and fix the mixed precision regnet test (#556)
* [fix] disable single-rank process group for auto_wrap_bn
  - beefed up the unit test with a regnet-like model
  - found that the single-rank process group was causing problems
  - disabled it to enable convergence tests on the vissl side
  - use `raise e from None` to get better assertion output in testing.py
* [test] fix the regnet test for ddp+mixed_precision
  - need an AMP context in FSDP
  - work around a difference between DDP & FSDP when bias=True
  - fixed a bug in input data generation that caused different ranks to have the same data and a wrong iteration count
  - added a TODO: need a better loss and grad_scaler; reduced iters so there is no NaN
  - added a (disabled) debugging code path
* lint
* add scaler
* add a real loss
* seeding in the ranks
* balance tests
* run the AMP DDP==FSDP test only on CUDA 11 and up
* add relu inplace and a comment
* make wrap_bn cover more cases in full precision mode
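The AMP point above (FSDP needing an autocast context plus a grad scaler in the test) follows the standard pattern below. This is a minimal sketch, assuming torch.distributed is already initialized and a CUDA device is set; a plain `torch.cuda.amp.GradScaler` stands in for whatever scaler the test actually uses:

```python
import torch
from torch.cuda.amp import GradScaler, autocast
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized and a CUDA device is set.
model = FSDP(torch.nn.Linear(8, 8).cuda(), mixed_precision=True)
optim = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()  # illustrative; sharded training may need a sharding-aware scaler

for _ in range(4):
    x = torch.randn(4, 8).cuda()
    with autocast():                  # the AMP context the commit notes FSDP needs
        loss = model(x).sum()
    scaler.scale(loss).backward()
    scaler.step(optim)
    scaler.update()
    optim.zero_grad()
```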
-
msbaines authored
-
- 30 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
* survive the model being moved to device post-construction
* make sure that a unit test would catch a regression
-
- 29 Mar, 2021 3 commits
-
-
msbaines authored
-
anj-s authored
* codecov testing
* more changes for uploading coverage
* fix invalid config
* modify name
* fix config
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
msbaines authored
-
- 28 Mar, 2021 1 commit
-
-
msbaines authored
-
- 26 Mar, 2021 2 commits
- 25 Mar, 2021 3 commits
-
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
* re-activating unit test
* removing changes that slipped in
-
Sam Shleifer authored
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
-
- 22 Mar, 2021 1 commit
-
-
Benjamin Lefaudeux authored
-
- 20 Mar, 2021 1 commit
-
-
Myle Ott authored
* Add new test for weight init (fails)
* Set FSDP.compute_device so summon_full_params works before the module moves to CUDA
* Override FSDP.apply to enable custom weight init
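A hedged sketch of what the `FSDP.apply` override enables: running a custom weight init function over full, unsharded parameters through the usual `Module.apply` call. The toy model and init function are illustrative, and torch.distributed is assumed to be initialized:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

def init_weights(module):
    # Custom init; per the commit, FSDP.apply gathers full params before calling this.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)

# Assumes torch.distributed is already initialized.
model = FSDP(nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8)))
model.apply(init_weights)
```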
-
- 19 Mar, 2021 3 commits
-
-
Benjamin Lefaudeux authored
* param buckets
* unifying the buckets
-
msbaines authored
-
msbaines authored
-
- 18 Mar, 2021 9 commits
-
-
Benjamin Lefaudeux authored
-
Min Xu authored
-
Benjamin Lefaudeux authored
* extracting the buckets into a dedicated class, fixing the resize_ bug
* adding a unit test
* copyright
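This is not fairscale's actual bucket class, but a toy illustration of the general bucket idea these commits reference (one flat buffer with per-parameter views), using only plain PyTorch:

```python
import torch

class GradBucket:
    """Toy bucket: one flat buffer plus per-parameter views into it."""
    def __init__(self, params):
        numel = sum(p.numel() for p in params)
        self.buffer = torch.zeros(numel)
        self.views = []
        offset = 0
        for p in params:
            # Each view aliases a slice of the flat buffer, reshaped like the parameter.
            self.views.append(self.buffer[offset:offset + p.numel()].view_as(p))
            offset += p.numel()

params = [torch.randn(3, 3), torch.randn(5)]
bucket = GradBucket(params)
print(bucket.buffer.numel())  # 14
```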
-
Myle Ott authored
-
Benjamin Lefaudeux authored
* enabling disabled tests
-
Min Xu authored
* [feat] FSDP: add auto_wrap_bn - add a utility function to handle wrapping of BN
* changelog
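A hedged sketch of how such a utility would be used; the import path and exact behavior of `auto_wrap_bn` are assumptions here and may differ by fairscale version:

```python
import torch.nn as nn
from fairscale.nn.wrap import auto_wrap_bn  # import path is an assumption
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model = auto_wrap_bn(model)              # wrap BN layers in their own FSDP units
model = FSDP(model, mixed_precision=True)
```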
-
Min Xu authored
* [feature] FSDP: enable PyTorch SyncBN
  - not fully validated yet but at least not asserting
  - this enables VISSL to move forward with its next PR
* add the test file
* changelog and lint
* addressed comment
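A minimal sketch of the combination this enables: converting BatchNorm layers to PyTorch's SyncBatchNorm before wrapping in FSDP. Distributed initialization and one CUDA device per rank are assumed:

```python
import torch.nn as nn
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is initialized with one CUDA device per rank.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)  # swap BN modules for SyncBN
model = FSDP(model.cuda())
```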
-
Benjamin Lefaudeux authored
-
Benjamin Lefaudeux authored
-
- 17 Mar, 2021 4 commits
-
-
anj-s authored
* debugging statements
* fix index inputs and streams
* fix lint errors
* remove print
* address comments
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
-
Min Xu authored
-
Benjamin Lefaudeux authored
* Deactivating buckets for a single rank, not crashing but not useful
-
Benjamin Lefaudeux authored
-
- 15 Mar, 2021 2 commits
-
-
Benjamin Lefaudeux authored
* extending the current state_dict interface, making it possible to do everything in a single call and to checkpoint on all ranks
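A hedged sketch of the two checkpointing paths this interface covers, assuming fairscale FSDP's consolidated `state_dict()` and per-rank `local_state_dict()` methods; the file name is illustrative and torch.distributed is assumed to be initialized:

```python
import torch
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized.
model = FSDP(torch.nn.Linear(8, 8))

full_state = model.state_dict()         # consolidated, unsharded weights in a single call
local_state = model.local_state_dict()  # only this rank's shard, cheap to save everywhere

rank = torch.distributed.get_rank()
torch.save(local_state, f"shard_rank{rank}.pt")  # checkpoint on all ranks
```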
-
Benjamin Lefaudeux authored
-
- 12 Mar, 2021 1 commit
-
-
Min Xu authored
* FSDP: multi-pass autograd graph and mixed precision
  - added BACKWARD_PRE/POST checking
  - better assert_state
  - fixed an issue of the backward hook misfiring
* fix
* cleanup
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
Co-authored-by: Myle Ott <myleott@fb.com>
-