Commits · 180ab8c8464d1c2a22556df9923696bbc2c92076 · OpenDAS / fairscale

13 Sep, 2021 1 commit
- [OSS] Fixing the fp16 broadcast and catching this case in the unit test (#795) · 180ab8c8
  Benjamin Lefaudeux authored Sep 13, 2021
  
07 Jul, 2021 1 commit

Future proof storage size test (#735) · 8d82db43

Edward Z. Yang authored Jul 06, 2021

See https://github.com/pytorch/pytorch/pull/59671/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

21 Jun, 2021 1 commit

[feat] FSDP: supporting multiple flatten parameter groups (#711) · ab71efb3

Min Xu authored Jun 21, 2021



* [feat] FSDP: supporting multiple flatten parameter groups

- step 2: extending FPW to support multiple flat params groups
- FSDP still only uses one group
- unit test exercises the new code paths
- updated the changelog

* first cut, mypy passed

* test_flatten_params_wrapper.py::TestFlattenParams tests pass

* added two more test cases and fixed a case in the code

* fixed one bug with param_path_infos

* fixed two more tests with hardcoded flat_param names

* Update CHANGELOG.md
Co-authored-by: Min Xu <min.xu.public@gmail.com>

08 Jun, 2021 1 commit

[feat] supporting multiple flatten parameter groups (step 1 and step 1.5) (#708) · d60fc284

Min Xu authored Jun 08, 2021



* refactoring FlattenParamWrapper

- use a FlatParameter class to encapsulate the logic of
  flattening and expanding into views.
- this will make it easier to have multiple groups of flatten
  parameters

* fixed testing context issues for both temp files and temp dirs

* fixing test_fsdp_metadata

* fix pickling of FlatParameter

* fixed test_fsdp_optimizer_utils.py

* minor

* fix assert

* lint

* remove nesting from the test

* step 1.5: remove the code related to unnecessary nesting support in FPW

* Update fairscale/nn/misc/flatten_params_wrapper.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* address comment
Co-authored-by: Min Xu <min.xu.public@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
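
The refactoring described in this commit (one object that flattens several parameters into a single buffer and expands it back into views) can be sketched roughly as below. This is a framework-free illustration using only the Python standard library, not fairscale's code: `FlatBuffer`, `spans`, and `views` are hypothetical names chosen for the example, and fairscale itself works with PyTorch tensors.

```python
from array import array
from typing import Dict, List, Tuple


class FlatBuffer:
    """Hypothetical stand-in for a flat parameter: one contiguous buffer plus views."""

    def __init__(self, params: Dict[str, List[float]]) -> None:
        self.data = array("d")  # single contiguous buffer of doubles
        self.spans: Dict[str, Tuple[int, int]] = {}  # name -> (start, stop) offsets
        for name, values in params.items():
            start = len(self.data)
            self.data.extend(values)  # flatten: append each parameter in order
            self.spans[name] = (start, len(self.data))

    def views(self) -> Dict[str, memoryview]:
        """Expand the flat buffer into per-parameter views that share its memory."""
        flat = memoryview(self.data)
        return {name: flat[start:stop] for name, (start, stop) in self.spans.items()}


flat = FlatBuffer({"weight": [1.0, 2.0, 3.0, 4.0], "bias": [0.5, 0.25]})
views = flat.views()

# Writing through a view updates the shared flat buffer; no copy is made.
# "weight" occupies offsets 0-3, so the first "bias" element sits at offset 4.
views["bias"][0] = 9.0
print(flat.data[4])
```

Because the views alias the flat buffer, the whole group can be handled as one contiguous block while callers still address each parameter by name. Supporting multiple groups, as the later commits do, would presumably mean holding several such buffers side by side.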

12 May, 2021 1 commit

[chore] Rename and move checkpoint_activations from misc folder. (#654) · 72c6bab2

anj-s authored May 12, 2021

* rename files

* add newly renamed file

* rename and move checkpoint activations related files

* add test files to ci list

* fix lint errors

* modify docs

* add changelog

* retain old path for now

* fix lint errors

* add another import test case

* fix merge conflict

* add missing test file

19 Mar, 2021 1 commit
- [feat][refactor][OSS] Param buckets + fp16 broadcasts (#540) · e3865549
  Benjamin Lefaudeux authored Mar 19, 2021
  * param buckets
  * unifying the buckets
18 Mar, 2021 1 commit
- [refactor][fix][SDP] Extract the grad buckets in a dedicated class, fix the resize_ bug (#532) · a1bdc7d3
  Benjamin Lefaudeux authored Mar 18, 2021
  * extracting the buckets in a dedicated class, fixing the resize_ bug
  * adding a unit test
  * copyright
04 Mar, 2021 1 commit

[feat]: checkpoint and normalization (#457) · 5e64d6a7

Min Xu authored Mar 04, 2021

* [feat]: checkpoint and normalization

- added special handling of BN for track_running_stats and checkpointing
- we test BN/LN and checkpointing
- we test them with mixed precision

02 Mar, 2021 1 commit
- [fix] Make state_dict all-gather FP32 params (#451) · d2924670
  Myle Ott authored Mar 02, 2021
  
26 Feb, 2021 1 commit
- [fix] Fix nested FlattenParamsWrapper state_dict/load_state_dict (#434) · 506d6209
  Myle Ott authored Feb 26, 2021
  
25 Feb, 2021 1 commit
- [test] checkpoint: multiple input and output model test (#425) · 2478a9ad
  Min Xu authored Feb 25, 2021
  
23 Feb, 2021 2 commits

[test]: add peak mem in checkpoint test (#415) · 4b5b4d3d

Min Xu authored Feb 23, 2021

* [test]: add peak mem in checkpoint test

* more debugging

* new test

* more fix

* better collection of debug in case of future failures

* update the comment

* typo

* comment

* clarify

* better wording

[bug]: not all CUDA memory is freed when model is deleted (#412) · e3035933

Min Xu authored Feb 22, 2021

* [bug]: not all CUDA memory is freed when model is deleted

* fixed memory leak

- without this, peak memory will be high when more than one model
  is trained (i.e. the first model leaves stuff around, pushing up the
  peak memory when the second model runs)

* addressed comments

* fix

* changelog

10 Feb, 2021 1 commit

Add fairscale.nn.misc.checkpoint_activations (#376) · c963a72a

Myle Ott authored Feb 10, 2021



* Add fairscale.utils.containers
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Add fairscale.nn.misc.checkpoint_activations
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

21 Jan, 2021 3 commits
- [fix] Lint flattenparams (#320) · bd5d0496
  Benjamin Lefaudeux authored Jan 21, 2021
  * working around broken mypy
- [fix] lint/typing in FlattenParamsWrapper (#318) · a6ed6da8
  Myle Ott authored Jan 21, 2021
  
- Add FlattenParamsWrapper (#317) · 35fdf537
  Myle Ott authored Jan 21, 2021
  