Commits · f4fcee7e8a74660bb34114cff710c769167c936b · OpenDAS / fairscale

24 Sep, 2022 2 commits

[Fix][FSDP] Don't remove post backward hooks for multiple backward fix (#1079) · f4fcee7e

Min Xu authored Sep 24, 2022



* tmp

* test again

* test again

* add new test

* clean up

* add test file to the testlist

* more comments

* add changelog
Co-authored-by: Min Xu <min.xu.public@gmail.com>

f4fcee7e

[chore] move fair_dev into fairscale (#1078) · 8f8f8ef9
Min Xu authored Sep 23, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
8f8f8ef9

23 Sep, 2022 6 commits

[fix] SDP syncing buffers during gradient accumulation (#1075) · bfd57ff3

Min Xu authored Sep 23, 2022



- Fixes from Benjamin.

Original commit msg:
  - Fixes #1041. I just had a minute or two, hoping that it's enough :)
Co-authored-by: Min Xu <min.xu.public@gmail.com>

bfd57ff3

disable code cov (#1077) · abfa7193
Min Xu authored Sep 23, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
abfa7193
disable codecov (#1076) · 72fcabec
Min Xu authored Sep 23, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
72fcabec
0.4.10 release · 6f03e415
Anupam Bhatnagar authored Sep 23, 2022

6f03e415

[fix] better handling non-flatten in FSDP (#1072) · 429f3d31

Min Xu authored Sep 23, 2022



* [fix] better handling non-flatten in FSDP

- see the detailed comment about that backward firing case
- also minor debugging help in FSDP
- also minor fix in FPW's state dict

* [feat] disallow reset_parameters by default

* [feat] adding fsdp_instances API - useful for checking wrapping in user code

* [fix] one line fix but more than a day of debugging

* fixed the case of loading combined check with empty fsdp instances

* fixed another state-loading bug around root/non-root module full-param caching, caused by not resharding after forward

* [feat] support .half and .float better

* fixed a bug where gathering optim state loses extra keys from the original state_dict

* fixed a test failure in mixed precision

* fixed another bug affecting no_sync grad acc

* fixed a bug and a test in fsdp optim state

* fixed another corner case

* added a comment

* skip ssd offload tests

* skip fsdp one for ssd offload
Co-authored-by: Min Xu <min.xu.public@gmail.com>

429f3d31

[fix] don't import ProcessGroup eagerly (#1074) · 47ce21ac

Min Xu authored Sep 22, 2022



* [fix] don't import ProcessGroup eagerly

- move the import into typing since it is only used for type checking
- fixes #1057

* more fixes

* one more

* tested at least
Co-authored-by: Min Xu <min.xu.public@gmail.com>

47ce21ac
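
A minimal sketch of the lazy-import pattern this commit message describes, assuming the import was moved under `typing.TYPE_CHECKING`. The `describe_group` helper is hypothetical and only illustrates that the annotation still works when the import never runs:

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Seen only by static type checkers. At runtime TYPE_CHECKING is False,
    # so importing this module never touches torch.distributed.
    from torch.distributed import ProcessGroup


def describe_group(group: "Optional[ProcessGroup]" = None) -> str:
    """Hypothetical helper; the quoted annotation is not evaluated at runtime."""
    return "default group" if group is None else f"group {group!r}"


print(describe_group())
```

The quoted annotation keeps the type hint available to checkers such as mypy while the module stays importable on builds where the eager import would fail.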

13 Sep, 2022 3 commits
- [bug] fix optim state gather when there are empty FSDP instances (#1071) · d8fc94d9
  Min Xu authored Sep 13, 2022
```
* [bug] fix optim state gather when there are empty FSDP instances

* fixes an assert and a test bug
```
  d8fc94d9
- [minor] add a warning in the doc (#1070) · 203dd668
  Min Xu authored Sep 12, 2022
  
  203dd668
- [feat] support namedtuple in container.py (#1069) · eeb6684e
  Min Xu authored Sep 12, 2022
  
  eeb6684e
10 Sep, 2022 1 commit

[minor] help pure fp16 FSDP init a bit (#1068) · 73bf5964

Min Xu authored Sep 10, 2022

* [minor] [FSDP] add a better for pure fp16

* [minor] [wrap] add a flag to help fsdp pure fp16 wrapping

73bf5964

07 Sep, 2022 4 commits
- [minor] fix doc and assert and test around percent (#1067) · 454537d1
  Min Xu authored Sep 07, 2022
  
  454537d1
- [feat] add random_sparse_mask api (#1066) · 1a8d234d
  Min Xu authored Sep 07, 2022
```
* [feat] add random_sparse_mask api

* correct test skip
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  1a8d234d
- 0.4.9 release · 19033c32
  Anupam Bhatnagar authored Sep 07, 2022
  
  19033c32
- [feat] support a context for loading state_dict for FSDP (#1065) · 4b126c7b
  Min Xu authored Sep 06, 2022
```
* [fix]: add a context for supporting state_dict from a non-FSDP parent module

* formatting
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  4b126c7b
26 Aug, 2022 1 commit

[feat] support optional SST and DST (#1063) · 3cc7fa8d

Min Xu authored Aug 25, 2022



* [feat] support sst disabled and dst disabled cases

* added tests
Co-authored-by: Min Xu <min.xu.public@gmail.com>

3cc7fa8d

25 Aug, 2022 1 commit

[chore] update nightly version (#1064) · 15d4cf15

Min Xu authored Aug 25, 2022



* update nightly version

* update wgit to use numpy for load/store

- this is needed because the new nightly torch version made torch.save() stop
  producing deterministic bytes
- this converts tensor<->numpy and then does the save/load via numpy to avoid that issue.

* fixed tests
Co-authored-by: Min Xu <min.xu.public@gmail.com>

15d4cf15

11 Aug, 2022 1 commit

[feat] signal sparsity profiling class (#1060) · e982b433

Min Xu authored Aug 11, 2022



* added a profiling class

* no more type ignore after merging main

* fixed an int/round bug

* add unit tests

* skip if no cuda for a test
Co-authored-by: Min Xu <min.xu.public@gmail.com>

e982b433

08 Aug, 2022 2 commits

[fix] bugs in signal sparsity class and improving tests (#1058) · 4c830de1

Min Xu authored Aug 08, 2022



* update examples and comment

* fixed issue with fft/ifft only doing the last dim

* fixed an int/round bug; fixed tests

* add cuda tests

* add atol and rtol

* skip cuda test correctly
Co-authored-by: Min Xu <min.xu.public@gmail.com>

4c830de1

Disable broken tests (#1055) · f81a60be
Crutcher Dunnavant authored Aug 08, 2022

f81a60be

03 Aug, 2022 1 commit

implementation of lossy_compression method (#1051) · 5c60f33c

Riyasat Ohib authored Aug 03, 2022

* [Feat] implements lossy_compress with tests

1. Implements a method lossy_compress that takes in a dense tensor and returns a reconstruction with sst and dst, and optionally with sparsity.

5c60f33c

31 Jul, 2022 1 commit

Implementation of dense_sst_to_dst and sst_dst_to_dense (#1048) · c1dada48

Riyasat Ohib authored Jul 31, 2022

[Feat] Implements dense_sst_to_dst and sst_dst_to_dense methods and adds tests

1. Implements the dense_sst_to_dst and sst_dst_to_dense methods.
2. Adds tests for perfect reconstruction with all top-k across different dims.
3. Adds tests for the two new methods.

c1dada48

29 Jul, 2022 1 commit

[test] setup.py dependency testing (#1045) · d3bda798

Min Xu authored Jul 28, 2022



* [fix]: experimental import fix

* [test]: catch issue #1042 in the future

* revert trigger for failure

* add numpy dep for users
Co-authored-by: Min Xu <min.xu.public@gmail.com>

d3bda798

28 Jul, 2022 1 commit
- [chore] update pytorch versions (#1046) · 0a5737ca
  Min Xu authored Jul 28, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  0a5737ca
27 Jul, 2022 1 commit

[Feat] dense to sst implementation (#1034) · 608492af

Riyasat Ohib authored Jul 27, 2022

* [Feat] dense to sst implementation
1. Implementation of dense_to_sst function.
2. calculating the threshold for both the cases of top-k-element and top-k-percentage (fraction)
3. assertions to verify that the top_k_elements is smaller than the numel along the same dim
4. top_k_percent to top-k conversion
5. When calculating SST, now the real part of the complex dense_freq is used instead of the magnitudes.

* [Feat, Tests] transform method addition, handling of top_k_element None case
1. Addition of a transform method
2. Adds code to handle the dim=None case for top_k_element

* [Feat, Refactor] Reorganizations, new assertions and fixes.
1. XOR validation that exactly one of topk percent and topk element is set: setting both, or neither, is rejected.
3. Distills topk and percent both to topk using a unified helper function.
5. Adds a scatter topk values function to scatter values for SST and, in the future, DST.
6. Validation for percentage range, and ensures k is never 0.
7. Uses config validation, adds config validation for top_k_element > 0 if not None.

608492af
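
The validation rules listed in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the `resolve_top_k` name, the `(0, 100]` percent range, and the round-down rule are all hypothetical.

```python
from typing import Optional


def resolve_top_k(numel: int, top_k_element: Optional[int], top_k_percent: Optional[float]) -> int:
    """Hypothetical helper: reduce either setting to one k for a dim of size numel."""
    # XOR check: exactly one of the two options must be set.
    if (top_k_element is None) == (top_k_percent is None):
        raise ValueError("set exactly one of top_k_element and top_k_percent")
    if top_k_element is not None:
        # k must be positive and cannot exceed the number of elements.
        if not 0 < top_k_element <= numel:
            raise ValueError(f"top_k_element must be in [1, {numel}]")
        return top_k_element
    # Assumed range: percent is in (0, 100].
    if not 0.0 < top_k_percent <= 100.0:
        raise ValueError("top_k_percent must be in (0, 100]")
    # Round down, but never let k reach 0.
    return max(1, int(numel * top_k_percent / 100.0))


print(resolve_top_k(10, None, 50.0))
```

Reducing both settings to a single `k` early means the later top-k selection only has to handle one case.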

26 Jul, 2022 6 commits

0.4.8 release · c3f88a6d
Anupam Bhatnagar authored Jul 26, 2022

c3f88a6d
[fix] handle EMA in the state_dict (#1044) · 5ed53a3e
Min Xu authored Jul 26, 2022
```
* [fix] handle EMA in the state_dict

* better fix
```
5ed53a3e

[fix]: experimental import fix (#1043) · 4cb293e8

Min Xu authored Jul 26, 2022



* [fix]: experimental import fix

* Update fairscale/experimental/__init__.py

* Update fairscale/experimental/__init__.py
Co-authored-by: Min Xu <min.xu.public@gmail.com>

4cb293e8

0.4.7 release · 3626a366
Anupam Bhatnagar authored Jul 26, 2022

3626a366

[fix] unclosed FD and avoid loading/storing metadata many times (#1038) · fd7b962f

Min Xu authored Jul 25, 2022



* [fix] unclosed FD and avoid loading/storing metadata many times

* one more stat

* Update fairscale/experimental/wgit/sha1_store.py

* add name to the objects when added

* dict key can be int from a state_dict

* removed top_level_objects key; it should be added into repo, not sha1_store
Co-authored-by: Min Xu <min.xu.public@gmail.com>

fd7b962f

[minor] add a check around local_state_dict (#1040) · b0c3fe1e
Min Xu authored Jul 25, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
b0c3fe1e

22 Jul, 2022 1 commit

[fix] original size computation (#1037) · 16fba4c0

Min Xu authored Jul 21, 2022



* flip per_tensor's default

* fixed original size computation
Co-authored-by: Min Xu <min.xu.public@gmail.com>

16fba4c0

21 Jul, 2022 1 commit

[feat]: add size and names metadata to sha1 store (#1036) · 2e544bd7

Min Xu authored Jul 21, 2022



* additional metadata, step 1

* add gzip option to repo::add

* add repo::add's return value and some refactoring and todo

* added size metadata to sha1_store

* added names metadata to sha1_store
Co-authored-by: Min Xu <min.xu.public@gmail.com>

2e544bd7
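
For readers unfamiliar with the idea behind the sha1_store entries in this log, here is a rough sketch of a content-addressed store with gzip, size metadata, and names metadata. `TinySha1Store`, its methods, and the `meta.json` layout are hypothetical and are not fairscale's actual implementation.

```python
import gzip
import hashlib
import json
import tempfile
from pathlib import Path


class TinySha1Store:
    """Hypothetical content-addressed store; payloads are keyed by their SHA1."""

    def __init__(self, path: Path) -> None:
        # The path is the store directory itself, not a parent directory.
        self.path = path
        self.path.mkdir(parents=True, exist_ok=True)
        self.meta_path = self.path / "meta.json"
        if not self.meta_path.exists():
            self.meta_path.write_text("{}")

    def _load(self) -> dict:
        # read_text opens and closes the file, so no descriptor is left open.
        return json.loads(self.meta_path.read_text())

    def add(self, data: bytes, name: str, compress: bool = True) -> str:
        sha1 = hashlib.sha1(data).hexdigest()
        meta = self._load()
        entry = meta.setdefault(sha1, {"orig_size": len(data), "names": [], "gzip": compress})
        if not (self.path / sha1).exists():
            payload = gzip.compress(data) if entry["gzip"] else data
            (self.path / sha1).write_bytes(payload)
        if name not in entry["names"]:
            entry["names"].append(name)
        self.meta_path.write_text(json.dumps(meta))
        return sha1

    def get(self, sha1: str) -> bytes:
        entry = self._load()[sha1]
        payload = (self.path / sha1).read_bytes()
        return gzip.decompress(payload) if entry["gzip"] else payload

    def delete(self, sha1: str) -> None:
        meta = self._load()
        del meta[sha1]
        (self.path / sha1).unlink()
        self.meta_path.write_text(json.dumps(meta))


with tempfile.TemporaryDirectory() as tmp:
    store = TinySha1Store(Path(tmp) / "store")
    blob = b"weights" * 100
    key = store.add(blob, name="layer0.weight")
    same_key = store.add(blob, name="layer1.weight")  # identical bytes are stored once
    restored = store.get(key)
    names = store._load()[key]["names"]
    store.delete(key)
    remaining = store._load()
```

Because the key is derived from the content, adding the same bytes under a second name only extends the names list and does not write a second copy.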

19 Jul, 2022 1 commit

[feat]: add per-tensor add to repo (#1033) · 4d58a294

Min Xu authored Jul 18, 2022



* formatting change, no logical change

* formatting and name change, no logical change

* [refactor] sha1_store's path arg

- make sha1_store's path arg directly the path, not its parent
- this is because sha1_store is not like a .git or a .wgit dir, which is
  nested inside another "working" dir. It is simply a store, which
  is using a given dir.
- updated repo and tests as well.

* remove a test warning due to deprecated API from torch

* [refactor] change how dot_wgit_dir_path is used

- it should only be assigned in __init__.
- we use it in error checking in the rest of the APIs.

* simplify the init a bit

* refactor the sanity check

* moved some functions, no code change

* [feat] added per-tensor add to the repo

* enabled gzip compression on add

* fix a unit test

* add a note

* make sha1 store work on general dict

* handle general state_dict from a model, not just a module's one-level OrderedDict

* formatting
Co-authored-by: Min Xu <min.xu.public@gmail.com>

4d58a294
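
A per-tensor add over a general, possibly nested state_dict can be pictured roughly as below. This is a sketch only: plain `bytes` values stand in for tensors, and `iter_leaves` and `add_per_tensor` are hypothetical names, not functions from the repository.

```python
import hashlib
from typing import Any, Dict, Iterator, Tuple


def iter_leaves(obj: Any, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (dotted_name, payload) for every bytes leaf in a nested dict."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Keys can be ints in a state_dict, so they are stringified here.
            name = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_leaves(value, name)
    elif isinstance(obj, (bytes, bytearray)):
        yield prefix, bytes(obj)
    # Other leaf types (ints, floats, ...) are ignored in this sketch.


def add_per_tensor(state: Dict[Any, Any]) -> Dict[str, str]:
    """Map each leaf name to the SHA1 of its payload, standing in for a store add."""
    return {name: hashlib.sha1(data).hexdigest() for name, data in iter_leaves(state)}


state = {"encoder": {"weight": b"\x00\x01", "bias": b"\x02"}, 0: {"exp_avg": b"\x00\x01"}}
result = add_per_tensor(state)
print(sorted(result))
```

Leaves with identical bytes map to the same SHA1, which is what lets a content-addressed store deduplicate them.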

18 Jul, 2022 1 commit
- [feat] add compression and tests to sha1 store (#1032) · d0ad08c0
  Min Xu authored Jul 18, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  d0ad08c0
15 Jul, 2022 2 commits
- [feat] draft structure of SignalSparsity class (#1031) · c8327e1c
  Min Xu authored Jul 15, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  c8327e1c
- [minor] copyright header (#1030) · 937b8b9b
  Min Xu authored Jul 15, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  937b8b9b
14 Jul, 2022 2 commits
- [feat] add sha1_store delete function (#1028) · c75d1896
  Min Xu authored Jul 14, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  c75d1896
- [feat] add sha1_store get function (#1027) · 073618d8
  Min Xu authored Jul 14, 2022
```
Co-authored-by: Min Xu <min.xu.public@gmail.com>
```
  073618d8