- 05 Dec, 2022 1 commit
Changyu Gao authored
* Fix gradient accumulation
  - Add an ``is_scaled_loss`` flag to support both scaled and unscaled loss
  - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
* Add a method to scale gradients for grad_accum using unscaled loss
  - Revert the changes in the `step` method
  - Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
  - Add gradient tests
* Implement `_compute_corr_mean_between_grads`
* Improve tests and comments
* Use ubuntu-20.04 instead of latest to fix the `arch x64 not found` issue (see [actions/setup-python#401](https://github.com/actions/setup-python/issues/401))
* Switch flake8 from GitLab to GitHub, since flake8 moved to GitHub (see https://www.reddit.com/r/Python/comments/yvfww8/flake8_took_down_the_gitlab_repository_in_favor/)
* Fix the scikit-learn package
* Update PyTorch versions
* Resolve comments from Min
* Minor fix
* Disable broken tests for new versions of PyTorch
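Below is a minimal PyTorch sketch of the two conventions the ``is_scaled_loss`` flag distinguishes: with a scaled loss the caller divides each micro-batch loss by the accumulation count before ``backward()``, while with an unscaled loss the accumulated gradients are divided once before the optimizer step. The wrapper and its ``scale_grad_by_num_grads_to_accum`` method are illustrative only (named after the commit message); this is not fairscale's actual AdaScale implementation.

```python
import torch

class AccumWrapper:
    """Illustrative optimizer wrapper, not fairscale's AdaScale."""

    def __init__(self, optimizer, num_grads_to_accum, is_scaled_loss=True):
        self.optimizer = optimizer
        self.num_grads_to_accum = num_grads_to_accum
        self.is_scaled_loss = is_scaled_loss

    def scale_grad_by_num_grads_to_accum(self):
        # With an *unscaled* loss each backward() contributes a full-size
        # gradient, so divide the accumulated gradients by N once per step.
        for group in self.optimizer.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.grad.div_(self.num_grads_to_accum)

    def step(self):
        if not self.is_scaled_loss:
            self.scale_grad_by_num_grads_to_accum()
        self.optimizer.step()
        self.optimizer.zero_grad()

model = torch.nn.Linear(4, 2)
opt = AccumWrapper(torch.optim.SGD(model.parameters(), lr=0.1),
                   num_grads_to_accum=2, is_scaled_loss=False)
for _ in range(2):                          # accumulate over 2 micro-batches
    loss = model(torch.randn(8, 4)).sum()   # unscaled: no division by 2 here
    loss.backward()
opt.step()
```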
- 05 Oct, 2022 1 commit
Changyu Gao authored
* Fix gradient accumulation
  - Add an ``is_scaled_loss`` flag to support both scaled and unscaled loss
  - Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
  - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
  - Add gradient tests
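As a rough illustration of what a test like ``test_grad_accum`` can verify (a sketch under assumed test structure, not the repository's actual test): accumulating unscaled-loss gradients over N micro-batches and dividing by N should match a single pass over the combined batch.

```python
import torch

def test_grad_accum_equivalence():
    torch.manual_seed(0)
    data = torch.randn(8, 4)
    model_a = torch.nn.Linear(4, 1)
    model_b = torch.nn.Linear(4, 1)
    model_b.load_state_dict(model_a.state_dict())

    # One backward pass over the full batch.
    model_a(data).pow(2).mean().backward()

    # Two accumulated passes with unscaled losses, then divide grads by 2.
    for chunk in data.chunk(2):
        model_b(chunk).pow(2).mean().backward()
    for p in model_b.parameters():
        p.grad.div_(2)

    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        assert torch.allclose(pa.grad, pb.grad, atol=1e-6)
```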
- 24 Sep, 2022 1 commit
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 12 Jun, 2022 1 commit
Crutcher Dunnavant authored
- 04 Mar, 2021 1 commit
Min Xu authored
- cover them in terms of code path only
- numerically, AdaScale behaves differently on SDP/FSDP than on DDP, mainly due to its partial view of the gradients
- this doesn't mean it is definitely not useful, but it is yet to be validated
- not going to spend too much time on it until we have a real use case
- 22 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
- 28 Jan, 2021 1 commit
Min Xu authored
* [test]: test adascale with oss
* minor fix
* add a small comment
* refactor: moved find_tensor_by_shape
* refactor: move test golden data into its own module
* refactor: simplified the train function
* refactor: added comments as suggested
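For context, a rough sketch of the combination this test exercises: AdaScale wrapping an OSS (sharded) optimizer inside an initialized process group. The import paths and constructor arguments are assumptions based on fairscale's documented usage, not the test's exact code.

```python
import torch
import torch.distributed as dist
from fairscale.optim import OSS, AdaScale  # assumed import paths

def build_sharded_adascale(model: torch.nn.Module):
    # OSS shards optimizer state across ranks; AdaScale wraps it to adapt the
    # effective learning-rate gain. Requires an initialized process group.
    assert dist.is_initialized(), "OSS/AdaScale expect torch.distributed to be initialized"
    sharded_sgd = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
    return AdaScale(sharded_sgd)
```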
- 05 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
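The pytest-timeout plugin referenced here lets a hanging distributed test fail after a deadline instead of blocking CI indefinitely. A small hypothetical usage example (not the repository's actual test code):

```python
import time
import pytest

@pytest.mark.timeout(30)  # fail if this test runs longer than 30 seconds
def test_does_not_hang():
    time.sleep(1)
    assert True
```

The same limit can also be applied globally with `pytest --timeout=30`.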
- 14 Dec, 2020 1 commit
Min Xu authored
* better ddp adascale tests
* make sure the single node test uses the same test cases and expected gains
* added a unit test that covers the smoothing factor
  - tested by re-introducing the bug and seeing the test fail as expected
- 03 Dec, 2020 1 commit
Min Xu authored
* added AdaScale to README
* [adascale] added gradient accumulation
  - added gradient accumulation
  - tested with full CIFAR trainings using different values of accumulation and verified that full accuracy is obtained
  - also removed the patch optimize flag until we need it
* [adascale] adding pytest
  - added basic and ddp tests and grad_accum
  - closes #195
* added changelog
* added ddp grad_accum test
* moved ddp and non-ddp tests into separate files
* added checkpoint test
* more doc
* addressed Mike's comments
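A rough usage sketch of AdaScale with gradient accumulation as described above. The ``num_gradients_to_accumulate`` argument is taken from the surrounding commits; treat the exact import path and constructor signature as assumptions to check against the fairscale documentation.

```python
import torch
from fairscale.optim import AdaScale  # assumed import path

model = torch.nn.Linear(10, 10)
optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1),
                 num_gradients_to_accumulate=4)

for step in range(10):
    optim.zero_grad()
    for _ in range(4):                      # accumulate 4 micro-batches per step
        loss = model(torch.randn(32, 10)).sum()
        loss.backward()                     # the wrapper observes each backward pass
    optim.step()
```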