- 05 Dec, 2022 1 commit
Changyu Gao authored
* Fix gradient accumulation
  - Add an ``is_scaled_loss`` flag to support both scaled and unscaled loss
  - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
* Add a method to scale gradients for grad_accum using unscaled loss
  - Revert the changes in the `step` method
  - Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
  - Add gradient tests
* Implement `_compute_corr_mean_between_grads`
* Improve tests and comments
* Use ubuntu-20.04 instead of latest to fix the `arch x64 not found` issue (see [actions/setup-python#401](https://github.com/actions/setup-python/issues/401))
* Switch flake8 from GitLab to GitHub, since flake8 moved to GitHub (see https://www.reddit.com/r/Python/comments/yvfww8/flake8_took_down_the_gitlab_repository_in_favor/)
* Fix the scikit-learn package
* Update PyTorch versions
* Resolve comments from Min
* Minor fix
* Disable broken tests for new versions of PyTorch
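Below is a minimal PyTorch sketch of the two conventions the ``is_scaled_loss`` flag distinguishes: with a scaled loss the caller divides each micro-batch loss by the accumulation count before ``backward()``, while with an unscaled loss the accumulated gradients are divided once before the optimizer step. The wrapper and its ``scale_grad_by_num_grads_to_accum`` method are illustrative only (named after the commit message); this is not fairscale's actual AdaScale implementation.

```python
import torch

class AccumWrapper:
    """Illustrative optimizer wrapper, not fairscale's AdaScale."""

    def __init__(self, optimizer, num_grads_to_accum, is_scaled_loss=True):
        self.optimizer = optimizer
        self.num_grads_to_accum = num_grads_to_accum
        self.is_scaled_loss = is_scaled_loss

    def scale_grad_by_num_grads_to_accum(self):
        # With an *unscaled* loss each backward() contributes a full-size
        # gradient, so divide the accumulated gradients by N once per step.
        for group in self.optimizer.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.grad.div_(self.num_grads_to_accum)

    def step(self):
        if not self.is_scaled_loss:
            self.scale_grad_by_num_grads_to_accum()
        self.optimizer.step()
        self.optimizer.zero_grad()

model = torch.nn.Linear(4, 2)
opt = AccumWrapper(torch.optim.SGD(model.parameters(), lr=0.1),
                   num_grads_to_accum=2, is_scaled_loss=False)
for _ in range(2):                          # accumulate over 2 micro-batches
    loss = model(torch.randn(8, 4)).sum()   # unscaled: no division by 2 here
    loss.backward()
opt.step()
```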
- 05 Oct, 2022 1 commit
Changyu Gao authored
* Fix gradient accumulation
  - Add an ``is_scaled_loss`` flag to support both scaled and unscaled loss
  - Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
  - Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``
  - Add gradient tests
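As a rough illustration of what a test like ``test_grad_accum`` can verify (a sketch under assumed test structure, not the repository's actual test): accumulating unscaled-loss gradients over N micro-batches and dividing by N should match a single pass over the combined batch.

```python
import torch

def test_grad_accum_equivalence():
    torch.manual_seed(0)
    data = torch.randn(8, 4)
    model_a = torch.nn.Linear(4, 1)
    model_b = torch.nn.Linear(4, 1)
    model_b.load_state_dict(model_a.state_dict())

    # One backward pass over the full batch.
    model_a(data).pow(2).mean().backward()

    # Two accumulated passes with unscaled losses, then divide grads by 2.
    for chunk in data.chunk(2):
        model_b(chunk).pow(2).mean().backward()
    for p in model_b.parameters():
        p.grad.div_(2)

    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        assert torch.allclose(pa.grad, pb.grad, atol=1e-6)
```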
- 24 Sep, 2022 1 commit
Min Xu authored
Co-authored-by: Min Xu <min.xu.public@gmail.com>
- 12 Jun, 2022 1 commit
Crutcher Dunnavant authored
- 04 Mar, 2021 1 commit
Min Xu authored
- cover them in terms of code path only
- numerically, AdaScale behaves differently on SDP/FSDP than on DDP, mainly due to its partial view of the gradients
- this doesn't mean it is definitely not useful, but it is yet to be validated
- not going to spend too much time on it until we have a real use case
- 22 Feb, 2021 1 commit
Benjamin Lefaudeux authored
* adding an assert + corresponding unit test
* updated changelog
* adjusting the adascale tests
- 28 Jan, 2021 1 commit
Min Xu authored
* [test]: test adascale with oss
* minor fix
* add a small comment
* refactor: moved find_tensor_by_shape
* refactor: move test golden data into its own module
* refactor: simplified the train function
* refactor: added comments as suggested
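For context, a rough sketch of the combination this test exercises: AdaScale wrapping an OSS (sharded) optimizer inside an initialized process group. The import paths and constructor arguments are assumptions based on fairscale's documented usage, not the test's exact code.

```python
import torch
import torch.distributed as dist
from fairscale.optim import OSS, AdaScale  # assumed import paths

def build_sharded_adascale(model: torch.nn.Module):
    # OSS shards optimizer state across ranks; AdaScale wraps it to adapt the
    # effective learning-rate gain. Requires an initialized process group.
    assert dist.is_initialized(), "OSS/AdaScale expect torch.distributed to be initialized"
    sharded_sgd = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
    return AdaScale(sharded_sgd)
```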
- 05 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* adding the pytest timeout plugin to properly root out hanging tests
* removing redundant code, slightly more reasonable timeout, works on single cuda
* finding the root bug for some of the cpu hangs, rpc init
* propagating all the rpc init test changes to the pipe and model parallel tests
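The pytest-timeout plugin referenced here lets a hanging distributed test fail after a deadline instead of blocking CI indefinitely. A small hypothetical usage example (not the repository's actual test code):

```python
import time
import pytest

@pytest.mark.timeout(30)  # fail if this test runs longer than 30 seconds
def test_does_not_hang():
    time.sleep(1)
    assert True
```

The same limit can also be applied globally with `pytest --timeout=30`.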
- 14 Dec, 2020 1 commit
Min Xu authored
* better ddp adascale tests
* make sure the single node test uses the same test cases and expected gains
* added a unit test that covers the smoothing factor
  - tested by re-introducing the bug and seeing the test fail as expected
- 03 Dec, 2020 1 commit
Min Xu authored
* added AdaScale to README
* [adascale] added gradient accumulation
  - added gradient accumulation
  - tested with full CIFAR trainings using different values of accumulation and verified that full accuracy is obtained
  - also removed the patch optimize flag until we need it
* [adascale] adding pytest
  - added basic and ddp tests and grad_accum
  - closes #195
* added changelog
* added ddp grad_accum test
* moved ddp and non-ddp tests into separate files
* added checkpoint test
* more doc
* addressed Mike's comments
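A rough usage sketch of AdaScale with gradient accumulation as described above. The ``num_gradients_to_accumulate`` argument is taken from the surrounding commits; treat the exact import path and constructor signature as assumptions to check against the fairscale documentation.

```python
import torch
from fairscale.optim import AdaScale  # assumed import path

model = torch.nn.Linear(10, 10)
optim = AdaScale(torch.optim.SGD(model.parameters(), lr=0.1),
                 num_gradients_to_accumulate=4)

for step in range(10):
    optim.zero_grad()
    for _ in range(4):                      # accumulate 4 micro-batches per step
        loss = model(torch.randn(32, 10)).sum()
        loss.backward()                     # the wrapper observes each backward pass
    optim.step()
```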