[fix] more adascale gradient accumulation tests and smoothing factor fix (#235)
* better ddp adascale tests
* make sure the single node test uses the same test cases and expected gains
* added unit test that covers smoothing factor
  - tested by re-introducing the bug and seeing the test fail as expected.