Moving gradient division back to after the allreduce. Empirically, it appears underflow is more of a danger than overflow.
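
To illustrate the underflow concern, here is a minimal sketch in NumPy, not the code touched by this commit. It assumes float16 gradients and stands in for the allreduce with a plain sum over a worker axis; `WORLD_SIZE`, `simulated_allreduce_sum`, and the gradient values are hypothetical names and numbers chosen for the demonstration.

```python
import numpy as np

WORLD_SIZE = 8  # hypothetical number of workers


def simulated_allreduce_sum(per_worker_grads):
    """Stand-in for an allreduce: sum over the worker axis, kept in float16."""
    return per_worker_grads.sum(axis=0, dtype=np.float16)


# Every worker holds the same tiny gradient: 2**-24, the smallest positive float16.
tiny = np.float16(2.0 ** -24)
per_worker = np.full((WORLD_SIZE, 1), tiny, dtype=np.float16)
scale = np.float16(WORLD_SIZE)

# Divide before the allreduce: 2**-27 is not representable in float16 and
# rounds to zero on every worker, so the reduced result is zero as well.
avg_divide_first = simulated_allreduce_sum(per_worker / scale)

# Divide after the allreduce: the sum 2**-21 is representable, and dividing
# it by 8 recovers the original 2**-24.
avg_divide_last = simulated_allreduce_sum(per_worker) / scale

print(avg_divide_first, avg_divide_last)
```

Dividing after the allreduce leaves the intermediate sum exposed to overflow when gradients are large, which is the trade-off the message above weighs against underflow.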