Commit fd9b02c0 authored by Michael Carilli

Moving gradient division back to after the allreduce. Empirically, it appears...

Moving gradient division back to after the allreduce. Empirically, it appears underflow is more of a danger than overflow.
parent 9eab1ac3
@@ -11,13 +11,13 @@ import copy
 def apply_flat_dist_call(bucket, call, extra_args=None):
     coalesced = _flatten_dense_tensors(bucket)
-    if call is dist.all_reduce:
-        coalesced /= dist.get_world_size()
     if extra_args is not None:
         call(coalesced, *extra_args)
     else:
         call(coalesced)
+    if call is dist.all_reduce:
+        coalesced /= dist.get_world_size()
     for buf, synced in zip(bucket, _unflatten_dense_tensors(coalesced, bucket)):
         buf.copy_(synced)
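
The trade-off named in the commit message can be illustrated with a small standalone sketch. It assumes the gradients are held in IEEE half precision (the commit does not say so explicitly) and models the allreduce as a sequential sum rounded to fp16 after every addition; a real allreduce may reduce in a different order. The helpers `to_fp16`, `fp16_sum`, `divide_then_reduce`, and `reduce_then_divide` are hypothetical names for this illustration, not part of the patched module.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range; model that as infinity.
        return float("inf") if x > 0 else float("-inf")

def fp16_sum(values):
    """Sequential sum with fp16 rounding after each step (assumed reduce order)."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total

def divide_then_reduce(grads):
    """Old order: scale each rank's gradient by 1/world_size, then sum."""
    n = len(grads)
    return fp16_sum([to_fp16(g / n) for g in grads])

def reduce_then_divide(grads):
    """New order in this commit: sum first, then scale by 1/world_size."""
    n = len(grads)
    return to_fp16(fp16_sum(grads) / n)

WORLD_SIZE = 8  # hypothetical number of ranks

# Each rank holds the smallest positive fp16 subnormal.
tiny = [2.0 ** -24] * WORLD_SIZE
# Each rank holds a large value; two of them already exceed the fp16 maximum.
huge = [2.0 ** 15] * WORLD_SIZE

print("tiny, divide first:", divide_then_reduce(tiny))
print("tiny, divide last: ", reduce_then_divide(tiny))
print("huge, divide first:", divide_then_reduce(huge))
print("huge, divide last: ", reduce_then_divide(huge))
```

With tiny gradients, dividing first flushes every contribution to zero before the sum, while dividing last preserves the average. With huge gradients the roles reverse: dividing last overflows to infinity during the sum. The commit chooses the second ordering on the empirical grounds that underflow is the more common failure.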