Commit e4af2d90, authored by mcarilli, committed by GitHub

Move gradient division to before the allreduce

This is consistent with upstream, and safer against overflow.
parent 2f204bca
@@ -11,14 +11,14 @@ import copy
 def apply_flat_dist_call(bucket, call, extra_args=None):
     coalesced = _flatten_dense_tensors(bucket)
+    if call is dist.all_reduce:
+        coalesced /= dist.get_world_size()
     if extra_args is not None:
         call(coalesced, *extra_args)
     else:
         call(coalesced)
-    if call is dist.all_reduce:
-        coalesced /= dist.get_world_size()
     for buf, synced in zip(bucket, _unflatten_dense_tensors(coalesced, bucket)):
         buf.copy_(synced)
......
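
The overflow claim in the commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions: it assumes the gradients are half precision (the commit does not say so), it uses only the Python standard library rather than torch.distributed, and the helpers `to_fp16`, `allreduce_sum_fp16`, `average_divide_after` and `average_divide_before` are hypothetical names introduced for illustration. A real all-reduce backend may order its partial sums differently, but the partial sums are still held in the tensor's dtype.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the fp16 range;
        # real fp16 arithmetic would produce infinity here.
        return math.copysign(math.inf, x)


def allreduce_sum_fp16(per_rank_values):
    """Hypothetical stand-in for a sum all-reduce with fp16 partial sums."""
    total = 0.0
    for value in per_rank_values:
        total = to_fp16(total + value)
    return total


def average_divide_after(grads):
    # Old order: sum first, then divide by the world size.
    return to_fp16(allreduce_sum_fp16(grads) / len(grads))


def average_divide_before(grads):
    # New order: each rank scales its gradient before the reduction.
    scaled = [to_fp16(g / len(grads)) for g in grads]
    return allreduce_sum_fp16(scaled)


grads = [32768.0] * 4  # one large (but finite in fp16) gradient per rank
print(average_divide_after(grads))   # inf
print(average_divide_before(grads))  # 32768.0
```

With four ranks each holding 32768.0, the running sum passes the largest finite fp16 value (65504) on the second addition, so dividing afterwards cannot recover the average. Dividing first keeps every partial sum in range. The trade-off is that very small gradients are pushed closer to the fp16 underflow threshold before they are summed.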