Unverified Commit 00c3a254 authored by Samyam Rajbhandari, committed by GitHub

Bug fix for norm calculation in absence of model parallel group (#551)

In the absence of a model parallel group, model_parallel_allreduce should not perform any reduction. This commit fixes a bug where the method performed an all-reduce across the world group whenever the model parallel group was None.
parent bcd56f97
@@ -1198,7 +1198,7 @@ class FP16_DeepSpeedZeroOptimizer(object):
         """ Perform all reduce within model parallel group, if any.
         """
         if self.model_parallel_group is None:
-            torch.distributed.all_reduce(tensor=tensor, op=op)
+            pass
         else:
             torch.distributed.all_reduce(tensor=tensor,
                                          op=op,
...
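For context, here is a minimal sketch of how the helper reads after this change and why the old behavior inflated the norm. The method name and branch bodies follow the diff; the constructor and the get_grad_norm caller below are hypothetical illustrations, not the actual DeepSpeed code.

```python
import torch
import torch.distributed as dist

class FP16_DeepSpeedZeroOptimizer(object):
    def __init__(self, model_parallel_group=None):
        # None means no model parallelism: each rank already holds a
        # complete copy of the tensors it reduces.
        self.model_parallel_group = model_parallel_group

    def model_parallel_allreduce(self, tensor, op=dist.ReduceOp.SUM):
        """Perform all reduce within model parallel group, if any."""
        if self.model_parallel_group is None:
            # No model parallel group: reducing across the world group
            # would sum identical values from every data-parallel replica,
            # which is the bug this commit fixes. Do nothing instead.
            pass
        else:
            dist.all_reduce(tensor=tensor,
                            op=op,
                            group=self.model_parallel_group)

    # Hypothetical caller, for illustration only: summing partial squared
    # norms across model-parallel partitions before taking the square root.
    def get_grad_norm(self, grads):
        total = torch.zeros(1)
        for g in grads:
            total += g.float().norm(2) ** 2
        # With no model parallel group this is now a no-op rather than a
        # world-wide all-reduce that multiplied the norm by the world size.
        self.model_parallel_allreduce(total)
        return total.sqrt().item()
```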