Unverified Commit 00c3a254 authored by Samyam Rajbhandari, committed by GitHub

Bug fix for norm calculation in absence of model parallel group (#551)

In the absence of a model parallel group, model_parallel_allreduce should not perform any reduction. This commit fixes a bug where the method performed an all-reduce across the world group whenever the model parallel group was None.
parent bcd56f97
@@ -1198,7 +1198,7 @@ class FP16_DeepSpeedZeroOptimizer(object):
         """ Perform all reduce within model parallel group, if any.
         """
         if self.model_parallel_group is None:
-            torch.distributed.all_reduce(tensor=tensor, op=op)
+            pass
         else:
             torch.distributed.all_reduce(tensor=tensor,
                                          op=op,
...
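For context, here is a minimal sketch of how the helper reads after this change and why the old behavior inflated the norm. The method name and branch bodies follow the diff; the constructor and the get_grad_norm caller below are hypothetical illustrations, not the actual DeepSpeed code.

```python
import torch
import torch.distributed as dist

class FP16_DeepSpeedZeroOptimizer(object):
    def __init__(self, model_parallel_group=None):
        # None means no model parallelism: each rank already holds a
        # complete copy of the tensors it reduces.
        self.model_parallel_group = model_parallel_group

    def model_parallel_allreduce(self, tensor, op=dist.ReduceOp.SUM):
        """Perform all reduce within model parallel group, if any."""
        if self.model_parallel_group is None:
            # No model parallel group: reducing across the world group
            # would sum identical values from every data-parallel replica,
            # which is the bug this commit fixes. Do nothing instead.
            pass
        else:
            dist.all_reduce(tensor=tensor,
                            op=op,
                            group=self.model_parallel_group)

    # Hypothetical caller, for illustration only: summing partial squared
    # norms across model-parallel partitions before taking the square root.
    def get_grad_norm(self, grads):
        total = torch.zeros(1)
        for g in grads:
            total += g.float().norm(2) ** 2
        # With no model parallel group this is now a no-op rather than a
        # world-wide all-reduce that multiplied the norm by the world size.
        self.model_parallel_allreduce(total)
        return total.sqrt().item()
```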