Zongwei Zhou authored
In BERT SQuAD, disable explicit allreduce for now to keep the original clip_by_global_norm math.

With explicit allreduce, the gradients are scaled before the allreduce. Even if clip_by_global_norm is moved before the allreduce (as in TF1 and pre-TF 2.2), it would operate on the scaled gradients and the math would change. With explicit allreduce, it is therefore better to apply clip_by_global_norm after the allreduce.

PiperOrigin-RevId: 299278082
11ccb99e
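
The effect described in the commit message can be illustrated with a small pure-Python sketch. It is not the code from this change and does not call TensorFlow. It assumes that the pre-allreduce scaling is a division by the number of replicas and that the allreduce is a sum. The helpers `global_norm`, `clip_by_global_norm`, and `allreduce_sum` are hypothetical stand-ins written for this illustration.

```python
import math


def global_norm(grads):
    """L2 norm over all values in a list of gradient tensors (lists of floats)."""
    return math.sqrt(sum(x * x for g in grads for x in g))


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their global norm is at most clip_norm."""
    norm = global_norm(grads)
    scale = clip_norm / max(norm, clip_norm)
    return [[x * scale for x in g] for g in grads]


def allreduce_sum(per_replica_grads):
    """Hypothetical stand-in for allreduce: element-wise sum across replicas."""
    return [
        [sum(vals) for vals in zip(*tensors)]
        for tensors in zip(*per_replica_grads)
    ]


num_replicas = 2
clip_norm = 1.0

# Each replica holds one gradient tensor; both replicas see the same values here.
raw = [[[3.0, 4.0]], [[3.0, 4.0]]]

# Assumption: gradients are scaled by 1 / num_replicas before the allreduce.
scaled = [[[x / num_replicas for x in g] for g in grads] for grads in raw]

# Clip on each replica first, then allreduce: clipping sees the scaled gradients.
clip_before = allreduce_sum([clip_by_global_norm(g, clip_norm) for g in scaled])

# Allreduce first, then clip: clipping sees the fully aggregated gradients.
clip_after = clip_by_global_norm(allreduce_sum(scaled), clip_norm)

print("clip before allreduce:", clip_before, "norm:", global_norm(clip_before))
print("clip after allreduce: ", clip_after, "norm:", global_norm(clip_after))
```

Under these assumptions, clipping after the allreduce bounds the norm of the aggregated gradient by `clip_norm`, while clipping the scaled per-replica gradients first yields an aggregated gradient whose norm can exceed it. This is one way the math can differ.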