    Temporarily disable explicit allreduce in BERT SQuAD · 11ccb99e
    Zongwei Zhou authored
    In BERT SQuAD, disable explicit allreduce for now to keep the original clip_by_global_norm math. With explicit allreduce, each replica's gradients are scaled before the allreduce, so even if clip_by_global_norm were moved before the allreduce (as in TF1 and pre-TF 2.2), it would operate on the scaled gradients and the math would change. With explicit allreduce, it is therefore better to apply clip_by_global_norm after the allreduce.
    
    PiperOrigin-RevId: 299278082
model_training_utils.py 18.6 KB
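
A minimal sketch of the ordering the commit message describes: with an explicit allreduce, clip_by_global_norm should run on the reduced gradients rather than the locally scaled per-replica gradients, so the global norm matches the non-distributed math. This is not the actual model_training_utils.py code; the function name and the tf.distribute setup are illustrative assumptions.

```python
import tensorflow as tf

def train_step_with_explicit_allreduce(strategy, model, optimizer, loss_fn,
                                       inputs, labels, clip_norm=1.0):
  """Hypothetical train step showing clip_by_global_norm AFTER allreduce."""

  def replica_step(x, y):
    with tf.GradientTape() as tape:
      loss = loss_fn(y, model(x, training=True))
      # Scale the loss so summing gradients across replicas yields the mean.
      scaled_loss = loss / strategy.num_replicas_in_sync
    grads = tape.gradient(scaled_loss, model.trainable_variables)

    # Explicit allreduce: sum the scaled per-replica gradients. Clipping the
    # per-replica (scaled) gradients here would change the clipping math.
    reduced = tf.distribute.get_replica_context().all_reduce(
        tf.distribute.ReduceOp.SUM, grads)

    # Clip after the allreduce so the global norm is computed on the same
    # gradients a single-replica run would see.
    clipped, _ = tf.clip_by_global_norm(reduced, clip_norm)

    # Gradients are already aggregated, so skip the optimizer's aggregation.
    optimizer.apply_gradients(
        zip(clipped, model.trainable_variables),
        experimental_aggregate_gradients=False)
    return loss

  return strategy.run(replica_step, args=(inputs, labels))
```

Without explicit allreduce (the state this commit reverts to), the optimizer aggregates gradients itself and clip_by_global_norm sees unscaled gradients, preserving the original clipping behavior.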