Temporarily disable explicit allreduce in BERT SQuAD
In BERT SQuAD, disable explicit allreduce for now to preserve the original clip_by_global_norm math. With explicit allreduce, the gradients are scaled before the allreduce, so even if clip_by_global_norm is moved before the allreduce (as in TF1 and pre-TF 2.2), it operates on scaled gradients and the math changes. With explicit allreduce, it is therefore better to apply clip_by_global_norm after the allreduce.

PiperOrigin-RevId: 299278082
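To illustrate why the placement of clip_by_global_norm matters, here is a minimal pure-Python sketch rather than the actual BERT SQuAD training code. It assumes two replicas, a scaling factor of 1/num_replicas applied before the allreduce, and a sum allreduce; the helper names `global_norm`, `clip_by_global_norm`, and `allreduce_sum` are hypothetical stand-ins written for this example, with the clipping rule `scale = clip_norm / max(global_norm, clip_norm)`.

```python
import math

def global_norm(grads):
    # L2 norm taken over every element of every gradient tensor.
    return math.sqrt(sum(x * x for g in grads for x in g))

def clip_by_global_norm(grads, clip_norm):
    # Rescale all tensors by one shared factor so the global norm is <= clip_norm.
    scale = clip_norm / max(global_norm(grads), clip_norm)
    return [[x * scale for x in g] for g in grads]

def allreduce_sum(per_replica):
    # Hypothetical stand-in for a cross-replica sum: element-wise sum of
    # the matching tensors from every replica.
    num_tensors = len(per_replica[0])
    return [
        [sum(vals) for vals in zip(*(rep[t] for rep in per_replica))]
        for t in range(num_tensors)
    ]

# Assumed setup: 2 replicas, each holding one gradient tensor with 2 elements.
local_grads = [[[3.0, 0.0]], [[0.0, 4.0]]]
num_replicas = len(local_grads)
clip_norm = 1.0

# Assumption: gradients are scaled by 1/num_replicas before the allreduce.
scaled = [[[x / num_replicas for x in g] for g in rep] for rep in local_grads]

# Option A: clip after the allreduce (sees the full aggregated gradient).
clip_after = clip_by_global_norm(allreduce_sum(scaled), clip_norm)

# Option B: clip on each replica before the allreduce (sees only the
# scaled local gradient, so each replica uses a different clipping factor).
clip_before = allreduce_sum(
    [clip_by_global_norm(rep, clip_norm) for rep in scaled]
)

print("clip after allreduce: ", clip_after, "norm =", global_norm(clip_after))
print("clip before allreduce:", clip_before, "norm =", global_norm(clip_before))
```

In this toy setup, clipping after the allreduce yields an aggregated gradient whose global norm equals clip_norm, while clipping before the allreduce yields a different gradient whose norm exceeds clip_norm, because each replica clips against the norm of its own scaled gradient rather than the aggregated one.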