Unverified Commit 7cd531c4 authored by Serge Panev, committed by GitHub

[Dist][Optim] Change op order in SparseAdagrad to be numerically closer to PyTorch (#4253)


Signed-off-by: Serge Panev <spanev@nvidia.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
parent 8292bf32
@@ -255,7 +255,7 @@ class SparseAdagrad(DistSparseGradOptimizer):
             update_event.record()
             # update emb
-            std_values = grad_state.add_(eps).sqrt_()
+            std_values = grad_state.sqrt_().add_(eps)
             tmp = clr * grad_values / std_values
             tmp_dst = tmp.to(state_dev, non_blocking=True)
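A minimal sketch of why this reordering matters numerically. PyTorch's `torch.optim.Adagrad` computes the denominator as `sqrt(state_sum) + eps`, so adding `eps` after the square root (the new order) matches it, while adding `eps` before the square root (the old order) can differ noticeably when the accumulated gradient state is tiny. The `eps` value and the tensor values below are illustrative, not taken from DGL's defaults:

```python
import math

eps = 1e-10  # illustrative epsilon; DGL's actual default may differ

for state in (1e-12, 4.0):
    old = math.sqrt(state + eps)   # previous order: sqrt(state + eps)
    new = math.sqrt(state) + eps   # new order: sqrt(state) + eps (matches torch.optim.Adagrad)
    print(f"state={state:g}  old={old:.6e}  new={new:.6e}")
```

For a small state like `1e-12`, the old order yields roughly `1e-5` while the new order yields roughly `1e-6`, an order-of-magnitude difference in the effective step size; for a large state like `4.0`, both orders give essentially `2.0`.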