Directly decay weight instead of L2 penalty (#157)

See https://arxiv.org/pdf/1711.05101.pdf

Commit 9430544a, authored Jan 05, 2018 by Yann N. Dauphin, committed Jan 05, 2018 by Sergey Edunov

Showing with 2 additions and 3 deletions

fairseq/nag.py +2 -3

--- a/fairseq/nag.py
+++ b/fairseq/nag.py
@@ -35,15 +35,14 @@ class NAG(Optimizer):
                    continue
                d_p = p.grad.data
-                if weight_decay != 0:
-                    d_p.add_(weight_decay, p.data)
                param_state = self.state[p]
                if 'momentum_buffer' not in param_state:
                    param_state['momentum_buffer'] = d_p.clone().zero_()
                buf = param_state['momentum_buffer']
+                if weight_decay != 0:
+                    p.data.mul_(1 - weight_decay)
                p.data.add_(momentum * momentum, buf)
                p.data.add_(-(1 + momentum) * lr, d_p)
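
To make the effect of this change easier to see, here is a minimal scalar sketch of one update step before and after the commit. It is not the fairseq implementation: `nag_step` and its `decoupled` flag are hypothetical names introduced for illustration, plain floats stand in for tensors, and the momentum-buffer update at the end is an assumption, since that line lies outside the hunk shown above.

```python
def nag_step(p, grad, buf, lr, momentum, weight_decay, decoupled):
    """One scalar NAG-style step mirroring the lines in the diff above.

    decoupled=False: behaviour before the commit (decay folded into the gradient).
    decoupled=True:  behaviour after the commit (parameter shrunk directly).
    """
    d_p = grad
    if weight_decay != 0 and not decoupled:
        # Before: d_p.add_(weight_decay, p.data), i.e. an L2 penalty term.
        d_p = d_p + weight_decay * p
    if weight_decay != 0 and decoupled:
        # After: p.data.mul_(1 - weight_decay), applied to the weight itself.
        p = p * (1 - weight_decay)
    p = p + momentum * momentum * buf      # p.data.add_(momentum * momentum, buf)
    p = p - (1 + momentum) * lr * d_p      # p.data.add_(-(1 + momentum) * lr, d_p)
    # Assumption: the buffer update is not part of the hunk; this is a common form.
    buf = momentum * buf - lr * d_p
    return p, buf


# Zero gradient and an empty buffer isolate the weight-decay term.
p_old, buf_old = nag_step(1.0, 0.0, 0.0, lr=0.1, momentum=0.9,
                          weight_decay=0.01, decoupled=False)
p_new, buf_new = nag_step(1.0, 0.0, 0.0, lr=0.1, momentum=0.9,
                          weight_decay=0.01, decoupled=True)
```

With these inputs the old path shrinks the weight by `(1 + momentum) * lr * weight_decay` (about 0.0019) and also writes the decay term into the momentum buffer, while the new path shrinks it by exactly `weight_decay` (0.01) and leaves the buffer untouched. As written in the diff, the new shrink factor does not depend on `lr`.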