"torchvision/git@developer.sourcefind.cn:OpenDAS/vision.git" did not exist on "c4c28dff8a68fae20eb7f2a82051ef7a388353bd"
Commit 8dbee4ab authored by Akhilesh Gotmare, committed by Facebook Github Bot

Minor fix to make adafactor work for >2d conv kernels (#1122)

Summary:
Line 124 was missing an .unsqueeze(-1) on the row mean. Without this change, Adafactor raises a runtime error for convolutional kernels with more than two dimensions. With the fix, Adafactor's 2D factored logic is applied to the final two dimensions.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122

Differential Revision: D17431662

Pulled By: myleott

fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
parent 718677eb
@@ -121,7 +121,7 @@ class Adafactor(torch.optim.Optimizer):
         return tensor.norm(2) / (tensor.numel() ** 0.5)
     def _approx_sq_grad(self, exp_avg_sq_row, exp_avg_sq_col, output):
-        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)).rsqrt_().unsqueeze(-1)
+        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)).rsqrt_().unsqueeze(-1)
         c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
         torch.mul(r_factor, c_factor, out=output)
...
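The following is a minimal sketch of why the extra .unsqueeze(-1) matters. It is not the fairseq code itself. The two standalone functions (approx_sq_grad_original and approx_sq_grad_fixed) are hypothetical names that mirror the old and new r_factor lines from the diff, and the 4D kernel shape (8, 4, 3, 5) is an arbitrary choice for illustration. It assumes the row and column statistics are the means of the squared gradient over the last and second-to-last dimensions, respectively.

```python
import torch


def approx_sq_grad_original(exp_avg_sq_row, exp_avg_sq_col):
    # Pre-fix form: mean(dim=-1) drops the last dimension, so for inputs with
    # more than one dimension the division cannot broadcast correctly.
    r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)).rsqrt_().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return torch.mul(r_factor, c_factor)


def approx_sq_grad_fixed(exp_avg_sq_row, exp_avg_sq_col):
    # Post-fix form: unsqueeze(-1) restores the reduced dimension so each row
    # vector is divided by its own mean.
    row_mean = exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)
    r_factor = (exp_avg_sq_row / row_mean).rsqrt_().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return torch.mul(r_factor, c_factor)


# Assumed setup: squared gradient of a 4D conv kernel (out, in, kH, kW).
sq_grad = torch.rand(8, 4, 3, 5) + 0.1
row_stats = sq_grad.mean(dim=-1)  # shape (8, 4, 3)
col_stats = sq_grad.mean(dim=-2)  # shape (8, 4, 5)

try:
    approx_sq_grad_original(row_stats, col_stats)
except RuntimeError as err:
    # (8, 4, 3) / (8, 4) fails to broadcast on the trailing dimension.
    print("original form failed:", type(err).__name__)

print(approx_sq_grad_fixed(row_stats, col_stats).shape)  # torch.Size([8, 4, 3, 5])
```

For a 2D weight the row statistic is a vector and its mean is a scalar, so both forms give the same result there. That is consistent with the summary's statement that the problem only shows up for kernels with more than two dimensions.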