Minor fix to make adafactor work for >2d conv kernels (#1122)
Summary: missing .unsqueeze(-1) in line 124, without this change we'll encounter runtime error for >2d convolutional kernels, with this fix, we're applying adafactor's 2d logic to the two final dimensions. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122 Differential Revision: D17431662 Pulled By: myleott fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
Showing
Please register or sign in to comment