Simplify fairseq multihead attention (#888)
Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/888

We want to simplify multihead attention and get rid of the dynamic in_proj_weight logic. Sending the diff early for feedback; further changes will follow as I fix the breaking tests.

Reviewed By: edunov

Differential Revision: D17912661

fbshipit-source-id: 0e6319fc694d8ec5187d1c2fefe5839d9d522186