added missing dense layers in masked lm model (#581)
Summary:
1) Added pooled_output for sentence classification as `Tanh(Linear())`.
2) Added lm_head_transform as `LayerNorm(GeLU(Linear(x)))`.
3) `act_dropout = 0.0`.
4) Added `lm_output_learned_bias`.

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/581

Reviewed By: borguz

Differential Revision: D15353575

Pulled By: borguz

fbshipit-source-id: 4ff64c6ceed23f3e99348f73d189546f1d84452e
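
For readers without the diff at hand, summary items 1, 2 and 4 can be sketched roughly as below. This is a minimal illustration in PyTorch, not the code from the commit: the class name `MaskedLMHeads`, the attribute names `pooler`, `layer_norm` and `out_proj`, the bias-free output projection, and the choice of the first token as the sentence representation are all assumptions. Item 3 (`act_dropout = 0.0`) is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLMHeads(nn.Module):
    """Hypothetical sketch; names follow the commit summary, not the actual diff."""

    def __init__(self, embed_dim: int, vocab_size: int):
        super().__init__()
        # Item 1: pooled_output = Tanh(Linear(x))
        self.pooler = nn.Linear(embed_dim, embed_dim)
        # Item 2: lm_head_transform = LayerNorm(GeLU(Linear(x)))
        self.lm_head_transform = nn.Linear(embed_dim, embed_dim)
        self.layer_norm = nn.LayerNorm(embed_dim)
        # Projection to vocabulary logits (assumed bias-free in this sketch)
        self.out_proj = nn.Linear(embed_dim, vocab_size, bias=False)
        # Item 4: a separate learned bias added to the output logits
        self.lm_output_learned_bias = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, embed_dim) encoder output.
        # Assumption: the first position carries the sentence-level representation.
        pooled_output = torch.tanh(self.pooler(x[:, 0, :]))
        h = self.layer_norm(F.gelu(self.lm_head_transform(x)))
        logits = self.out_proj(h) + self.lm_output_learned_bias
        return logits, pooled_output


if __name__ == "__main__":
    heads = MaskedLMHeads(embed_dim=8, vocab_size=11)
    logits, pooled = heads(torch.randn(2, 5, 8))
    print(logits.shape, pooled.shape)
```

The pooled output feeds a sentence-level classifier, while the transformed hidden states feed the per-token vocabulary prediction; keeping the learned bias as its own parameter lets the projection weights be shared with an embedding matrix if desired, though whether the commit does so is not stated in the summary.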