Various fixes for Masked LM (#573)
Summary:
Various fixes for Masked LM:
- use --activation-fn instead of --gelu
- use --dataset-impl instead of --lazy-load
- add embed_scale option to TransformerSentenceEncoder
- fix encoder_normalize_before to include a final layer norm
- delete BertLayerNorm

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/573

Reviewed By: borguz

Differential Revision: D15317933

Pulled By: myleott

fbshipit-source-id: 8ecb46556ad43e76e92d41ed8f5a62e8516fd375
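
Illustration of the first item (not code from this change): a minimal sketch of how one named option such as --activation-fn can stand in for a boolean switch such as --gelu. The `ACTIVATIONS` table, its entries, the `relu` default, and the helpers `build_parser` and `get_activation` are assumptions made for this example and are not taken from the fairseq source.

```python
import argparse
import math

# Hypothetical registry for illustration only; the real set of supported
# activation names is not stated in this commit message.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
    "tanh": math.tanh,
}


def build_parser():
    """Build a parser with one named option instead of a per-activation flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--activation-fn",
        choices=sorted(ACTIVATIONS),
        default="relu",  # assumed default, chosen for this sketch
        help="name of the activation function to use",
    )
    return parser


def get_activation(name):
    """Look up the scalar activation registered under `name`."""
    return ACTIVATIONS[name]


# argparse exposes --activation-fn as the attribute `activation_fn`.
args = build_parser().parse_args(["--activation-fn", "gelu"])
act = get_activation(args.activation_fn)
```

With a named option, supporting another activation means adding one registry entry rather than introducing another boolean flag.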