Use bert init for xlm_base
Summary: Use bert init for xlm_base. This seems to be much closer to what is done in the [XLM](https://github.com/facebookresearch/XLM/blob/master/src/model/transformer.py#L44) repo.

At update 10 with BERT init (f121471600), loss starts at 14.234.
At update 10 without BERT init (f121471612), loss starts at 154.423.

Reviewed By: liezl200, pipibjc

Differential Revision: D15874836

fbshipit-source-id: f81bf83a078992d7476ba7fdf263b731a9f5b66d
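
For readers unfamiliar with the term, "BERT init" usually means drawing weights from a zero-mean normal distribution with a small standard deviation (the BERT reference configuration uses an initializer range of 0.02) and zeroing biases. The following is a minimal sketch of that idea, assuming PyTorch. The helper name `init_bert_style` and the toy model are hypothetical and are not taken from this commit or from the XLM repo; the sketch uses a plain normal rather than a truncated one.

```python
import torch
import torch.nn as nn


def init_bert_style(module: nn.Module, std: float = 0.02) -> None:
    """Hypothetical helper: BERT-style init for a single submodule."""
    if isinstance(module, nn.Linear):
        # Small-variance normal weights, zero bias.
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        # Keep the padding embedding at zero if one is defined.
        if module.padding_idx is not None:
            with torch.no_grad():
                module.weight[module.padding_idx].zero_()


# Toy model (assumption, for illustration only).
torch.manual_seed(0)
model = nn.Sequential(
    nn.Embedding(1000, 64, padding_idx=0),
    nn.Linear(64, 64),
)

# nn.Module.apply visits every submodule recursively.
model.apply(init_bert_style)
```

A smaller initial weight scale keeps the initial logits close to zero, which is one plausible reason the starting loss reported above is much lower with BERT init.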