"docs/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "610a71d7d467e4cc892c824db882071ff0d282e1"
Commit fca32e05 authored by Jingfei Du, committed by Facebook Github Bot

fixed bugs of masked_lm for fine-tuning (#744)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/744

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/587

After we added additional prediction layers for language model predictions, fine-tuning broke for two reasons:
1. checkpoints could not be loaded because the state_dict key names were not updated
2. lm_output_learned_bias is not initialized when load_softmax is false
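
For context, here is a minimal sketch of the second failure mode and the fix. The class and dimensions below are hypothetical stand-ins, not fairseq's actual MaskedLMEncoder: if the bias attribute is only created when the softmax head is kept, building the model with the head removed and then touching the attribute raises AttributeError; defaulting it to None in __init__, as this patch does, avoids that.

```python
import torch
import torch.nn as nn

class TinyMaskedLMEncoder(nn.Module):
    """Hypothetical, stripped-down stand-in for the masked LM encoder."""

    def __init__(self, hidden_dim=16, vocab_size=100, load_softmax=True):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, hidden_dim)
        # The fix: always define the attribute so forward() can test it safely
        # even when the prediction head is removed for fine-tuning.
        self.lm_output_learned_bias = None
        self.embed_out = None
        if load_softmax:
            self.lm_output_learned_bias = nn.Parameter(torch.zeros(vocab_size))
            self.embed_out = nn.Linear(hidden_dim, vocab_size, bias=False)

    def forward(self, x):
        x = self.encoder(x)
        if self.embed_out is not None:
            # Project to the vocabulary and add the learned output bias.
            x = self.embed_out(x) + self.lm_output_learned_bias
        return x

# Without `self.lm_output_learned_bias = None` in __init__, a model built with
# load_softmax=False would hit AttributeError as soon as the attribute is read.
fine_tune_model = TinyMaskedLMEncoder(load_softmax=False)
print(fine_tune_model(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```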

Reviewed By: myleott

Differential Revision: D15377380

fbshipit-source-id: d58544b1d2c549586abef42fec19ec8bf27a994a
parent e2a0b87d
@@ -159,6 +159,7 @@ class MaskedLMEncoder(FairseqEncoder):
         self.embed_out = None
         self.sentence_projection_layer = None
         self.sentence_out_dim = args.sentence_class_num
+        self.lm_output_learned_bias = None
         # Remove head is set to true during fine-tuning
         self.load_softmax = not getattr(args, 'remove_head', False)
@@ -252,7 +253,11 @@ class MaskedLMEncoder(FairseqEncoder):
             ] = torch.FloatTensor(1)
         if not self.load_softmax:
             for k in list(state_dict.keys()):
-                if "embed_out.weight" in k or "sentence_projection_layer.weight" in k:
+                if (
+                    "embed_out.weight" in k or
+                    "sentence_projection_layer.weight" in k or
+                    "lm_output_learned_bias" in k
+                ):
                     del state_dict[k]
         return state_dict
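
As a rough usage check of the first fix, illustrative only and written in plain PyTorch rather than fairseq's checkpoint loader, stripping the head-specific keys (as the patched upgrade_state_dict_named does) lets a checkpoint saved with the prediction head load into a head-less fine-tuning model without unexpected-key errors. The ModuleDict layout and sizes here are made up for the demonstration.

```python
import torch
import torch.nn as nn

# Model as pre-trained: an encoder plus an LM output head.
pretrained = nn.ModuleDict({
    'encoder': nn.Linear(4, 4),
    'embed_out': nn.Linear(4, 10, bias=False),
})
state_dict = pretrained.state_dict()

# Mirror the patched key filter: drop head-specific parameters before loading.
for k in list(state_dict.keys()):
    if 'embed_out.weight' in k or 'lm_output_learned_bias' in k:
        del state_dict[k]

# A fine-tuning model built without the head now loads strictly, with no
# "unexpected key" errors for the removed head parameters.
headless = nn.ModuleDict({'encoder': nn.Linear(4, 4)})
headless.load_state_dict(state_dict, strict=True)
```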