Commit 8d9063fe authored by Kartikay Khandelwal, committed by Facebook Github Bot

Mask out embeddings associated with padding (#710)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/710

Previously there was a bug in how we dealt with padding when computing the input representation from the segment and position embeddings. D15144912 fixed this by adding an offset based on the padding id. However, this makes assumptions about the padding id that may not hold true for vocabularies built outside of PyText and fairseq. Based on a discussion with barlaso, this diff zeroes out all the embeddings associated with the padding.
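For illustration, here is a minimal, self-contained sketch of the idea (not the fairseq module itself): instead of relying on a particular padding id, multiply the embedded input by (1 - padding_mask) so every padded position contributes a zero vector. The vocabulary size, padding id, and example tokens below are made up for the demo.

import torch

embedding_dim = 8
padding_idx = 1  # hypothetical pad id; the real value depends on the vocabulary

tokens = torch.tensor([
    [4, 5, 6, 1, 1],    # last two positions are padding
    [7, 8, 9, 10, 1],   # last position is padding
])
embed = torch.nn.Embedding(20, embedding_dim)

x = embed(tokens)                      # B x T x C
padding_mask = tokens.eq(padding_idx)  # B x T, True at padded positions

# Zero out every padded position, regardless of which id is used for padding.
x = x * (1 - padding_mask.unsqueeze(-1).float())

assert x[0, 3:].abs().sum() == 0 and x[1, 4:].abs().sum() == 0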

Reviewed By: borguz

Differential Revision: D15209395

fbshipit-source-id: 5573020e610f5466e673fe3845c3ed34ebb5c44d
parent 0add50c2
@@ -100,7 +100,7 @@ class TransformerSentenceEncoder(nn.Module):
         )
         self.segment_embeddings = (
-            nn.Embedding(self.num_segments, self.embedding_dim, self.padding_idx)
+            nn.Embedding(self.num_segments, self.embedding_dim, padding_idx=None)
             if self.num_segments > 0
             else None
         )
@@ -110,7 +110,7 @@ class TransformerSentenceEncoder(nn.Module):
                 self.max_seq_len,
                 self.embedding_dim,
                 self.padding_idx,
-                self.learned_pos_embedding,
+                learned=self.learned_pos_embedding,
             )
             if self.use_position_embeddings
             else None
@@ -162,12 +162,17 @@ class TransformerSentenceEncoder(nn.Module):
         )
         x = self.embed_tokens(tokens)
         if positions is not None:
             x += positions
         if segments is not None:
             x += segments
         x = F.dropout(x, p=self.dropout, training=self.training)
+        # account for padding while computing the representation
+        if padding_mask is not None:
+            x *= (1 - padding_mask.unsqueeze(-1).float())
         # B x T x C -> T x B x C
         x = x.transpose(0, 1)
         inner_states = [x]
...
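As a usage sketch of the forward-pass logic touched by this diff: the combined token/position/segment embeddings go through dropout, the padded positions are zeroed, and the result is transposed from B x T x C to T x B x C. The helper name and default dropout below are hypothetical; this only mirrors the few changed lines, not the full TransformerSentenceEncoder.forward.

import torch
import torch.nn.functional as F

def embed_with_padding_mask(x, padding_mask, dropout_p=0.1, training=False):
    # x: B x T x C summed token/position/segment embeddings
    # padding_mask: B x T, True at padded positions, or None if nothing is padded
    x = F.dropout(x, p=dropout_p, training=training)
    # account for padding while computing the representation
    if padding_mask is not None:
        x = x * (1 - padding_mask.unsqueeze(-1).float())
    # B x T x C -> T x B x C
    return x.transpose(0, 1)

x = torch.randn(2, 4, 8)
padding_mask = torch.tensor([[False, False, True, True],
                             [False, False, False, True]])
print(embed_with_padding_mask(x, padding_mask).shape)  # torch.Size([4, 2, 8])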