chenpangpang/transformers · Commit c68b4ece (unverified)

Authored Jun 28, 2019 by Thomas Wolf; committed via GitHub on Jun 28, 2019

Merge pull request #718 from Rocketknight1/master

Incorrect docstring for BertForMaskedLM

Parents: 98dc30b2, 8d6a118a
Showing 1 changed file with 0 additions and 4 deletions:

pytorch_pretrained_bert/modeling.py (+0, -4)
pytorch_pretrained_bert/modeling.py (view file @ c68b4ece)

@@ -997,10 +997,6 @@ class BertForMaskedLM(BertPreTrainedModel):
         `masked_lm_labels`: masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length]
             with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss
             is only computed for the labels set in [0, ..., vocab_size]
-        `head_mask`: an optional torch.LongTensor of shape [num_heads] with indices
-            selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
-            input sequence length in the current batch. It's the mask that we typically use for attention when
-            a batch has varying length sentences.
         `head_mask`: an optional torch.Tensor of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
             It's a mask to be used to nullify some heads of the transformer. 1.0 => head is fully masked, 0.0 => head is not masked.
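For context, here is a minimal usage sketch (not part of the commit) showing how the two documented arguments fit together. It assumes the pytorch_pretrained_bert API at this revision, where BertForMaskedLM.forward accepts `masked_lm_labels` and `head_mask`, returns the masked LM loss when labels are given, and the raw prediction scores otherwise; the head_mask convention follows the updated docstring above.

# Minimal sketch, assuming the pytorch_pretrained_bert API at this revision.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

tokens = ['[CLS]', 'the', 'capital', 'of', 'france', 'is', '[MASK]', '.', '[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# masked_lm_labels: -1 everywhere (ignored) except the [MASK] position, which
# holds the id of the expected token; the loss is computed only at that label.
masked_lm_labels = torch.full_like(input_ids, -1)
masked_lm_labels[0, 6] = tokenizer.convert_tokens_to_ids(['paris'])[0]

# head_mask of shape [num_heads] (12 for bert-base-uncased); per the updated
# docstring's convention, 1.0 fully masks a head and 0.0 leaves it active.
head_mask = torch.zeros(12)
head_mask[0] = 1.0  # nullify the first attention head

with torch.no_grad():
    loss = model(input_ids, masked_lm_labels=masked_lm_labels, head_mask=head_mask)
    scores = model(input_ids, head_mask=head_mask)  # no labels => prediction scores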