Performs a model forward pass. Once the class has been instantiated, it can be called directly.
Parameters:
`input_ids`: a ``torch.LongTensor`` of shape [batch_size, sequence_length]
with the word token indices in the vocabulary. Items in the batch should begin with the special "CLS" token (see the token preprocessing logic in the scripts
`run_bert_extract_features.py`, `run_bert_classifier.py` and `run_bert_squad.py`).
`token_type_ids`: an optional ``torch.LongTensor`` of shape [batch_size, sequence_length] with the token
type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` token and type 1 corresponds to
a `sentence B` token (see the BERT paper for more details).
`attention_mask`: an optional ``torch.LongTensor`` of shape [batch_size, sequence_length] with indices
selected in [0, 1]. It is the mask to use when a sequence is shorter than the maximum sequence length
in the current batch, i.e. the mask typically used for attention when a batch contains sentences of
varying lengths.
`labels`: labels for the classification output: a ``torch.LongTensor`` of shape [batch_size]
with indices selected in [0, ..., num_labels - 1].
`head_mask`: an optional ``torch.Tensor`` of shape [num_heads] or [num_layers, num_heads] with values between 0 and 1.
It is a mask used to nullify selected heads of the transformer: 1.0 => head is not masked, 0.0 => head is masked (nullified).
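As an illustrative sketch (not the library's code), the effect of ``head_mask`` on a single layer can be shown with plain Python lists, assuming the convention in which the mask is multiplied into each head's attention weights, so a value of 0.0 nullifies the head:

```python
# Hypothetical sketch of head_mask semantics: each head's attention
# weights are scaled by its mask entry (0.0 nullifies, 1.0 keeps).
def apply_head_mask(attn_per_head, head_mask):
    """attn_per_head: list of per-head attention-weight lists.
    head_mask: one float in [0, 1] per head."""
    return [[w * m for w in head] for head, m in zip(attn_per_head, head_mask)]

# Two heads; the second is nullified.
masked = apply_head_mask([[0.5, 0.5], [0.9, 0.1]], [1.0, 0.0])
# masked -> [[0.5, 0.5], [0.0, 0.0]]
```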
Returns:
If ``labels`` is not ``None``, outputs the CrossEntropy classification loss of the output with the labels.
If ``labels`` is ``None``, outputs the classification logits of shape [batch_size, num_labels].
Example::

    ...
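The ``input_ids`` / ``attention_mask`` conventions described above can be sketched with plain Python lists, without depending on ``torch`` (the ``CLS_ID`` and ``PAD_ID`` vocabulary indices below are assumed placeholders, not values from the library):

```python
# Illustrative sketch of the input conventions: prepend "CLS", pad to the
# batch maximum length, and mark real tokens (1) vs. padding (0).
CLS_ID, PAD_ID = 101, 0  # assumed placeholder vocabulary indices

def build_inputs(token_id_batch):
    """Return (input_ids, attention_mask) as nested lists, the same
    shapes a torch.LongTensor of [batch_size, sequence_length] would hold."""
    seqs = [[CLS_ID] + ids for ids in token_id_batch]
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [PAD_ID] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask

input_ids, attention_mask = build_inputs([[7, 8, 9], [7]])
# input_ids      -> [[101, 7, 8, 9], [101, 7, 0, 0]]
# attention_mask -> [[1, 1, 1, 1], [1, 1, 0, 0]]
```

Here the second sentence is shorter, so its trailing positions are padding and receive 0 in the mask.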