start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
    Labels for position (index) of the start of the labelled span for computing the token classification loss.
    Positions are clamped to the length of the sequence (:obj:`sequence_length`).
    Positions outside of the sequence are not taken into account for computing the loss.
end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
    Labels for position (index) of the end of the labelled span for computing the token classification loss.
    Positions are clamped to the length of the sequence (:obj:`sequence_length`).
    Positions outside of the sequence are not taken into account for computing the loss.
is_impossible (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
    Labels whether a question has an answer or no answer (SQuAD 2.0).
cls_index (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
    Labels for position (index) of the classification token to use as input for computing plausibility of the answer.
p_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
    Optional mask of tokens which can't be in answers (e.g. [CLS], [PAD], ...).
    1.0 means the token should be masked, 0.0 means the token is not masked.
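A minimal usage sketch for these training inputs. The model and tokenizer classes shown
(:obj:`XLNetForQuestionAnswering`, :obj:`XLNetTokenizer`), the checkpoint name, and the example span
indices are assumptions for illustration; any question-answering head sharing this signature is used
the same way::

    import torch
    from transformers import XLNetTokenizer, XLNetForQuestionAnswering  # assumed classes

    tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
    model = XLNetForQuestionAnswering.from_pretrained("xlnet-base-cased")

    # Encode a (question, context) pair as a single sequence.
    inputs = tokenizer.encode_plus(
        "Who wrote Hamlet?", "Hamlet was written by Shakespeare.", return_tensors="pt"
    )

    # Hypothetical gold span: the token index of "Shakespeare" in the encoded sequence.
    start_positions = torch.tensor([9])
    end_positions = torch.tensor([9])

    outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
    loss = outputs[0]  # returned because both start_positions and end_positions are given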
Returns:
    :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (config) and inputs:
    loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned if both :obj:`start_positions` and :obj:`end_positions` are provided):
        Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
    start_top_log_probs (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.start_n_top)`, `optional`, returned if :obj:`start_positions` or :obj:`end_positions` is not provided):
        Log probabilities for the top :obj:`config.start_n_top` start token possibilities (beam-search).
    start_top_index (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.start_n_top)`, `optional`, returned if :obj:`start_positions` or :obj:`end_positions` is not provided):
        Indices for the top :obj:`config.start_n_top` start token possibilities (beam-search).
    end_top_log_probs (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.start_n_top * config.end_n_top)`, `optional`, returned if :obj:`start_positions` or :obj:`end_positions` is not provided):
        Log probabilities for the top :obj:`config.start_n_top * config.end_n_top` end token possibilities (beam-search).
    end_top_index (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.start_n_top * config.end_n_top)`, `optional`, returned if :obj:`start_positions` or :obj:`end_positions` is not provided):
        Indices for the top :obj:`config.start_n_top * config.end_n_top` end token possibilities (beam-search).
    cls_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size,)`, `optional`, returned if :obj:`start_positions` or :obj:`end_positions` is not provided):
        Log probabilities for the :obj:`is_impossible` label of the answers.
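    A hedged sketch of how these beam-search outputs could be turned back into candidate spans; the
    variable names are illustrative and only the shapes documented above are assumed::

        # start_top_index: (batch_size, config.start_n_top)
        # end_top_index:   (batch_size, config.start_n_top * config.end_n_top)
        start_n_top = start_top_index.shape[1]
        end_n_top = end_top_index.shape[1] // start_n_top

        # End candidates come in one block of end_n_top entries per top start position.
        end_candidates = end_top_index.view(-1, start_n_top, end_n_top)

        best_start = start_top_index[:, 0]  # highest-probability start token per example
        best_end = end_candidates[:, 0, 0]  # its highest-probability end token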
    mems (:obj:`List[torch.FloatTensor]` of length :obj:`config.n_layers`, `optional`, returned when ``config.mem_len > 0``):
        Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model.
        Can be used (see the :obj:`mems` input) to speed up sequential decoding and attend to a longer context.
        The token ids which have their past given to this model should not be passed as input ids, as they have
        already been computed.
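    A minimal sketch of re-using these memories across segments. The output index shown assumes
    ``config.mem_len > 0``, that no labels are passed (so the five beam-search/cls tensors come first),
    and that no optional hidden-state or attention outputs are requested; the segment variables are
    hypothetical::

        # First segment: run the model without labels and keep the returned memories.
        outputs = model(input_ids_segment_1)
        mems = outputs[5]  # position is an assumption under the conditions above

        # Second segment: feed the cached memories so attention can reach back into segment 1.
        outputs = model(input_ids_segment_2, mems=mems)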
    hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
        Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
        of shape :obj:`(batch_size, sequence_length, hidden_size)`.
        Hidden-states of the model at the output of each layer plus the initial embedding outputs.
    attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
        Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
        :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
        Attention weights after the attention softmax, used to compute the weighted average in the self-attention
        heads.