Unverified Commit d1b14c9b authored by Jan Jitse Venselaar's avatar Jan Jitse Venselaar Committed by GitHub
Browse files

Tensorflow doc changes on loss output size (#9922)

* Change documentation to correctly specify loss tensor size

* Change documentation to correct input format for labels

* Corrected output size of loss tensor for sequence classifier, multiple choice model and question answering
parent 343057e1
...@@ -175,7 +175,7 @@ class TFCausalLMOutput(ModelOutput): ...@@ -175,7 +175,7 @@ class TFCausalLMOutput(ModelOutput):
Base class for causal language model (or autoregressive) outputs. Base class for causal language model (or autoregressive) outputs.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of non-masked labels, returned when :obj:`labels` is provided):
Language modeling loss (for next-token prediction). Language modeling loss (for next-token prediction).
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
...@@ -204,7 +204,7 @@ class TFCausalLMOutputWithPast(ModelOutput): ...@@ -204,7 +204,7 @@ class TFCausalLMOutputWithPast(ModelOutput):
Base class for causal language model (or autoregressive) outputs. Base class for causal language model (or autoregressive) outputs.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of non-masked labels, returned when :obj:`labels` is provided):
Language modeling loss (for next-token prediction). Language modeling loss (for next-token prediction).
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
...@@ -240,7 +240,7 @@ class TFMaskedLMOutput(ModelOutput): ...@@ -240,7 +240,7 @@ class TFMaskedLMOutput(ModelOutput):
Base class for masked language models outputs. Base class for masked language models outputs.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of non-masked labels, returned when :obj:`labels` is provided):
Masked language modeling (MLM) loss. Masked language modeling (MLM) loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
...@@ -269,7 +269,7 @@ class TFSeq2SeqLMOutput(ModelOutput): ...@@ -269,7 +269,7 @@ class TFSeq2SeqLMOutput(ModelOutput):
Base class for sequence-to-sequence language models outputs. Base class for sequence-to-sequence language models outputs.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of non-masked labels, returned when :obj:`labels` is provided):
Language modeling loss. Language modeling loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
...@@ -321,7 +321,7 @@ class TFNextSentencePredictorOutput(ModelOutput): ...@@ -321,7 +321,7 @@ class TFNextSentencePredictorOutput(ModelOutput):
Base class for outputs of models predicting if two sentences are consecutive or not. Base class for outputs of models predicting if two sentences are consecutive or not.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`next_sentence_label` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of non-masked labels, returned when :obj:`next_sentence_label` is provided):
Next sentence prediction loss. Next sentence prediction loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, 2)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, 2)`):
Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
...@@ -351,7 +351,7 @@ class TFSequenceClassifierOutput(ModelOutput): ...@@ -351,7 +351,7 @@ class TFSequenceClassifierOutput(ModelOutput):
Base class for outputs of sentence classification models. Base class for outputs of sentence classification models.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(batch_size, )`, `optional`, returned when :obj:`labels` is provided):
Classification (or regression if config.num_labels==1) loss. Classification (or regression if config.num_labels==1) loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, config.num_labels)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, config.num_labels)`):
Classification (or regression if config.num_labels==1) scores (before SoftMax). Classification (or regression if config.num_labels==1) scores (before SoftMax).
...@@ -432,7 +432,7 @@ class TFMultipleChoiceModelOutput(ModelOutput): ...@@ -432,7 +432,7 @@ class TFMultipleChoiceModelOutput(ModelOutput):
Base class for outputs of multiple choice models. Base class for outputs of multiple choice models.
Args: Args:
loss (:obj:`tf.Tensor` of shape `(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape `(batch_size, )`, `optional`, returned when :obj:`labels` is provided):
Classification loss. Classification loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, num_choices)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, num_choices)`):
`num_choices` is the second dimension of the input tensors. (see `input_ids` above). `num_choices` is the second dimension of the input tensors. (see `input_ids` above).
...@@ -463,7 +463,7 @@ class TFTokenClassifierOutput(ModelOutput): ...@@ -463,7 +463,7 @@ class TFTokenClassifierOutput(ModelOutput):
Base class for outputs of token classification models. Base class for outputs of token classification models.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when ``labels`` is provided) : loss (:obj:`tf.Tensor` of shape :obj:`(n,)`, `optional`, where n is the number of unmasked labels, returned when ``labels`` is provided) :
Classification loss. Classification loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.num_labels)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.num_labels)`):
Classification scores (before SoftMax). Classification scores (before SoftMax).
...@@ -492,7 +492,7 @@ class TFQuestionAnsweringModelOutput(ModelOutput): ...@@ -492,7 +492,7 @@ class TFQuestionAnsweringModelOutput(ModelOutput):
Base class for outputs of question answering models. Base class for outputs of question answering models.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(batch_size, )`, `optional`, returned when :obj:`start_positions` and :obj:`end_positions` are provided):
Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length)`): start_logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length)`):
Span-start scores (before SoftMax). Span-start scores (before SoftMax).
...@@ -579,7 +579,7 @@ class TFSequenceClassifierOutputWithPast(ModelOutput): ...@@ -579,7 +579,7 @@ class TFSequenceClassifierOutputWithPast(ModelOutput):
Base class for outputs of sentence classification models. Base class for outputs of sentence classification models.
Args: Args:
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`tf.Tensor` of shape :obj:`(batch_size, )`, `optional`, returned when :obj:`labels` is provided):
Classification (or regression if config.num_labels==1) loss. Classification (or regression if config.num_labels==1) loss.
logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, config.num_labels)`): logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, config.num_labels)`):
Classification (or regression if config.num_labels==1) scores (before SoftMax). Classification (or regression if config.num_labels==1) scores (before SoftMax).
......
...@@ -1322,7 +1322,7 @@ class TFBlenderbotForConditionalGeneration(TFBlenderbotPreTrainedModel, TFCausal ...@@ -1322,7 +1322,7 @@ class TFBlenderbotForConditionalGeneration(TFBlenderbotPreTrainedModel, TFCausal
**kwargs, **kwargs,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): labels (:obj:`tf.tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Labels for computing the masked language modeling loss. Indices should either be in ``[0, ..., Labels for computing the masked language modeling loss. Indices should either be in ``[0, ...,
config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored
(masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``. (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``.
......
...@@ -1297,7 +1297,7 @@ class TFBlenderbotSmallForConditionalGeneration(TFBlenderbotSmallPreTrainedModel ...@@ -1297,7 +1297,7 @@ class TFBlenderbotSmallForConditionalGeneration(TFBlenderbotSmallPreTrainedModel
**kwargs, **kwargs,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): labels (:obj:`tf.tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Labels for computing the masked language modeling loss. Indices should either be in ``[0, ..., Labels for computing the masked language modeling loss. Indices should either be in ``[0, ...,
config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored
(masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``. (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``.
......
...@@ -1314,7 +1314,7 @@ class TFMarianMTModel(TFMarianPreTrainedModel, TFCausalLanguageModelingLoss): ...@@ -1314,7 +1314,7 @@ class TFMarianMTModel(TFMarianPreTrainedModel, TFCausalLanguageModelingLoss):
**kwargs, **kwargs,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): labels (:obj:`tf.tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Labels for computing the masked language modeling loss. Indices should either be in ``[0, ..., Labels for computing the masked language modeling loss. Indices should either be in ``[0, ...,
config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored
(masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``. (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``.
......
...@@ -1328,7 +1328,7 @@ class TFPegasusForConditionalGeneration(TFPegasusPreTrainedModel, TFCausalLangua ...@@ -1328,7 +1328,7 @@ class TFPegasusForConditionalGeneration(TFPegasusPreTrainedModel, TFCausalLangua
**kwargs, **kwargs,
): ):
""" """
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): labels (:obj:`tf.tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
Labels for computing the masked language modeling loss. Indices should either be in ``[0, ..., Labels for computing the masked language modeling loss. Indices should either be in ``[0, ...,
config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored config.vocab_size]`` or -100 (see ``input_ids`` docstring). Tokens with indices set to ``-100`` are ignored
(masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``. (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``.
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment