"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "6e161955105f7e012dba5d51842923fc25fc5cdf"
Unverified Commit 91cb9546 authored by Sylvain Gugger, committed by GitHub

Switch from return_tuple to return_dict (#6138)



* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
parent 562b6369
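The user-facing effect of the switch: models now return plain tuples by default and only produce a `ModelOutput` when `return_dict=True` is passed or set on the config (the opposite polarity of the old `return_tuple` flag). A minimal illustrative sketch of the calling convention after this change, using `bert-base-uncased` purely as an example checkpoint:

    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

    # Opt in to the new dict-like outputs: named fields instead of positions.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=True)
    outputs = model(**inputs)
    logits = outputs.logits      # named access, only with return_dict=True

    # Default behaviour after this commit: a plain tuple, indexed by position.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    outputs = model(**inputs)
    logits = outputs[0]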
@@ -230,19 +230,16 @@ final activations of the model.

    >>> ## PYTORCH CODE
    >>> print(pt_outputs)
-    SequenceClassifierOutput(loss=None, logits=tensor([[-4.0833,  4.3364],
-            [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
+    (tensor([[-4.0833,  4.3364],
+            [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>),)
    >>> ## TENSORFLOW CODE
    >>> print(tf_outputs)
    (<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
    array([[-4.0832963 ,  4.336414  ],
           [ 0.08181786, -0.04179301]], dtype=float32)>,)

-The model can return more than just the final activations, which is why the PyTorch output is a special class and the
-TensorFlow output is a tuple. Here we only asked for the final activations, so we get a tuple with one element on the
-TensorFlow side and a :class:`~transformers.modeling_outputs.SequenceClassifierOutput` with just the ``logits`` field
-filled on the PyTorch side.
+The model can return more than just the final activations, which is why the output is a tuple. Here we only asked for
+the final activations, so we get a tuple with one element.

.. note::

    All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final

@@ -254,7 +251,7 @@ Let's apply the SoftMax activation to get predictions.

    >>> ## PYTORCH CODE
    >>> import torch.nn.functional as F
-    >>> pt_predictions = F.softmax(pt_outputs.logits, dim=-1)
+    >>> pt_predictions = F.softmax(pt_outputs[0], dim=-1)
    >>> ## TENSORFLOW CODE
    >>> import tensorflow as tf
    >>> tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)

@@ -341,8 +338,8 @@ code is easy to access and tweak if you need to.

In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
using the :doc:`DistilBERT </model_doc/distilbert>` architecture. As
:class:`~transformers.AutoModelForSequenceClassification` (or :class:`~transformers.TFAutoModelForSequenceClassification`
-if you are using TensorFlow)` was used, the model automatically created is then a
+if you are using TensorFlow) was used, the model automatically created is then a
:class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
to that specific model, or browse the source code. This is how you would directly instantiate model and tokenizer
without the auto magic:
......
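The quicktour edits above rely on positional indexing working the same way for the PyTorch and TensorFlow models. A short sketch of that pattern with the same sentiment checkpoint the quicktour uses; treat it as illustrative rather than the exact doc text:

    import torch.nn.functional as F
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    pt_model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    pt_batch = tokenizer("We are very happy to show you the 🤗 Transformers library.", return_tensors="pt")
    pt_outputs = pt_model(**pt_batch)                 # a plain tuple by default after this change
    pt_predictions = F.softmax(pt_outputs[0], dim=-1)
    # The TensorFlow side mirrors this with tf.nn.softmax(tf_outputs[0], axis=-1).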
@@ -49,7 +49,7 @@ put it in train mode.

.. code-block:: python

    from transformers import BertForSequenceClassification
-    model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
+    model = BertForSequenceClassification.from_pretrained('bert-base-uncased', return_dict=True)
    model.train()

This is useful because it allows us to make use of the pre-trained BERT
......
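Because the training snippet above now opts in with `return_dict=True`, the loss can be pulled out by name instead of by position. A small sketch of that usage, assuming the same `bert-base-uncased` checkpoint:

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=True)
    model.train()

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    labels = torch.tensor([1]).unsqueeze(0)  # batch size 1
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()                  # named field instead of outputs[0]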
@@ -199,9 +199,6 @@ def train(args, train_dataset, model, tokenizer):
                    {"langs": (torch.ones(batch[0].shape, dtype=torch.int64) * args.lang_id).to(args.device)}
                )
-            if isinstance(model, torch.nn.DataParallel):
-                inputs["return_tuple"] = True
            outputs = model(**inputs)
            # model outputs are always tuple in transformers (see doc)
            loss = outputs[0]
@@ -316,8 +313,6 @@ def evaluate(args, model, tokenizer, prefix=""):
                inputs.update(
                    {"langs": (torch.ones(batch[0].shape, dtype=torch.int64) * args.lang_id).to(args.device)}
                )
-            if isinstance(model, torch.nn.DataParallel):
-                inputs["return_tuple"] = True
            outputs = model(**inputs)
            for i, feature_index in enumerate(feature_indices):
......
@@ -144,7 +144,7 @@ class TestSummarizationDistiller(unittest.TestCase):
        evaluate_checkpoint(ckpts[0], dest_dir=Path(tempfile.mkdtemp()))

    def test_loss_fn(self):
-        model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY)
+        model = AutoModelForSeq2SeqLM.from_pretrained(BART_TINY, return_dict=True)
        input_ids, mask = model.dummy_inputs["input_ids"], model.dummy_inputs["attention_mask"]
        target_ids = torch.tensor([[0, 4, 8, 2], [0, 8, 2, 1]], dtype=torch.long, device=model.device)
        decoder_input_ids = target_ids[:, :-1].contiguous()  # Why this line?
......
@@ -49,8 +49,9 @@ class PretrainedConfig(object):
            Whether or not the model should returns all attentions.
        use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
            Whether or not the model should return the last key/values attentions (not used by all models).
-        return_tuple (:obj:`bool`, `optional`, defaults to :obj:`False`):
-            Whether or not the model should return tuples instead of :obj:`ModelOutput` objects.
+        return_dict (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            Whether or not the model should return a :class:`~transformers.file_utils.ModelOutput` instead of a
+            plain tuple.
        is_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether the model is used as an encoder/decoder or not.
        is_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
@@ -133,7 +134,7 @@ class PretrainedConfig(object):
    def __init__(self, **kwargs):
        # Attributes with defaults
-        self.return_tuple = kwargs.pop("return_tuple", False)
+        self.return_dict = kwargs.pop("return_dict", False)
        self.output_hidden_states = kwargs.pop("output_hidden_states", False)
        self.output_attentions = kwargs.pop("output_attentions", False)
        self.use_cache = kwargs.pop("use_cache", True)  # Not used by all models
@@ -194,12 +195,12 @@ class PretrainedConfig(object):
                raise err

    @property
-    def use_return_tuple(self) -> bool:
-        """
-        :obj:`bool`: Whether or not the model should return a tuple.
-        """
-        # If torchscript is set, force return_tuple to avoid jit errors
-        return self.return_tuple or self.torchscript
+    def use_return_dict(self) -> bool:
+        """
+        :obj:`bool`: Whether or not return :class:`~transformers.file_utils.ModelOutput` instead of tuples.
+        """
+        # If torchscript is set, force `return_dict=False` to avoid jit errors
+        return self.return_dict and not self.torchscript

    @property
    def num_labels(self) -> int:
......
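The new `use_return_dict` property combines the user's preference with the `torchscript` flag, since scripted models cannot return custom output classes. A quick sketch of how the two settings interact, using `BertConfig` as an arbitrary example:

    from transformers import BertConfig

    config = BertConfig.from_pretrained("bert-base-uncased", return_dict=True)
    assert config.use_return_dict is True

    # With torchscript enabled, the property forces tuple outputs even if return_dict=True.
    config_ts = BertConfig.from_pretrained("bert-base-uncased", return_dict=True, torchscript=True)
    assert config_ts.use_return_dict is False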
@@ -13,14 +13,17 @@ import shutil
import sys
import tarfile
import tempfile
+from collections import OrderedDict
from contextlib import contextmanager
+from dataclasses import fields
from functools import partial, wraps
from hashlib import sha256
from pathlib import Path
-from typing import Dict, Optional, Union
+from typing import Any, Dict, Optional, Tuple, Union
from urllib.parse import urlparse
from zipfile import ZipFile, is_zipfile

+import numpy as np
import requests
from filelock import FileLock
from tqdm.auto import tqdm
@@ -190,8 +193,8 @@ def add_end_docstrings(*docstr):
RETURN_INTRODUCTION = r"""
Returns:
    :class:`~{full_output_type}` or :obj:`tuple(torch.FloatTensor)`:
-    A :class:`~{full_output_type}` or a tuple of :obj:`torch.FloatTensor` (if ``return_tuple=True`` is passed or
-    when ``config.return_tuple=True``) comprising various elements depending on the configuration
+    A :class:`~{full_output_type}` (if ``return_dict=True`` is passed or when ``config.return_dict=True``) or a
+    tuple of :obj:`torch.FloatTensor` comprising various elements depending on the configuration
    (:class:`~transformers.{config_class}`) and inputs.
"""
@@ -257,7 +260,7 @@ PT_TOKEN_CLASSIFICATION_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> labels = torch.tensor([1] * inputs["input_ids"].size(1)).unsqueeze(0)  # Batch size 1
@@ -274,7 +277,7 @@ PT_QUESTION_ANSWERING_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> start_positions = torch.tensor([1])
@@ -293,7 +296,7 @@ PT_SEQUENCE_CLASSIFICATION_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
@@ -309,7 +312,7 @@ PT_MASKED_LM_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> input_ids = tokenizer("Hello, my dog is cute", return_tensors="pt")["input_ids"]
@@ -325,7 +328,7 @@ PT_BASE_MODEL_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> outputs = model(**inputs)
@@ -340,7 +343,7 @@ PT_MULTIPLE_CHOICE_SAMPLE = r"""
    >>> import torch
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
    >>> choice0 = "It is eaten with a fork and a knife."
@@ -362,7 +365,7 @@ PT_CAUSAL_LM_SAMPLE = r"""
    >>> from transformers import {tokenizer_class}, {model_class}
    >>> tokenizer = {tokenizer_class}.from_pretrained('{checkpoint}')
-    >>> model = {model_class}.from_pretrained('{checkpoint}')
+    >>> model = {model_class}.from_pretrained('{checkpoint}', return_dict=True)
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> outputs = model(**inputs, labels=inputs["input_ids"])
@@ -900,30 +903,91 @@ def tf_required(func):
    return wrapper

-class ModelOutput:
-    """
-    Base class for all model outputs as dataclass. Has a ``__getitem__`` that allows indexing by integer or slice (like
-    a tuple) or strings (like a dictionnary) that will ignore the ``None`` attributes.
-    """
-
-    def to_tuple(self):
-        """
-        Converts :obj:`self` to a tuple.
-        Return: A tuple containing all non-:obj:`None` attributes of the :obj:`self`.
-        """
-        return tuple(getattr(self, f) for f in self.__dataclass_fields__.keys() if getattr(self, f, None) is not None)
-
-    def to_dict(self):
-        """
-        Converts :obj:`self` to a Python dictionary.
-        Return: A dictionary containing all non-:obj:`None` attributes of the :obj:`self`.
-        """
-        return {f: getattr(self, f) for f in self.__dataclass_fields__.keys() if getattr(self, f, None) is not None}
-
-    def __getitem__(self, i):
-        return self.to_dict()[i] if isinstance(i, str) else self.to_tuple()[i]
-
-    def __len__(self):
-        return len(self.to_tuple())
+def is_tensor(x):
+    """ Tests if ``x`` is a :obj:`torch.Tensor`, :obj:`tf.Tensor` or :obj:`np.ndarray`. """
+    if is_torch_available():
+        import torch
+
+        if isinstance(x, torch.Tensor):
+            return True
+    if is_tf_available():
+        import tensorflow as tf
+
+        if isinstance(x, tf.Tensor):
+            return True
+    return isinstance(x, np.ndarray)
+
+
+class ModelOutput(OrderedDict):
+    """
+    Base class for all model outputs as dataclass. Has a ``__getitem__`` that allows indexing by integer or slice (like
+    a tuple) or strings (like a dictionnary) that will ignore the ``None`` attributes. Otherwise behaves like a
+    regular python dictionary.
+
+    .. warning::
+        You can't unpack a :obj:`ModelOutput` directly. Use the :meth:`~transformers.file_utils.ModelOutput.to_tuple`
+        method to convert it to a tuple before.
+    """
+
+    def __post_init__(self):
+        class_fields = fields(self)
+
+        # Safety and consistency checks
+        assert len(class_fields), f"{self.__class__.__name__} has no fields."
+        assert all(
+            field.default is None for field in class_fields[1:]
+        ), f"{self.__class__.__name__} should not have more than one required field."
+
+        first_field = getattr(self, class_fields[0].name)
+        other_fields_are_none = all(getattr(self, field.name) is None for field in class_fields[1:])
+
+        if other_fields_are_none and not is_tensor(first_field):
+            try:
+                iterator = iter(first_field)
+                first_field_iterator = True
+            except TypeError:
+                first_field_iterator = False
+
+            # if we provided an iterator as first field and the iterator is a (key, value) iterator
+            # set the associated fields
+            if first_field_iterator:
+                for element in iterator:
+                    if (
+                        not isinstance(element, (list, tuple))
+                        or not len(element) == 2
+                        or not isinstance(element[0], str)
+                    ):
+                        break
+                    setattr(self, element[0], element[1])
+                    if element[1] is not None:
+                        self[element[0]] = element[1]
+        else:
+            for field in class_fields:
+                v = getattr(self, field.name)
+                if v is not None:
+                    self[field.name] = v
+
+    def __delitem__(self, *args, **kwargs):
+        raise Exception(f"You cannot use ``__delitem__`` on a {self.__class__.__name__} instance.")
+
+    def setdefault(self, *args, **kwargs):
+        raise Exception(f"You cannot use ``setdefault`` on a {self.__class__.__name__} instance.")
+
+    def pop(self, *args, **kwargs):
+        raise Exception(f"You cannot use ``pop`` on a {self.__class__.__name__} instance.")
+
+    def update(self, *args, **kwargs):
+        raise Exception(f"You cannot use ``update`` on a {self.__class__.__name__} instance.")
+
+    def __getitem__(self, k):
+        if isinstance(k, str):
+            inner_dict = {k: v for (k, v) in self.items()}
+            return inner_dict[k]
+        else:
+            return self.to_tuple()[k]
+
+    def to_tuple(self) -> Tuple[Any]:
+        """
+        Convert self to a tuple containing all the attributes/keys that are not ``None``.
+        """
+        return tuple(self[k] for k in self.keys())
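The reworked `ModelOutput` acts as a frozen ordered dict that can also be indexed like a tuple. A brief sketch of the access patterns it enables once a model is run with `return_dict=True`:

    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=True)
    outputs = model(**tokenizer("Hello, my dog is cute", return_tensors="pt"))

    outputs.logits          # attribute access
    outputs["logits"]       # string key access
    outputs[0]              # integer index over the non-None fields (loss is None here)
    outputs.to_tuple()      # explicit conversion, needed before unpacking
    # outputs.pop("logits") would raise: in-place mutation is deliberately disabled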
@@ -346,7 +346,7 @@ class AlbertTransformer(nn.Module):
        head_mask=None,
        output_attentions=False,
        output_hidden_states=False,
-        return_tuple=False,
+        return_dict=False,
    ):
        hidden_states = self.embedding_hidden_mapping_in(hidden_states)
@@ -375,7 +375,7 @@ class AlbertTransformer(nn.Module):
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

-        if return_tuple:
+        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
@@ -430,9 +430,9 @@ class AlbertForPretrainingOutput(ModelOutput):
            heads.
    """

-    loss: Optional[torch.FloatTensor]
-    prediction_logits: torch.FloatTensor
-    sop_logits: torch.FloatTensor
+    loss: Optional[torch.FloatTensor] = None
+    prediction_logits: torch.FloatTensor = None
+    sop_logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
@@ -488,8 +488,9 @@ ALBERT_INPUTS_DOCSTRING = r"""
            If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
        output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
            If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
-        return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`):
-            If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``.
+        return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
+            If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
+            plain tuple.
"""
@@ -561,13 +562,13 @@ class AlbertModel(AlbertPreTrainedModel):
        inputs_embeds=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
@@ -599,14 +600,14 @@ class AlbertModel(AlbertPreTrainedModel):
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler_activation(self.pooler(sequence_output[:, 0]))

-        if return_tuple:
+        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
@@ -653,7 +654,7 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
        sentence_order_label=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
        **kwargs,
    ):
        r"""
@@ -678,7 +679,7 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
        >>> import torch
        >>> tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
-        >>> model = AlbertForPreTraining.from_pretrained('albert-base-v2')
+        >>> model = AlbertForPreTraining.from_pretrained('albert-base-v2', return_dict=True)
        >>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
        >>> outputs = model(input_ids)
@@ -695,7 +696,7 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
            )
            labels = kwargs.pop("masked_lm_labels")
        assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids,
@@ -706,7 +707,7 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        sequence_output, pooled_output = outputs[:2]
@@ -721,7 +722,7 @@ class AlbertForPreTraining(AlbertPreTrainedModel):
            sentence_order_loss = loss_fct(sop_scores.view(-1, 2), sentence_order_label.view(-1))
            total_loss = masked_lm_loss + sentence_order_loss

-        if return_tuple:
+        if not return_dict:
            output = (prediction_scores, sop_scores) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output
@@ -808,7 +809,7 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
        **kwargs
    ):
        r"""
@@ -827,7 +828,7 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
            )
            labels = kwargs.pop("masked_lm_labels")
        assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
@@ -838,7 +839,7 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )
        sequence_outputs = outputs[0]
@@ -849,7 +850,7 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

-        if return_tuple:
+        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
@@ -895,7 +896,7 @@ class AlbertForSequenceClassification(AlbertPreTrainedModel):
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -904,7 +905,7 @@ class AlbertForSequenceClassification(AlbertPreTrainedModel):
            If ``config.num_labels == 1`` a regression loss is computed (Mean-Square loss),
            If ``config.num_labels > 1`` a classification loss is computed (Cross-Entropy).
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
@@ -915,7 +916,7 @@ class AlbertForSequenceClassification(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        pooled_output = outputs[1]
@@ -933,7 +934,7 @@ class AlbertForSequenceClassification(AlbertPreTrainedModel):
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

-        if return_tuple:
+        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output
@@ -976,14 +977,14 @@ class AlbertForTokenClassification(AlbertPreTrainedModel):
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
            Labels for computing the token classification loss.
            Indices should be in ``[0, ..., config.num_labels - 1]``.
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids,
@@ -994,7 +995,7 @@ class AlbertForTokenClassification(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        sequence_output = outputs[0]
@@ -1014,7 +1015,7 @@ class AlbertForTokenClassification(AlbertPreTrainedModel):
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

-        if return_tuple:
+        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output
@@ -1057,7 +1058,7 @@ class AlbertForQuestionAnswering(AlbertPreTrainedModel):
        end_positions=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1069,7 +1070,7 @@ class AlbertForQuestionAnswering(AlbertPreTrainedModel):
            Positions are clamped to the length of the sequence (`sequence_length`).
            Position outside of the sequence are not taken into account for computing the loss.
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.albert(
            input_ids=input_ids,
@@ -1080,7 +1081,7 @@ class AlbertForQuestionAnswering(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        sequence_output = outputs[0]
@@ -1107,7 +1108,7 @@ class AlbertForQuestionAnswering(AlbertPreTrainedModel):
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2

-        if return_tuple:
+        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output
@@ -1153,7 +1154,7 @@ class AlbertForMultipleChoice(AlbertPreTrainedModel):
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1161,7 +1162,7 @@ class AlbertForMultipleChoice(AlbertPreTrainedModel):
            Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
            of the input tensors. (see `input_ids` above)
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
@@ -1182,7 +1183,7 @@ class AlbertForMultipleChoice(AlbertPreTrainedModel):
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

        pooled_output = outputs[1]
@@ -1196,7 +1197,7 @@ class AlbertForMultipleChoice(AlbertPreTrainedModel):
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(reshaped_logits, labels)

-        if return_tuple:
+        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output
......
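Each ALBERT head above follows the same two-branch ending: splice the extra tensors into the encoder tuple when `return_dict` is off, or wrap them in a task-specific output class when it is on. A hedged sketch of what that looks like from the caller's side, using the `albert-base-v2` checkpoint referenced in the diff:

    import torch
    from transformers import AlbertForSequenceClassification, AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    labels = torch.tensor([1]).unsqueeze(0)

    model = AlbertForSequenceClassification.from_pretrained("albert-base-v2", return_dict=True)
    outputs = model(**inputs, labels=labels)
    loss, logits = outputs.loss, outputs.logits        # named fields on the output class

    model = AlbertForSequenceClassification.from_pretrained("albert-base-v2")
    loss, logits = model(**inputs, labels=labels)[:2]  # the `if not return_dict:` branch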
@@ -124,8 +124,9 @@ BART_INPUTS_DOCSTRING = r"""
            If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
        output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
            If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
-        return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`):
-            If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``.
+        return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
+            If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
+            plain tuple.
"""
@@ -304,7 +305,7 @@ class BartEncoder(nn.Module):
        self.layer_norm = LayerNorm(config.d_model) if config.normalize_before else None

    def forward(
-        self, input_ids, attention_mask=None, output_attentions=False, output_hidden_states=False, return_tuple=False
+        self, input_ids, attention_mask=None, output_attentions=False, output_hidden_states=False, return_dict=False
    ):
        """
        Args:
@@ -359,7 +360,7 @@ class BartEncoder(nn.Module):
        # T x B x C -> B x T x C
        x = x.transpose(0, 1)

-        if return_tuple:
+        if not return_dict:
            return tuple(v for v in [x, encoder_states, all_attentions] if v is not None)
        return BaseModelOutput(last_hidden_state=x, hidden_states=encoder_states, attentions=all_attentions)
@@ -495,7 +496,7 @@ class BartDecoder(nn.Module):
        use_cache=False,
        output_attentions=False,
        output_hidden_states=False,
-        return_tuple=False,
+        return_dict=False,
        **unused,
    ):
        """
@@ -588,7 +589,7 @@ class BartDecoder(nn.Module):
        else:
            next_cache = None

-        if return_tuple:
+        if not return_dict:
            return tuple(v for v in [x, next_cache, all_hidden_states, all_self_attns] if v is not None)
        return BaseModelOutputWithPast(
            last_hidden_state=x, past_key_values=next_cache, hidden_states=all_hidden_states, attentions=all_self_attns
@@ -850,7 +851,7 @@ class BartModel(PretrainedBartModel):
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
        **kwargs,
    ):
@@ -862,7 +863,7 @@ class BartModel(PretrainedBartModel):
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        use_cache = use_cache if use_cache is not None else self.config.use_cache
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        # make masks if user doesn't supply
        if not use_cache:
@@ -884,10 +885,10 @@ class BartModel(PretrainedBartModel):
                attention_mask=attention_mask,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
-                return_tuple=return_tuple,
+                return_dict=return_dict,
            )
-        # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOuput when return_tuple=False
-        elif not return_tuple and not isinstance(encoder_outputs, BaseModelOutput):
+        # If the user passed a tuple for encoder_outputs, we wrap it in a BaseModelOuput when return_dict=False
+        elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
            encoder_outputs = BaseModelOutput(
                last_hidden_state=encoder_outputs[0],
                hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
@@ -905,10 +906,10 @@ class BartModel(PretrainedBartModel):
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )

-        if return_tuple:
+        if not return_dict:
            return decoder_outputs + encoder_outputs

        return Seq2SeqModelOutput(
@@ -976,7 +977,7 @@ class BartForConditionalGeneration(PretrainedBartModel):
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
        **unused,
    ):
        r"""
@@ -1018,7 +1019,7 @@ class BartForConditionalGeneration(PretrainedBartModel):
                FutureWarning,
            )
            decoder_past_key_values = unused.pop("decoder_cached_states")
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if labels is not None:
            use_cache = False
@@ -1033,7 +1034,7 @@ class BartForConditionalGeneration(PretrainedBartModel):
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )
        lm_logits = F.linear(outputs[0], self.model.shared.weight, bias=self.final_logits_bias)
@@ -1043,7 +1044,7 @@ class BartForConditionalGeneration(PretrainedBartModel):
            # TODO(SS): do we need to ignore pad tokens in labels?
            masked_lm_loss = loss_fct(lm_logits.view(-1, self.config.vocab_size), labels.view(-1))

-        if return_tuple:
+        if not return_dict:
            output = (lm_logits,) + outputs[1:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
@@ -1146,7 +1147,7 @@ class BartForSequenceClassification(PretrainedBartModel):
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1154,7 +1155,7 @@ class BartForSequenceClassification(PretrainedBartModel):
            Indices should be in :obj:`[0, ..., config.num_labels - 1]`.
            If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if labels is not None:
            use_cache = False
@@ -1167,7 +1168,7 @@ class BartForSequenceClassification(PretrainedBartModel):
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )
        x = outputs[0]  # last hidden state
        eos_mask = input_ids.eq(self.config.eos_token_id)
@@ -1180,7 +1181,7 @@ class BartForSequenceClassification(PretrainedBartModel):
        if labels is not None:
            loss = F.cross_entropy(logits.view(-1, self.config.num_labels), labels.view(-1))

-        if return_tuple:
+        if not return_dict:
            output = (logits,) + outputs[1:]
            return ((loss,) + output) if loss is not None else output
@@ -1232,7 +1233,7 @@ class BartForQuestionAnswering(PretrainedBartModel):
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        r"""
        start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1244,7 +1245,7 @@ class BartForQuestionAnswering(PretrainedBartModel):
            Positions are clamped to the length of the sequence (`sequence_length`).
            Position outside of the sequence are not taken into account for computing the loss.
        """
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if start_positions is not None and end_positions is not None:
            use_cache = False
@@ -1257,7 +1258,7 @@ class BartForQuestionAnswering(PretrainedBartModel):
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )
        sequence_output = outputs[0]
@@ -1284,7 +1285,7 @@ class BartForQuestionAnswering(PretrainedBartModel):
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2

-        if return_tuple:
+        if not return_dict:
            output = (start_logits, end_logits,) + outputs[1:]
            return ((total_loss,) + output) if total_loss is not None else output
......
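The BART changes follow the same pattern for seq2seq models, where the dict-style output exposes decoder and encoder tensors under separate names. A rough sketch of the intended usage; the `facebook/bart-large` checkpoint name here is only an assumed example:

    from transformers import BartForConditionalGeneration, BartTokenizer

    # "facebook/bart-large" is used purely as an illustrative checkpoint name.
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", return_dict=True)

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss    # masked LM loss on the labels
    outputs.logits  # decoder vocabulary scores
    # With return_dict left at its default, the same values come back as outputs[0] and outputs[1].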
@@ -429,7 +429,7 @@ class BertEncoder(nn.Module):
        encoder_attention_mask=None,
        output_attentions=False,
        output_hidden_states=False,
-        return_tuple=False,
+        return_dict=False,
    ):
        all_hidden_states = () if output_hidden_states else None
        all_attentions = () if output_attentions else None
@@ -469,7 +469,7 @@ class BertEncoder(nn.Module):
        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

-        if return_tuple:
+        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
@@ -609,9 +609,9 @@ class BertForPretrainingOutput(ModelOutput):
            heads.
    """

-    loss: Optional[torch.FloatTensor]
-    prediction_logits: torch.FloatTensor
-    seq_relationship_logits: torch.FloatTensor
+    loss: Optional[torch.FloatTensor] = None
+    prediction_logits: torch.FloatTensor = None
+    seq_relationship_logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
@@ -674,8 +674,9 @@ BERT_INPUTS_DOCSTRING = r"""
            If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
        output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
            If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
-        return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`):
-            If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``.
+        return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
+            If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
+            plain tuple.
"""
@@ -743,13 +744,13 @@ class BertModel(BertPreTrainedModel):
        encoder_attention_mask=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
    ):
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
@@ -800,12 +801,12 @@ class BertModel(BertPreTrainedModel):
            encoder_attention_mask=encoder_extended_attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
-            return_tuple=return_tuple,
+            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(sequence_output)

-        if return_tuple:
+        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
@@ -847,7 +848,7 @@ class BertForPreTraining(BertPreTrainedModel):
        next_sentence_label=None,
        output_attentions=None,
        output_hidden_states=None,
-        return_tuple=None,
+        return_dict=None,
        **kwargs
    ):
        r"""
@@ -872,7 +873,7 @@ class BertForPreTraining(BertPreTrainedModel):
        >>> import torch
        >>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
-        >>> model = BertForPreTraining.from_pretrained('bert-base-uncased')
+        >>> model = BertForPreTraining.from_pretrained('bert-base-uncased', return_dict=True)
        >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
        >>> outputs = model(**inputs)
@@ -887,7 +888,7 @@ class BertForPreTraining(BertPreTrainedModel):
            )
            labels = kwargs.pop("masked_lm_labels")
        assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
-        return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
input_ids, input_ids,
...@@ -898,7 +899,7 @@ class BertForPreTraining(BertPreTrainedModel): ...@@ -898,7 +899,7 @@ class BertForPreTraining(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output, pooled_output = outputs[:2] sequence_output, pooled_output = outputs[:2]
...@@ -911,7 +912,7 @@ class BertForPreTraining(BertPreTrainedModel): ...@@ -911,7 +912,7 @@ class BertForPreTraining(BertPreTrainedModel):
next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1)) next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
total_loss = masked_lm_loss + next_sentence_loss total_loss = masked_lm_loss + next_sentence_loss
if return_tuple: if not return_dict:
output = (prediction_scores, seq_relationship_score) + outputs[2:] output = (prediction_scores, seq_relationship_score) + outputs[2:]
return ((total_loss,) + output) if total_loss is not None else output return ((total_loss,) + output) if total_loss is not None else output
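For the pre-training head above, the total loss is simply the masked-LM cross-entropy (over the vocabulary) plus the next-sentence cross-entropy (over two classes). A standalone sketch with hypothetical shapes:

import torch
from torch.nn import CrossEntropyLoss

vocab_size, seq_len = 30522, 8                     # hypothetical sizes
prediction_scores = torch.randn(2, seq_len, vocab_size)
seq_relationship_score = torch.randn(2, 2)
labels = torch.randint(0, vocab_size, (2, seq_len))
next_sentence_label = torch.tensor([0, 1])

loss_fct = CrossEntropyLoss()
masked_lm_loss = loss_fct(prediction_scores.view(-1, vocab_size), labels.view(-1))
next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
total_loss = masked_lm_loss + next_sentence_loss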
...@@ -955,7 +956,7 @@ class BertLMHeadModel(BertPreTrainedModel): ...@@ -955,7 +956,7 @@ class BertLMHeadModel(BertPreTrainedModel):
encoder_attention_mask=None, encoder_attention_mask=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs **kwargs
): ):
r""" r"""
...@@ -977,14 +978,14 @@ class BertLMHeadModel(BertPreTrainedModel): ...@@ -977,14 +978,14 @@ class BertLMHeadModel(BertPreTrainedModel):
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased') >>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
>>> config = BertConfig.from_pretrained("bert-base-cased") >>> config = BertConfig.from_pretrained("bert-base-cased")
>>> config.is_decoder = True >>> config.is_decoder = True
>>> model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config) >>> model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config, return_dict=True)
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs) >>> outputs = model(**inputs)
>>> prediction_logits = outputs.logits >>> prediction_logits = outputs.logits
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -997,7 +998,7 @@ class BertLMHeadModel(BertPreTrainedModel): ...@@ -997,7 +998,7 @@ class BertLMHeadModel(BertPreTrainedModel):
encoder_attention_mask=encoder_attention_mask, encoder_attention_mask=encoder_attention_mask,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1011,7 +1012,7 @@ class BertLMHeadModel(BertPreTrainedModel): ...@@ -1011,7 +1012,7 @@ class BertLMHeadModel(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
if return_tuple: if not return_dict:
output = (prediction_scores,) + outputs[2:] output = (prediction_scores,) + outputs[2:]
return ((lm_loss,) + output) if lm_loss is not None else output return ((lm_loss,) + output) if lm_loss is not None else output
...@@ -1065,7 +1066,7 @@ class BertForMaskedLM(BertPreTrainedModel): ...@@ -1065,7 +1066,7 @@ class BertForMaskedLM(BertPreTrainedModel):
encoder_attention_mask=None, encoder_attention_mask=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs **kwargs
): ):
r""" r"""
...@@ -1086,7 +1087,7 @@ class BertForMaskedLM(BertPreTrainedModel): ...@@ -1086,7 +1087,7 @@ class BertForMaskedLM(BertPreTrainedModel):
assert "lm_labels" not in kwargs, "Use `BertWithLMHead` for autoregressive language modeling task." assert "lm_labels" not in kwargs, "Use `BertWithLMHead` for autoregressive language modeling task."
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -1099,7 +1100,7 @@ class BertForMaskedLM(BertPreTrainedModel): ...@@ -1099,7 +1100,7 @@ class BertForMaskedLM(BertPreTrainedModel):
encoder_attention_mask=encoder_attention_mask, encoder_attention_mask=encoder_attention_mask,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1110,7 +1111,7 @@ class BertForMaskedLM(BertPreTrainedModel): ...@@ -1110,7 +1111,7 @@ class BertForMaskedLM(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() # -100 index = padding token loss_fct = CrossEntropyLoss() # -100 index = padding token
masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
if return_tuple: if not return_dict:
output = (prediction_scores,) + outputs[2:] output = (prediction_scores,) + outputs[2:]
return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
...@@ -1161,7 +1162,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel): ...@@ -1161,7 +1162,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
next_sentence_label=None, next_sentence_label=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
next_sentence_label (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): next_sentence_label (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1178,7 +1179,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel): ...@@ -1178,7 +1179,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
>>> import torch >>> import torch
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') >>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased') >>> model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased', return_dict=True)
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light." >>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
...@@ -1188,7 +1189,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel): ...@@ -1188,7 +1189,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
>>> logits = outputs.logits >>> logits = outputs.logits
>>> assert logits[0, 0] < logits[0, 1] # next sentence was random >>> assert logits[0, 0] < logits[0, 1] # next sentence was random
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -1199,7 +1200,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel): ...@@ -1199,7 +1200,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
pooled_output = outputs[1] pooled_output = outputs[1]
...@@ -1211,7 +1212,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel): ...@@ -1211,7 +1212,7 @@ class BertForNextSentencePrediction(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
next_sentence_loss = loss_fct(seq_relationship_scores.view(-1, 2), next_sentence_label.view(-1)) next_sentence_loss = loss_fct(seq_relationship_scores.view(-1, 2), next_sentence_label.view(-1))
if return_tuple: if not return_dict:
output = (seq_relationship_scores,) + outputs[2:] output = (seq_relationship_scores,) + outputs[2:]
return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output
...@@ -1257,7 +1258,7 @@ class BertForSequenceClassification(BertPreTrainedModel): ...@@ -1257,7 +1258,7 @@ class BertForSequenceClassification(BertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1266,7 +1267,7 @@ class BertForSequenceClassification(BertPreTrainedModel): ...@@ -1266,7 +1267,7 @@ class BertForSequenceClassification(BertPreTrainedModel):
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -1277,7 +1278,7 @@ class BertForSequenceClassification(BertPreTrainedModel): ...@@ -1277,7 +1278,7 @@ class BertForSequenceClassification(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
pooled_output = outputs[1] pooled_output = outputs[1]
...@@ -1295,7 +1296,7 @@ class BertForSequenceClassification(BertPreTrainedModel): ...@@ -1295,7 +1296,7 @@ class BertForSequenceClassification(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + outputs[2:] output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -1337,7 +1338,7 @@ class BertForMultipleChoice(BertPreTrainedModel): ...@@ -1337,7 +1338,7 @@ class BertForMultipleChoice(BertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1345,7 +1346,7 @@ class BertForMultipleChoice(BertPreTrainedModel): ...@@ -1345,7 +1346,7 @@ class BertForMultipleChoice(BertPreTrainedModel):
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
of the input tensors. (see `input_ids` above) of the input tensors. (see `input_ids` above)
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
...@@ -1367,7 +1368,7 @@ class BertForMultipleChoice(BertPreTrainedModel): ...@@ -1367,7 +1368,7 @@ class BertForMultipleChoice(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
pooled_output = outputs[1] pooled_output = outputs[1]
...@@ -1381,7 +1382,7 @@ class BertForMultipleChoice(BertPreTrainedModel): ...@@ -1381,7 +1382,7 @@ class BertForMultipleChoice(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(reshaped_logits, labels) loss = loss_fct(reshaped_logits, labels)
if return_tuple: if not return_dict:
output = (reshaped_logits,) + outputs[2:] output = (reshaped_logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
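The multiple-choice head above flattens the choice dimension before running the encoder and folds it back for the loss; a short sketch of that reshaping with made-up sizes:

import torch

batch_size, num_choices, seq_len = 2, 4, 16
input_ids = torch.randint(0, 100, (batch_size, num_choices, seq_len))

flat_ids = input_ids.view(-1, input_ids.size(-1))   # (batch*num_choices, seq_len) fed to the encoder
logits = torch.randn(flat_ids.size(0), 1)           # one score per (example, choice) from the classifier
reshaped_logits = logits.view(-1, num_choices)      # back to (batch, num_choices)

labels = torch.tensor([1, 3])                       # index of the correct choice per example
loss = torch.nn.CrossEntropyLoss()(reshaped_logits, labels)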
...@@ -1424,14 +1425,14 @@ class BertForTokenClassification(BertPreTrainedModel): ...@@ -1424,14 +1425,14 @@ class BertForTokenClassification(BertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the token classification loss. Labels for computing the token classification loss.
Indices should be in ``[0, ..., config.num_labels - 1]``. Indices should be in ``[0, ..., config.num_labels - 1]``.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -1442,7 +1443,7 @@ class BertForTokenClassification(BertPreTrainedModel): ...@@ -1442,7 +1443,7 @@ class BertForTokenClassification(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1464,7 +1465,7 @@ class BertForTokenClassification(BertPreTrainedModel): ...@@ -1464,7 +1465,7 @@ class BertForTokenClassification(BertPreTrainedModel):
else: else:
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + outputs[2:] output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
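The condition cut from the top of this hunk (only its ``else:`` is visible) masks padding positions with the attention mask before computing the token-classification loss. A sketch of that idea, under the assumption that it relies on the loss function's default ``ignore_index``:

import torch
from torch.nn import CrossEntropyLoss

num_labels, seq_len = 5, 6                          # hypothetical sizes
logits = torch.randn(2, seq_len, num_labels)
labels = torch.randint(0, num_labels, (2, seq_len))
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                               [1, 1, 1, 0, 0, 0]])

loss_fct = CrossEntropyLoss()                       # ignore_index defaults to -100
active = attention_mask.view(-1) == 1
active_labels = torch.where(active, labels.view(-1), torch.full_like(labels.view(-1), -100))
loss = loss_fct(logits.view(-1, num_labels), active_labels)   # padded positions contribute nothing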
...@@ -1507,7 +1508,7 @@ class BertForQuestionAnswering(BertPreTrainedModel): ...@@ -1507,7 +1508,7 @@ class BertForQuestionAnswering(BertPreTrainedModel):
end_positions=None, end_positions=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1519,7 +1520,7 @@ class BertForQuestionAnswering(BertPreTrainedModel): ...@@ -1519,7 +1520,7 @@ class BertForQuestionAnswering(BertPreTrainedModel):
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (`sequence_length`).
Positions outside of the sequence are not taken into account for computing the loss. Positions outside of the sequence are not taken into account for computing the loss.

""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert( outputs = self.bert(
input_ids, input_ids,
...@@ -1530,7 +1531,7 @@ class BertForQuestionAnswering(BertPreTrainedModel): ...@@ -1530,7 +1531,7 @@ class BertForQuestionAnswering(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1557,7 +1558,7 @@ class BertForQuestionAnswering(BertPreTrainedModel): ...@@ -1557,7 +1558,7 @@ class BertForQuestionAnswering(BertPreTrainedModel):
end_loss = loss_fct(end_logits, end_positions) end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2 total_loss = (start_loss + end_loss) / 2
if return_tuple: if not return_dict:
output = (start_logits, end_logits) + outputs[2:] output = (start_logits, end_logits) + outputs[2:]
return ((total_loss,) + output) if total_loss is not None else output return ((total_loss,) + output) if total_loss is not None else output
......
...@@ -51,12 +51,6 @@ CAMEMBERT_START_DOCSTRING = r""" ...@@ -51,12 +51,6 @@ CAMEMBERT_START_DOCSTRING = r"""
model. Initializing with a config file does not load the weights associated with the model, only the model. Initializing with a config file does not load the weights associated with the model, only the
configuration. configuration.
Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``.
""" """
......
...@@ -295,8 +295,9 @@ CTRL_INPUTS_DOCSTRING = r""" ...@@ -295,8 +295,9 @@ CTRL_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -355,7 +356,7 @@ class CTRLModel(CTRLPreTrainedModel): ...@@ -355,7 +356,7 @@ class CTRLModel(CTRLPreTrainedModel):
use_cache=None, use_cache=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs, **kwargs,
): ):
if "past" in kwargs: if "past" in kwargs:
...@@ -371,7 +372,7 @@ class CTRLModel(CTRLPreTrainedModel): ...@@ -371,7 +372,7 @@ class CTRLModel(CTRLPreTrainedModel):
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -472,7 +473,7 @@ class CTRLModel(CTRLPreTrainedModel): ...@@ -472,7 +473,7 @@ class CTRLModel(CTRLPreTrainedModel):
attention_output_shape = input_shape[:-1] + (-1,) + all_attentions[0].shape[-2:] attention_output_shape = input_shape[:-1] + (-1,) + all_attentions[0].shape[-2:]
all_attentions = tuple(t.view(*attention_output_shape) for t in all_attentions) all_attentions = tuple(t.view(*attention_output_shape) for t in all_attentions)
if return_tuple: if not return_dict:
return tuple(v for v in [hidden_states, presents, all_hidden_states, all_attentions] if v is not None) return tuple(v for v in [hidden_states, presents, all_hidden_states, all_attentions] if v is not None)
return BaseModelOutputWithPast( return BaseModelOutputWithPast(
...@@ -526,7 +527,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel): ...@@ -526,7 +527,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
use_cache=None, use_cache=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs, **kwargs,
): ):
r""" r"""
...@@ -544,7 +545,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel): ...@@ -544,7 +545,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
) )
past_key_values = kwargs.pop("past") past_key_values = kwargs.pop("past")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
transformer_outputs = self.transformer( transformer_outputs = self.transformer(
input_ids, input_ids,
...@@ -557,7 +558,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel): ...@@ -557,7 +558,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
use_cache=use_cache, use_cache=use_cache,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_states = transformer_outputs[0] hidden_states = transformer_outputs[0]
...@@ -573,7 +574,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel): ...@@ -573,7 +574,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)) loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
if return_tuple: if not return_dict:
output = (lm_logits,) + transformer_outputs[1:] output = (lm_logits,) + transformer_outputs[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
......
...@@ -279,7 +279,7 @@ class Transformer(nn.Module): ...@@ -279,7 +279,7 @@ class Transformer(nn.Module):
self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(config.n_layers)]) self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(config.n_layers)])
def forward( def forward(
self, x, attn_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False, return_tuple=None self, x, attn_mask=None, head_mask=None, output_attentions=False, output_hidden_states=False, return_dict=None
): ):
""" """
Parameters Parameters
...@@ -324,7 +324,7 @@ class Transformer(nn.Module): ...@@ -324,7 +324,7 @@ class Transformer(nn.Module):
if output_hidden_states: if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_state,) all_hidden_states = all_hidden_states + (hidden_state,)
if return_tuple: if not return_dict:
return tuple(v for v in [hidden_state, all_hidden_states, all_attentions] if v is not None) return tuple(v for v in [hidden_state, all_hidden_states, all_attentions] if v is not None)
return BaseModelOutput( return BaseModelOutput(
last_hidden_state=hidden_state, hidden_states=all_hidden_states, attentions=all_attentions last_hidden_state=hidden_state, hidden_states=all_hidden_states, attentions=all_attentions
...@@ -396,8 +396,9 @@ DISTILBERT_INPUTS_DOCSTRING = r""" ...@@ -396,8 +396,9 @@ DISTILBERT_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -444,13 +445,13 @@ class DistilBertModel(DistilBertPreTrainedModel): ...@@ -444,13 +445,13 @@ class DistilBertModel(DistilBertPreTrainedModel):
inputs_embeds=None, inputs_embeds=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -477,7 +478,7 @@ class DistilBertModel(DistilBertPreTrainedModel): ...@@ -477,7 +478,7 @@ class DistilBertModel(DistilBertPreTrainedModel):
head_mask=head_mask, head_mask=head_mask,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
...@@ -516,7 +517,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel): ...@@ -516,7 +517,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs **kwargs
): ):
r""" r"""
...@@ -535,7 +536,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel): ...@@ -535,7 +536,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel):
) )
labels = kwargs.pop("masked_lm_labels") labels = kwargs.pop("masked_lm_labels")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
dlbrt_output = self.distilbert( dlbrt_output = self.distilbert(
input_ids=input_ids, input_ids=input_ids,
...@@ -544,7 +545,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel): ...@@ -544,7 +545,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_states = dlbrt_output[0] # (bs, seq_length, dim) hidden_states = dlbrt_output[0] # (bs, seq_length, dim)
prediction_logits = self.vocab_transform(hidden_states) # (bs, seq_length, dim) prediction_logits = self.vocab_transform(hidden_states) # (bs, seq_length, dim)
...@@ -556,7 +557,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel): ...@@ -556,7 +557,7 @@ class DistilBertForMaskedLM(DistilBertPreTrainedModel):
if labels is not None: if labels is not None:
mlm_loss = self.mlm_loss_fct(prediction_logits.view(-1, prediction_logits.size(-1)), labels.view(-1)) mlm_loss = self.mlm_loss_fct(prediction_logits.view(-1, prediction_logits.size(-1)), labels.view(-1))
if return_tuple: if not return_dict:
output = (prediction_logits,) + dlbrt_output[1:] output = (prediction_logits,) + dlbrt_output[1:]
return ((mlm_loss,) + output) if mlm_loss is not None else output return ((mlm_loss,) + output) if mlm_loss is not None else output
...@@ -601,7 +602,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel): ...@@ -601,7 +602,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -610,7 +611,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel): ...@@ -610,7 +611,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel):
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
distilbert_output = self.distilbert( distilbert_output = self.distilbert(
input_ids=input_ids, input_ids=input_ids,
...@@ -619,7 +620,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel): ...@@ -619,7 +620,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_state = distilbert_output[0] # (bs, seq_len, dim) hidden_state = distilbert_output[0] # (bs, seq_len, dim)
pooled_output = hidden_state[:, 0] # (bs, dim) pooled_output = hidden_state[:, 0] # (bs, dim)
...@@ -637,7 +638,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel): ...@@ -637,7 +638,7 @@ class DistilBertForSequenceClassification(DistilBertPreTrainedModel):
loss_fct = nn.CrossEntropyLoss() loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + distilbert_output[1:] output = (logits,) + distilbert_output[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -682,7 +683,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel): ...@@ -682,7 +683,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel):
end_positions=None, end_positions=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -694,7 +695,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel): ...@@ -694,7 +695,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel):
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (`sequence_length`).
Positions outside of the sequence are not taken into account for computing the loss. Positions outside of the sequence are not taken into account for computing the loss.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
distilbert_output = self.distilbert( distilbert_output = self.distilbert(
input_ids=input_ids, input_ids=input_ids,
...@@ -703,7 +704,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel): ...@@ -703,7 +704,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_states = distilbert_output[0] # (bs, max_query_len, dim) hidden_states = distilbert_output[0] # (bs, max_query_len, dim)
...@@ -730,7 +731,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel): ...@@ -730,7 +731,7 @@ class DistilBertForQuestionAnswering(DistilBertPreTrainedModel):
end_loss = loss_fct(end_logits, end_positions) end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2 total_loss = (start_loss + end_loss) / 2
if return_tuple: if not return_dict:
output = (start_logits, end_logits) + distilbert_output[1:] output = (start_logits, end_logits) + distilbert_output[1:]
return ((total_loss,) + output) if total_loss is not None else output return ((total_loss,) + output) if total_loss is not None else output
...@@ -775,14 +776,14 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel): ...@@ -775,14 +776,14 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the token classification loss. Labels for computing the token classification loss.
Indices should be in ``[0, ..., config.num_labels - 1]``. Indices should be in ``[0, ..., config.num_labels - 1]``.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.distilbert( outputs = self.distilbert(
input_ids, input_ids,
...@@ -791,7 +792,7 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel): ...@@ -791,7 +792,7 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -813,7 +814,7 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel): ...@@ -813,7 +814,7 @@ class DistilBertForTokenClassification(DistilBertPreTrainedModel):
else: else:
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + outputs[1:] output = (logits,) + outputs[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -849,7 +850,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel): ...@@ -849,7 +850,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -865,7 +866,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel): ...@@ -865,7 +866,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel):
>>> import torch >>> import torch
>>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased') >>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased')
>>> model = DistilBertForMultipleChoice.from_pretrained('distilbert-base-cased') >>> model = DistilBertForMultipleChoice.from_pretrained('distilbert-base-cased', return_dict=True)
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife." >>> choice0 = "It is eaten with a fork and a knife."
...@@ -879,7 +880,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel): ...@@ -879,7 +880,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel):
>>> loss = outputs.loss >>> loss = outputs.loss
>>> logits = outputs.logits >>> logits = outputs.logits
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
...@@ -897,7 +898,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel): ...@@ -897,7 +898,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_state = outputs[0] # (bs * num_choices, seq_len, dim) hidden_state = outputs[0] # (bs * num_choices, seq_len, dim)
...@@ -914,7 +915,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel): ...@@ -914,7 +915,7 @@ class DistilBertForMultipleChoice(DistilBertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(reshaped_logits, labels) loss = loss_fct(reshaped_logits, labels)
if return_tuple: if not return_dict:
output = (reshaped_logits,) + outputs[1:] output = (reshaped_logits,) + outputs[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
......
...@@ -134,8 +134,8 @@ class DPRReaderOutput(ModelOutput): ...@@ -134,8 +134,8 @@ class DPRReaderOutput(ModelOutput):
""" """
start_logits: torch.FloatTensor start_logits: torch.FloatTensor
end_logits: torch.FloatTensor end_logits: torch.FloatTensor = None
relevance_logits: torch.FloatTensor relevance_logits: torch.FloatTensor = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None attentions: Optional[Tuple[torch.FloatTensor]] = None
...@@ -161,7 +161,7 @@ class DPREncoder(PreTrainedModel): ...@@ -161,7 +161,7 @@ class DPREncoder(PreTrainedModel):
inputs_embeds: Optional[Tensor] = None, inputs_embeds: Optional[Tensor] = None,
output_attentions: bool = False, output_attentions: bool = False,
output_hidden_states: bool = False, output_hidden_states: bool = False,
return_tuple: bool = False, return_dict: bool = False,
) -> Union[BaseModelOutputWithPooling, Tuple[Tensor, ...]]: ) -> Union[BaseModelOutputWithPooling, Tuple[Tensor, ...]]:
outputs = self.bert_model( outputs = self.bert_model(
input_ids=input_ids, input_ids=input_ids,
...@@ -170,14 +170,14 @@ class DPREncoder(PreTrainedModel): ...@@ -170,14 +170,14 @@ class DPREncoder(PreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output, pooled_output = outputs[:2] sequence_output, pooled_output = outputs[:2]
pooled_output = sequence_output[:, 0, :] pooled_output = sequence_output[:, 0, :]
if self.projection_dim > 0: if self.projection_dim > 0:
pooled_output = self.encode_proj(pooled_output) pooled_output = self.encode_proj(pooled_output)
if return_tuple: if not return_dict:
return (sequence_output, pooled_output) + outputs[2:] return (sequence_output, pooled_output) + outputs[2:]
return BaseModelOutputWithPooling( return BaseModelOutputWithPooling(
...@@ -217,7 +217,7 @@ class DPRSpanPredictor(PreTrainedModel): ...@@ -217,7 +217,7 @@ class DPRSpanPredictor(PreTrainedModel):
inputs_embeds: Optional[Tensor] = None, inputs_embeds: Optional[Tensor] = None,
output_attentions: bool = False, output_attentions: bool = False,
output_hidden_states: bool = False, output_hidden_states: bool = False,
return_tuple: bool = False, return_dict: bool = False,
) -> Union[DPRReaderOutput, Tuple[Tensor, ...]]: ) -> Union[DPRReaderOutput, Tuple[Tensor, ...]]:
# notations: N - number of questions in a batch, M - number of passages per question, L - sequence length # notations: N - number of questions in a batch, M - number of passages per question, L - sequence length
n_passages, sequence_length = input_ids.size() if input_ids is not None else inputs_embeds.size()[:2] n_passages, sequence_length = input_ids.size() if input_ids is not None else inputs_embeds.size()[:2]
...@@ -228,7 +228,7 @@ class DPRSpanPredictor(PreTrainedModel): ...@@ -228,7 +228,7 @@ class DPRSpanPredictor(PreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -244,7 +244,7 @@ class DPRSpanPredictor(PreTrainedModel): ...@@ -244,7 +244,7 @@ class DPRSpanPredictor(PreTrainedModel):
end_logits = end_logits.view(n_passages, sequence_length) end_logits = end_logits.view(n_passages, sequence_length)
relevance_logits = relevance_logits.view(n_passages) relevance_logits = relevance_logits.view(n_passages)
if return_tuple: if not return_dict:
return (start_logits, end_logits, relevance_logits) + outputs[2:] return (start_logits, end_logits, relevance_logits) + outputs[2:]
return DPRReaderOutput( return DPRReaderOutput(
...@@ -361,6 +361,9 @@ DPR_ENCODERS_INPUTS_DOCSTRING = r""" ...@@ -361,6 +361,9 @@ DPR_ENCODERS_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states tensors of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states tensors of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
DPR_READER_INPUTS_DOCSTRING = r""" DPR_READER_INPUTS_DOCSTRING = r"""
...@@ -388,6 +391,9 @@ DPR_READER_INPUTS_DOCSTRING = r""" ...@@ -388,6 +391,9 @@ DPR_READER_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states tensors of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states tensors of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -412,7 +418,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder): ...@@ -412,7 +418,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder):
inputs_embeds: Optional[Tensor] = None, inputs_embeds: Optional[Tensor] = None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
) -> Union[DPRContextEncoderOutput, Tuple[Tensor, ...]]: ) -> Union[DPRContextEncoderOutput, Tuple[Tensor, ...]]:
r""" r"""
Return: Return:
...@@ -421,7 +427,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder): ...@@ -421,7 +427,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder):
from transformers import DPRContextEncoder, DPRContextEncoderTokenizer from transformers import DPRContextEncoder, DPRContextEncoderTokenizer
tokenizer = DPRContextEncoderTokenizer.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base') tokenizer = DPRContextEncoderTokenizer.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base')
model = DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base') model = DPRContextEncoder.from_pretrained('facebook/dpr-ctx_encoder-single-nq-base', return_dict=True)
input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"] input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output embeddings = model(input_ids).pooler_output
""" """
...@@ -430,7 +436,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder): ...@@ -430,7 +436,7 @@ class DPRContextEncoder(DPRPretrainedContextEncoder):
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -459,10 +465,10 @@ class DPRContextEncoder(DPRPretrainedContextEncoder): ...@@ -459,10 +465,10 @@ class DPRContextEncoder(DPRPretrainedContextEncoder):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
if return_tuple: if not return_dict:
return outputs[1:] return outputs[1:]
return DPRContextEncoderOutput( return DPRContextEncoderOutput(
pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions
...@@ -490,7 +496,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder): ...@@ -490,7 +496,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder):
inputs_embeds: Optional[Tensor] = None, inputs_embeds: Optional[Tensor] = None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
) -> Union[DPRQuestionEncoderOutput, Tuple[Tensor, ...]]: ) -> Union[DPRQuestionEncoderOutput, Tuple[Tensor, ...]]:
r""" r"""
Return: Return:
...@@ -499,7 +505,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder): ...@@ -499,7 +505,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder):
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base') tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('facebook/dpr-question_encoder-single-nq-base')
model = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base') model = DPRQuestionEncoder.from_pretrained('facebook/dpr-question_encoder-single-nq-base', return_dict=True)
input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"] input_ids = tokenizer("Hello, is my dog cute ?", return_tensors='pt')["input_ids"]
embeddings = model(input_ids).pooler_output embeddings = model(input_ids).pooler_output
""" """
...@@ -507,7 +513,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder): ...@@ -507,7 +513,7 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder):
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -536,10 +542,10 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder): ...@@ -536,10 +542,10 @@ class DPRQuestionEncoder(DPRPretrainedQuestionEncoder):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
if return_tuple: if not return_dict:
return outputs[1:] return outputs[1:]
return DPRQuestionEncoderOutput( return DPRQuestionEncoderOutput(
pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions pooler_output=outputs.pooler_output, hidden_states=outputs.hidden_states, attentions=outputs.attentions
...@@ -565,7 +571,7 @@ class DPRReader(DPRPretrainedReader): ...@@ -565,7 +571,7 @@ class DPRReader(DPRPretrainedReader):
inputs_embeds: Optional[Tensor] = None, inputs_embeds: Optional[Tensor] = None,
output_attentions: bool = None, output_attentions: bool = None,
output_hidden_states: bool = None, output_hidden_states: bool = None,
return_tuple=None, return_dict=None,
) -> Union[DPRReaderOutput, Tuple[Tensor, ...]]: ) -> Union[DPRReaderOutput, Tuple[Tensor, ...]]:
r""" r"""
Return: Return:
...@@ -574,7 +580,7 @@ class DPRReader(DPRPretrainedReader): ...@@ -574,7 +580,7 @@ class DPRReader(DPRPretrainedReader):
from transformers import DPRReader, DPRReaderTokenizer from transformers import DPRReader, DPRReaderTokenizer
tokenizer = DPRReaderTokenizer.from_pretrained('facebook/dpr-reader-single-nq-base') tokenizer = DPRReaderTokenizer.from_pretrained('facebook/dpr-reader-single-nq-base')
model = DPRReader.from_pretrained('facebook/dpr-reader-single-nq-base') model = DPRReader.from_pretrained('facebook/dpr-reader-single-nq-base', return_dict=True)
encoded_inputs = tokenizer( encoded_inputs = tokenizer(
questions=["What is love ?"], questions=["What is love ?"],
titles=["Haddaway"], titles=["Haddaway"],
...@@ -591,7 +597,7 @@ class DPRReader(DPRPretrainedReader): ...@@ -591,7 +597,7 @@ class DPRReader(DPRPretrainedReader):
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -613,5 +619,5 @@ class DPRReader(DPRPretrainedReader): ...@@ -613,5 +619,5 @@ class DPRReader(DPRPretrainedReader):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
...@@ -208,8 +208,8 @@ class ElectraForPretrainingOutput(ModelOutput): ...@@ -208,8 +208,8 @@ class ElectraForPretrainingOutput(ModelOutput):
heads. heads.
""" """
loss: Optional[torch.FloatTensor] loss: Optional[torch.FloatTensor] = None
logits: torch.FloatTensor logits: torch.FloatTensor = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None attentions: Optional[Tuple[torch.FloatTensor]] = None
...@@ -272,8 +272,9 @@ ELECTRA_INPUTS_DOCSTRING = r""" ...@@ -272,8 +272,9 @@ ELECTRA_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -331,13 +332,13 @@ class ElectraModel(ElectraPreTrainedModel): ...@@ -331,13 +332,13 @@ class ElectraModel(ElectraPreTrainedModel):
inputs_embeds=None, inputs_embeds=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -371,7 +372,7 @@ class ElectraModel(ElectraPreTrainedModel): ...@@ -371,7 +372,7 @@ class ElectraModel(ElectraPreTrainedModel):
head_mask=head_mask, head_mask=head_mask,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
return hidden_states return hidden_states
...@@ -428,7 +429,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel): ...@@ -428,7 +429,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -437,7 +438,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel): ...@@ -437,7 +438,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel):
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
discriminator_hidden_states = self.electra( discriminator_hidden_states = self.electra(
input_ids, input_ids,
...@@ -448,7 +449,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel): ...@@ -448,7 +449,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel):
inputs_embeds, inputs_embeds,
output_attentions, output_attentions,
output_hidden_states, output_hidden_states,
return_tuple, return_dict,
) )
sequence_output = discriminator_hidden_states[0] sequence_output = discriminator_hidden_states[0]
...@@ -464,7 +465,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel): ...@@ -464,7 +465,7 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + discriminator_hidden_states[1:] output = (logits,) + discriminator_hidden_states[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
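The ``labels`` documentation above maps directly onto the loss selection in this hunk; a condensed, standalone sketch with dummy tensors (the regression branch is not visible in the changed lines, so its exact reshaping is an assumption)::

    import torch
    from torch.nn import CrossEntropyLoss, MSELoss

    batch_size, num_labels = 4, 3
    logits = torch.randn(batch_size, num_labels)

    # num_labels > 1: classification, labels are class indices.
    labels = torch.randint(0, num_labels, (batch_size,))
    loss = CrossEntropyLoss()(logits.view(-1, num_labels), labels.view(-1))

    # num_labels == 1: regression, labels are float targets (Mean-Square loss).
    regression_logits = torch.randn(batch_size, 1)
    regression_labels = torch.randn(batch_size)
    regression_loss = MSELoss()(regression_logits.view(-1), regression_labels.view(-1))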
...@@ -505,7 +506,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel): ...@@ -505,7 +506,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (``torch.LongTensor`` of shape ``(batch_size, sequence_length)``, `optional`, defaults to :obj:`None`): labels (``torch.LongTensor`` of shape ``(batch_size, sequence_length)``, `optional`, defaults to :obj:`None`):
...@@ -527,7 +528,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel): ...@@ -527,7 +528,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel):
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1 >>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
>>> logits = model(input_ids).logits >>> logits = model(input_ids).logits
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
discriminator_hidden_states = self.electra( discriminator_hidden_states = self.electra(
input_ids, input_ids,
...@@ -538,7 +539,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel): ...@@ -538,7 +539,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel):
inputs_embeds, inputs_embeds,
output_attentions, output_attentions,
output_hidden_states, output_hidden_states,
return_tuple, return_dict,
) )
discriminator_sequence_output = discriminator_hidden_states[0] discriminator_sequence_output = discriminator_hidden_states[0]
...@@ -555,7 +556,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel): ...@@ -555,7 +556,7 @@ class ElectraForPreTraining(ElectraPreTrainedModel):
else: else:
loss = loss_fct(logits.view(-1, discriminator_sequence_output.shape[1]), labels.float()) loss = loss_fct(logits.view(-1, discriminator_sequence_output.shape[1]), labels.float())
if return_tuple: if not return_dict:
output = (logits,) + discriminator_hidden_states[1:] output = (logits,) + discriminator_hidden_states[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -606,7 +607,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel): ...@@ -606,7 +607,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs **kwargs
): ):
r""" r"""
...@@ -625,7 +626,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel): ...@@ -625,7 +626,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel):
) )
labels = kwargs.pop("masked_lm_labels") labels = kwargs.pop("masked_lm_labels")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
generator_hidden_states = self.electra( generator_hidden_states = self.electra(
input_ids, input_ids,
...@@ -636,7 +637,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel): ...@@ -636,7 +637,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel):
inputs_embeds, inputs_embeds,
output_attentions, output_attentions,
output_hidden_states, output_hidden_states,
return_tuple, return_dict,
) )
generator_sequence_output = generator_hidden_states[0] generator_sequence_output = generator_hidden_states[0]
...@@ -649,7 +650,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel): ...@@ -649,7 +650,7 @@ class ElectraForMaskedLM(ElectraPreTrainedModel):
loss_fct = nn.CrossEntropyLoss() # -100 index = padding token loss_fct = nn.CrossEntropyLoss() # -100 index = padding token
loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
if return_tuple: if not return_dict:
output = (prediction_scores,) + generator_hidden_states[1:] output = (prediction_scores,) + generator_hidden_states[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -695,14 +696,14 @@ class ElectraForTokenClassification(ElectraPreTrainedModel): ...@@ -695,14 +696,14 @@ class ElectraForTokenClassification(ElectraPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the token classification loss. Labels for computing the token classification loss.
Indices should be in ``[0, ..., config.num_labels - 1]``. Indices should be in ``[0, ..., config.num_labels - 1]``.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
discriminator_hidden_states = self.electra( discriminator_hidden_states = self.electra(
input_ids, input_ids,
...@@ -713,7 +714,7 @@ class ElectraForTokenClassification(ElectraPreTrainedModel): ...@@ -713,7 +714,7 @@ class ElectraForTokenClassification(ElectraPreTrainedModel):
inputs_embeds, inputs_embeds,
output_attentions, output_attentions,
output_hidden_states, output_hidden_states,
return_tuple, return_dict,
) )
discriminator_sequence_output = discriminator_hidden_states[0] discriminator_sequence_output = discriminator_hidden_states[0]
...@@ -732,7 +733,7 @@ class ElectraForTokenClassification(ElectraPreTrainedModel): ...@@ -732,7 +733,7 @@ class ElectraForTokenClassification(ElectraPreTrainedModel):
else: else:
loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + discriminator_hidden_states[1:] output = (logits,) + discriminator_hidden_states[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
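Only the ``else`` branch of the token-classification loss is visible in this hunk; the branch above it restricts the cross-entropy to real (non-padded) tokens. A standalone sketch of that masking, mirroring the usual pattern in this file (treat the exact mechanics as an assumption)::

    import torch
    from torch.nn import CrossEntropyLoss

    batch_size, seq_len, num_labels = 2, 5, 3
    logits = torch.randn(batch_size, seq_len, num_labels)
    labels = torch.randint(0, num_labels, (batch_size, seq_len))
    attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                                   [1, 1, 1, 1, 1]])

    loss_fct = CrossEntropyLoss()
    # Padded positions are replaced by the ignore index so they do not
    # contribute to the loss.
    active = attention_mask.view(-1) == 1
    active_labels = torch.where(
        active, labels.view(-1), torch.tensor(loss_fct.ignore_index).type_as(labels)
    )
    loss = loss_fct(logits.view(-1, num_labels), active_labels)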
...@@ -782,7 +783,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel): ...@@ -782,7 +783,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel):
end_positions=None, end_positions=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -794,7 +795,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel): ...@@ -794,7 +795,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel):
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (`sequence_length`).
Positions outside of the sequence are not taken into account for computing the loss. Positions outside of the sequence are not taken into account for computing the loss.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
discriminator_hidden_states = self.electra( discriminator_hidden_states = self.electra(
input_ids, input_ids,
...@@ -831,7 +832,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel): ...@@ -831,7 +832,7 @@ class ElectraForQuestionAnswering(ElectraPreTrainedModel):
end_loss = loss_fct(end_logits, end_positions) end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2 total_loss = (start_loss + end_loss) / 2
if return_tuple: if not return_dict:
output = (start_logits, end_logits,) + discriminator_hidden_states[1:] output = (start_logits, end_logits,) + discriminator_hidden_states[1:]
return ((total_loss,) + output) if total_loss is not None else output return ((total_loss,) + output) if total_loss is not None else output
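The span loss assembled just above averages the start and end cross-entropies, with the out-of-sequence targets described in the docstring clamped and then ignored. A minimal standalone sketch (the ``ignore_index`` detail mirrors the implementation but is not visible in the changed lines)::

    import torch
    from torch.nn import CrossEntropyLoss

    batch_size, seq_len = 2, 16
    start_logits = torch.randn(batch_size, seq_len)
    end_logits = torch.randn(batch_size, seq_len)
    start_positions = torch.tensor([3, seq_len + 5])   # second target lies outside the sequence
    end_positions = torch.tensor([7, seq_len + 9])

    # Clamp out-of-range targets to ``seq_len`` and let the loss ignore them.
    ignored_index = seq_len
    start_positions = start_positions.clamp(0, ignored_index)
    end_positions = end_positions.clamp(0, ignored_index)

    loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
    start_loss = loss_fct(start_logits, start_positions)
    end_loss = loss_fct(end_logits, end_positions)
    total_loss = (start_loss + end_loss) / 2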
...@@ -876,7 +877,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel): ...@@ -876,7 +877,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel):
inputs_embeds=None, inputs_embeds=None,
labels=None, labels=None,
output_attentions=None, output_attentions=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -884,7 +885,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel): ...@@ -884,7 +885,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel):
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
of the input tensors. (see `input_ids` above) of the input tensors. (see `input_ids` above)
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
...@@ -905,7 +906,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel): ...@@ -905,7 +906,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel):
head_mask=head_mask, head_mask=head_mask,
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = discriminator_hidden_states[0] sequence_output = discriminator_hidden_states[0]
...@@ -919,7 +920,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel): ...@@ -919,7 +920,7 @@ class ElectraForMultipleChoice(ElectraPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(reshaped_logits, labels) loss = loss_fct(reshaped_logits, labels)
if return_tuple: if not return_dict:
output = (reshaped_logits,) + discriminator_hidden_states[1:] output = (reshaped_logits,) + discriminator_hidden_states[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
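The shape handling in this multiple-choice head is easy to miss in the diff: inputs arrive as ``(batch_size, num_choices, seq_len)``, are flattened before the encoder (``input_ids.view(-1, input_ids.size(-1))`` above), and the per-choice scores are folded back into ``reshaped_logits`` for the cross-entropy. A standalone sketch of just that reshaping, with dummy tensors in place of the model::

    import torch
    from torch.nn import CrossEntropyLoss

    batch_size, num_choices, seq_len = 2, 4, 8
    input_ids = torch.randint(0, 100, (batch_size, num_choices, seq_len))

    # Flatten the choices into the batch dimension before calling the encoder.
    flat_input_ids = input_ids.view(-1, input_ids.size(-1))   # (batch_size * num_choices, seq_len)

    # Pretend the encoder + classifier produced one score per flattened row.
    logits = torch.randn(batch_size * num_choices, 1)

    # Fold the scores back so each row holds the scores of one example's choices.
    reshaped_logits = logits.view(-1, num_choices)             # (batch_size, num_choices)

    labels = torch.randint(0, num_choices, (batch_size,))
    loss = CrossEntropyLoss()(reshaped_logits, labels)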
......
...@@ -273,7 +273,6 @@ class EncoderDecoderModel(PreTrainedModel): ...@@ -273,7 +273,6 @@ class EncoderDecoderModel(PreTrainedModel):
attention_mask=attention_mask, attention_mask=attention_mask,
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
head_mask=head_mask, head_mask=head_mask,
return_tuple=True,
**kwargs_encoder, **kwargs_encoder,
) )
...@@ -288,7 +287,6 @@ class EncoderDecoderModel(PreTrainedModel): ...@@ -288,7 +287,6 @@ class EncoderDecoderModel(PreTrainedModel):
encoder_attention_mask=attention_mask, encoder_attention_mask=attention_mask,
head_mask=decoder_head_mask, head_mask=decoder_head_mask,
labels=labels, labels=labels,
return_tuple=True,
**kwargs_decoder, **kwargs_decoder,
) )
......
...@@ -110,8 +110,9 @@ FLAUBERT_INPUTS_DOCSTRING = r""" ...@@ -110,8 +110,9 @@ FLAUBERT_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -148,13 +149,13 @@ class FlaubertModel(XLMModel): ...@@ -148,13 +149,13 @@ class FlaubertModel(XLMModel):
inputs_embeds=None, inputs_embeds=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# removed: src_enc=None, src_len=None # removed: src_enc=None, src_len=None
if input_ids is not None: if input_ids is not None:
...@@ -284,7 +285,7 @@ class FlaubertModel(XLMModel): ...@@ -284,7 +285,7 @@ class FlaubertModel(XLMModel):
# move back sequence length to dimension 0 # move back sequence length to dimension 0
# tensor = tensor.transpose(0, 1) # tensor = tensor.transpose(0, 1)
if return_tuple: if not return_dict:
return tuple(v for v in [tensor, hidden_states, attentions] if v is not None) return tuple(v for v in [tensor, hidden_states, attentions] if v is not None)
return BaseModelOutput(last_hidden_state=tensor, hidden_states=hidden_states, attentions=attentions) return BaseModelOutput(last_hidden_state=tensor, hidden_states=hidden_states, attentions=attentions)
......
...@@ -323,10 +323,10 @@ class GPT2DoubleHeadsModelOutput(ModelOutput): ...@@ -323,10 +323,10 @@ class GPT2DoubleHeadsModelOutput(ModelOutput):
heads. heads.
""" """
lm_loss: Optional[torch.FloatTensor] lm_loss: Optional[torch.FloatTensor] = None
mc_loss: Optional[torch.FloatTensor] mc_loss: Optional[torch.FloatTensor] = None
lm_logits: torch.FloatTensor lm_logits: torch.FloatTensor = None
mc_logits: torch.FloatTensor mc_logits: torch.FloatTensor = None
past_key_values: Optional[List[torch.FloatTensor]] = None past_key_values: Optional[List[torch.FloatTensor]] = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None attentions: Optional[Tuple[torch.FloatTensor]] = None
...@@ -395,8 +395,9 @@ GPT2_INPUTS_DOCSTRING = r""" ...@@ -395,8 +395,9 @@ GPT2_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -448,7 +449,7 @@ class GPT2Model(GPT2PreTrainedModel): ...@@ -448,7 +449,7 @@ class GPT2Model(GPT2PreTrainedModel):
use_cache=None, use_cache=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs, **kwargs,
): ):
if "past" in kwargs: if "past" in kwargs:
...@@ -464,7 +465,7 @@ class GPT2Model(GPT2PreTrainedModel): ...@@ -464,7 +465,7 @@ class GPT2Model(GPT2PreTrainedModel):
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
use_cache = use_cache if use_cache is not None else self.config.use_cache use_cache = use_cache if use_cache is not None else self.config.use_cache
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -560,7 +561,7 @@ class GPT2Model(GPT2PreTrainedModel): ...@@ -560,7 +561,7 @@ class GPT2Model(GPT2PreTrainedModel):
if output_hidden_states: if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states,) all_hidden_states = all_hidden_states + (hidden_states,)
if return_tuple: if not return_dict:
return tuple(v for v in [hidden_states, presents, all_hidden_states, all_attentions] if v is not None) return tuple(v for v in [hidden_states, presents, all_hidden_states, all_attentions] if v is not None)
return BaseModelOutputWithPast( return BaseModelOutputWithPast(
...@@ -616,7 +617,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel): ...@@ -616,7 +617,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
use_cache=None, use_cache=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs, **kwargs,
): ):
r""" r"""
...@@ -634,7 +635,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel): ...@@ -634,7 +635,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
) )
past_key_values = kwargs.pop("past") past_key_values = kwargs.pop("past")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
transformer_outputs = self.transformer( transformer_outputs = self.transformer(
input_ids, input_ids,
...@@ -647,7 +648,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel): ...@@ -647,7 +648,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
use_cache=use_cache, use_cache=use_cache,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_states = transformer_outputs[0] hidden_states = transformer_outputs[0]
...@@ -662,7 +663,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel): ...@@ -662,7 +663,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)) loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
if return_tuple: if not return_dict:
output = (lm_logits,) + transformer_outputs[1:] output = (lm_logits,) + transformer_outputs[1:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
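The ``shift_logits``/``shift_labels`` pair used in the loss just above implements the usual causal-LM convention that position ``i`` is scored against the token at position ``i + 1``. A standalone sketch of that shift (the slicing itself sits outside the changed lines, so treat it as an assumption about the surrounding code)::

    import torch
    from torch.nn import CrossEntropyLoss

    vocab_size, seq_len = 50, 6
    lm_logits = torch.randn(1, seq_len, vocab_size)
    labels = torch.randint(0, vocab_size, (1, seq_len))

    # Drop the last prediction and the first label so that position i
    # predicts the token at position i + 1.
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))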
...@@ -713,7 +714,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel): ...@@ -713,7 +714,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
use_cache=None, use_cache=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs, **kwargs,
): ):
r""" r"""
...@@ -741,7 +742,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel): ...@@ -741,7 +742,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
>>> from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel >>> from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel
>>> tokenizer = GPT2Tokenizer.from_pretrained('gpt2') >>> tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
>>> model = GPT2DoubleHeadsModel.from_pretrained('gpt2') >>> model = GPT2DoubleHeadsModel.from_pretrained('gpt2', return_dict=True)
>>> # Add a [CLS] to the vocabulary (we should train it also!) >>> # Add a [CLS] to the vocabulary (we should train it also!)
>>> num_added_tokens = tokenizer.add_special_tokens({'cls_token': '[CLS]'}) >>> num_added_tokens = tokenizer.add_special_tokens({'cls_token': '[CLS]'})
...@@ -773,7 +774,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel): ...@@ -773,7 +774,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
) )
past_key_values = kwargs.pop("past") past_key_values = kwargs.pop("past")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
transformer_outputs = self.transformer( transformer_outputs = self.transformer(
input_ids, input_ids,
...@@ -786,7 +787,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel): ...@@ -786,7 +787,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
use_cache=use_cache, use_cache=use_cache,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
hidden_states = transformer_outputs[0] hidden_states = transformer_outputs[0]
...@@ -805,7 +806,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel): ...@@ -805,7 +806,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
lm_loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)) lm_loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
if return_tuple: if not return_dict:
output = (lm_logits, mc_logits) + transformer_outputs[1:] output = (lm_logits, mc_logits) + transformer_outputs[1:]
if mc_loss is not None: if mc_loss is not None:
output = (mc_loss,) + output output = (mc_loss,) + output
......
...@@ -694,7 +694,7 @@ class LongformerEncoder(nn.Module): ...@@ -694,7 +694,7 @@ class LongformerEncoder(nn.Module):
attention_mask=None, attention_mask=None,
output_attentions=False, output_attentions=False,
output_hidden_states=False, output_hidden_states=False,
return_tuple=False, return_dict=False,
): ):
all_hidden_states = () if output_hidden_states else None all_hidden_states = () if output_hidden_states else None
all_attentions = () if output_attentions else None all_attentions = () if output_attentions else None
...@@ -724,7 +724,7 @@ class LongformerEncoder(nn.Module): ...@@ -724,7 +724,7 @@ class LongformerEncoder(nn.Module):
if output_hidden_states: if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states,) all_hidden_states = all_hidden_states + (hidden_states,)
if return_tuple: if not return_dict:
return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None) return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
return BaseModelOutput( return BaseModelOutput(
last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
...@@ -811,8 +811,9 @@ LONGFORMER_INPUTS_DOCSTRING = r""" ...@@ -811,8 +811,9 @@ LONGFORMER_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
...@@ -942,7 +943,7 @@ class LongformerModel(LongformerPreTrainedModel): ...@@ -942,7 +943,7 @@ class LongformerModel(LongformerPreTrainedModel):
inputs_embeds=None, inputs_embeds=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
...@@ -953,7 +954,7 @@ class LongformerModel(LongformerPreTrainedModel): ...@@ -953,7 +954,7 @@ class LongformerModel(LongformerPreTrainedModel):
>>> import torch >>> import torch
>>> from transformers import LongformerModel, LongformerTokenizer >>> from transformers import LongformerModel, LongformerTokenizer
>>> model = LongformerModel.from_pretrained('allenai/longformer-base-4096') >>> model = LongformerModel.from_pretrained('allenai/longformer-base-4096', return_dict=True)
>>> tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096') >>> tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
>>> SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document >>> SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document
...@@ -965,14 +966,16 @@ class LongformerModel(LongformerPreTrainedModel): ...@@ -965,14 +966,16 @@ class LongformerModel(LongformerPreTrainedModel):
... # classification: the <s> token ... # classification: the <s> token
... # QA: question tokens ... # QA: question tokens
... # LM: potentially on the beginning of sentences and paragraphs ... # LM: potentially on the beginning of sentences and paragraphs
>>> sequence_output, pooled_output = model(input_ids, attention_mask=attention_mask) >>> outputs = model(input_ids, attention_mask=attention_mask)
>>> sequence_output = outputs.last_hidden_state
>>> pooled_output = outputs.pooler_output
""" """
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
) )
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None: if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
...@@ -1016,7 +1019,7 @@ class LongformerModel(LongformerPreTrainedModel): ...@@ -1016,7 +1019,7 @@ class LongformerModel(LongformerPreTrainedModel):
attention_mask=extended_attention_mask, attention_mask=extended_attention_mask,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = encoder_outputs[0] sequence_output = encoder_outputs[0]
pooled_output = self.pooler(sequence_output) pooled_output = self.pooler(sequence_output)
...@@ -1026,7 +1029,7 @@ class LongformerModel(LongformerPreTrainedModel): ...@@ -1026,7 +1029,7 @@ class LongformerModel(LongformerPreTrainedModel):
# unpad `sequence_output` because the calling function is expecting a length == input_ids.size(1) # unpad `sequence_output` because the calling function is expecting a length == input_ids.size(1)
sequence_output = sequence_output[:, :-padding_len] sequence_output = sequence_output[:, :-padding_len]
if return_tuple: if not return_dict:
return (sequence_output, pooled_output) + encoder_outputs[1:] return (sequence_output, pooled_output) + encoder_outputs[1:]
return BaseModelOutputWithPooling( return BaseModelOutputWithPooling(
...@@ -1063,7 +1066,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel): ...@@ -1063,7 +1066,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
**kwargs **kwargs
): ):
r""" r"""
...@@ -1082,7 +1085,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel): ...@@ -1082,7 +1085,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel):
>>> import torch >>> import torch
>>> from transformers import LongformerForMaskedLM, LongformerTokenizer >>> from transformers import LongformerForMaskedLM, LongformerTokenizer
>>> model = LongformerForMaskedLM.from_pretrained('allenai/longformer-base-4096') >>> model = LongformerForMaskedLM.from_pretrained('allenai/longformer-base-4096', return_dict=True)
>>> tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096') >>> tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
>>> SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document >>> SAMPLE_TEXT = ' '.join(['Hello world! '] * 1000) # long input document
...@@ -1102,7 +1105,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel): ...@@ -1102,7 +1105,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel):
) )
labels = kwargs.pop("masked_lm_labels") labels = kwargs.pop("masked_lm_labels")
assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}." assert kwargs == {}, f"Unexpected keyword arguments: {list(kwargs.keys())}."
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.longformer( outputs = self.longformer(
input_ids, input_ids,
...@@ -1113,7 +1116,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel): ...@@ -1113,7 +1116,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
prediction_scores = self.lm_head(sequence_output) prediction_scores = self.lm_head(sequence_output)
...@@ -1123,7 +1126,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel): ...@@ -1123,7 +1126,7 @@ class LongformerForMaskedLM(LongformerPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
if return_tuple: if not return_dict:
output = (prediction_scores,) + outputs[2:] output = (prediction_scores,) + outputs[2:]
return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
...@@ -1171,7 +1174,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel): ...@@ -1171,7 +1174,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1180,7 +1183,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel): ...@@ -1180,7 +1183,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel):
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if global_attention_mask is None: if global_attention_mask is None:
logger.info("Initializing global attention on CLS token...") logger.info("Initializing global attention on CLS token...")
...@@ -1197,7 +1200,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel): ...@@ -1197,7 +1200,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
logits = self.classifier(sequence_output) logits = self.classifier(sequence_output)
...@@ -1212,7 +1215,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel): ...@@ -1212,7 +1215,7 @@ class LongformerForSequenceClassification(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + outputs[2:] output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -1272,7 +1275,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel): ...@@ -1272,7 +1275,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel):
end_positions=None, end_positions=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1291,7 +1294,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel): ...@@ -1291,7 +1294,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel):
>>> import torch >>> import torch
>>> tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa") >>> tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa")
>>> model = LongformerForQuestionAnswering.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa") >>> model = LongformerForQuestionAnswering.from_pretrained("allenai/longformer-large-4096-finetuned-triviaqa", return_dict=True)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> encoding = tokenizer(question, text, return_tensors="pt") >>> encoding = tokenizer(question, text, return_tensors="pt")
...@@ -1310,7 +1313,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel): ...@@ -1310,7 +1313,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel):
>>> answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens)) # remove space prepending space token >>> answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens)) # remove space prepending space token
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# set global attention on question tokens # set global attention on question tokens
if global_attention_mask is None: if global_attention_mask is None:
...@@ -1327,7 +1330,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel): ...@@ -1327,7 +1330,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1354,7 +1357,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel): ...@@ -1354,7 +1357,7 @@ class LongformerForQuestionAnswering(BertPreTrainedModel):
end_loss = loss_fct(end_logits, end_positions) end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2 total_loss = (start_loss + end_loss) / 2
if return_tuple: if not return_dict:
output = (start_logits, end_logits) + outputs[2:] output = (start_logits, end_logits) + outputs[2:]
return ((total_loss,) + output) if total_loss is not None else output return ((total_loss,) + output) if total_loss is not None else output
...@@ -1404,14 +1407,14 @@ class LongformerForTokenClassification(BertPreTrainedModel): ...@@ -1404,14 +1407,14 @@ class LongformerForTokenClassification(BertPreTrainedModel):
labels=None, labels=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the token classification loss. Labels for computing the token classification loss.
Indices should be in ``[0, ..., config.num_labels - 1]``. Indices should be in ``[0, ..., config.num_labels - 1]``.
""" """
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.longformer( outputs = self.longformer(
input_ids, input_ids,
...@@ -1422,7 +1425,7 @@ class LongformerForTokenClassification(BertPreTrainedModel): ...@@ -1422,7 +1425,7 @@ class LongformerForTokenClassification(BertPreTrainedModel):
inputs_embeds=inputs_embeds, inputs_embeds=inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
sequence_output = outputs[0] sequence_output = outputs[0]
...@@ -1444,7 +1447,7 @@ class LongformerForTokenClassification(BertPreTrainedModel): ...@@ -1444,7 +1447,7 @@ class LongformerForTokenClassification(BertPreTrainedModel):
else: else:
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if return_tuple: if not return_dict:
output = (logits,) + outputs[2:] output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
...@@ -1489,7 +1492,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel): ...@@ -1489,7 +1492,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel):
inputs_embeds=None, inputs_embeds=None,
output_attentions=None, output_attentions=None,
output_hidden_states=None, output_hidden_states=None,
return_tuple=None, return_dict=None,
): ):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
...@@ -1498,7 +1501,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel): ...@@ -1498,7 +1501,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel):
of the input tensors. (see `input_ids` above) of the input tensors. (see `input_ids` above)
""" """
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple return_dict = return_dict if return_dict is not None else self.config.use_return_dict
# set global attention on question tokens # set global attention on question tokens
if global_attention_mask is None: if global_attention_mask is None:
...@@ -1536,7 +1539,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel): ...@@ -1536,7 +1539,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel):
inputs_embeds=flat_inputs_embeds, inputs_embeds=flat_inputs_embeds,
output_attentions=output_attentions, output_attentions=output_attentions,
output_hidden_states=output_hidden_states, output_hidden_states=output_hidden_states,
return_tuple=return_tuple, return_dict=return_dict,
) )
pooled_output = outputs[1] pooled_output = outputs[1]
...@@ -1549,7 +1552,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel): ...@@ -1549,7 +1552,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel):
loss_fct = CrossEntropyLoss() loss_fct = CrossEntropyLoss()
loss = loss_fct(reshaped_logits, labels) loss = loss_fct(reshaped_logits, labels)
if return_tuple: if not return_dict:
output = (reshaped_logits,) + outputs[2:] output = (reshaped_logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output return ((loss,) + output) if loss is not None else output
......
...@@ -23,7 +23,7 @@ import torch.nn as nn ...@@ -23,7 +23,7 @@ import torch.nn as nn
from torch.nn import CrossEntropyLoss, MSELoss from torch.nn import CrossEntropyLoss, MSELoss
from .file_utils import add_start_docstrings, add_start_docstrings_to_callable, replace_return_docstrings from .file_utils import add_start_docstrings, add_start_docstrings_to_callable, replace_return_docstrings
from .modeling_outputs import BaseModelOutputWithPooling from .modeling_outputs import BaseModelOutputWithPooling, SequenceClassifierOutput
from .modeling_utils import ModuleUtilsMixin from .modeling_utils import ModuleUtilsMixin
...@@ -148,8 +148,9 @@ MMBT_INPUTS_DOCSTRING = r""" Inputs: ...@@ -148,8 +148,9 @@ MMBT_INPUTS_DOCSTRING = r""" Inputs:
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`): output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`): return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``. If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
""" """
@@ -182,7 +183,7 @@ class MMBTModel(nn.Module, ModuleUtilsMixin):
encoder_attention_mask=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
Returns:
@@ -198,7 +199,7 @@ class MMBTModel(nn.Module, ModuleUtilsMixin):
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
@@ -257,13 +258,13 @@ class MMBTModel(nn.Module, ModuleUtilsMixin):
encoder_attention_mask=encoder_extended_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output = encoder_outputs[0]
pooled_output = self.transformer.pooler(sequence_output)
- if return_tuple:
+ if not return_dict:
return (sequence_output, pooled_output) + encoder_outputs[1:]
return BaseModelOutputWithPooling(
@@ -339,7 +340,9 @@ class MMBTForClassification(nn.Module):
head_mask=None,
inputs_embeds=None,
labels=None,
+ return_dict=None,
):
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mmbt(
input_modal=input_modal,
@@ -353,6 +356,7 @@ class MMBTForClassification(nn.Module):
modal_position_ids=modal_position_ids,
head_mask=head_mask,
inputs_embeds=inputs_embeds,
+ return_dict=return_dict,
)
pooled_output = outputs[1]
@@ -360,8 +364,7 @@ class MMBTForClassification(nn.Module):
pooled_output = self.dropout(pooled_output)
logits = self.classifier(pooled_output)
- outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
+ loss = None
if labels is not None:
if self.num_labels == 1:
# We are doing regression
@@ -370,6 +373,11 @@ class MMBTForClassification(nn.Module):
else:
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
- outputs = (loss,) + outputs
- return outputs  # (loss), logits, (hidden_states), (attentions)
+ if not return_dict:
+ output = (logits,) + outputs[2:]
+ return ((loss,) + output) if loss is not None else output
+ return SequenceClassifierOutput(
+ loss=loss, logits=logits, hidden_states=outputs.hidden_states, attentions=outputs.attentions,
+ )
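The MMBTForClassification hunk just above is the return idiom the rest of this diff repeats for every head: compute `loss` only when labels are given, fall back to the old tuple when `return_dict` is falsy, and otherwise build a typed output. A self-contained toy version of that idiom; the `ToyClassifierHead` class, its sizes and the random tensors are invented for illustration, only the control flow mirrors the diff:

import torch
from torch import nn
from torch.nn import CrossEntropyLoss
from transformers.modeling_outputs import SequenceClassifierOutput

class ToyClassifierHead(nn.Module):
    """Toy head reproducing the return idiom used throughout this diff."""

    def __init__(self, hidden_size=8, num_labels=2):
        super().__init__()
        self.num_labels = num_labels
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output, labels=None, return_dict=True):
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            loss = CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            # Legacy behaviour: positional tuple, loss prepended only when it exists.
            output = (logits,)
            return ((loss,) + output) if loss is not None else output

        # New behaviour: named fields on a ModelOutput subclass.
        return SequenceClassifierOutput(loss=loss, logits=logits)

head = ToyClassifierHead()
pooled = torch.randn(4, 8)
labels = torch.tensor([0, 1, 1, 0])
print(head(pooled, labels, return_dict=True).logits.shape)   # torch.Size([4, 2])
print(head(pooled, labels, return_dict=False)[1].shape)      # same logits, accessed by position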
@@ -550,7 +550,7 @@ class MobileBertEncoder(nn.Module):
encoder_attention_mask=None,
output_attentions=False,
output_hidden_states=False,
- return_tuple=False,
+ return_dict=False,
):
all_hidden_states = () if output_hidden_states else None
all_attentions = () if output_attentions else None
@@ -575,7 +575,7 @@ class MobileBertEncoder(nn.Module):
if output_hidden_states:
all_hidden_states = all_hidden_states + (hidden_states,)
- if return_tuple:
+ if not return_dict:
return tuple(v for v in [hidden_states, all_hidden_states, all_attentions] if v is not None)
return BaseModelOutput(
last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_attentions
@@ -708,9 +708,9 @@ class MobileBertForPretrainingOutput(ModelOutput):
heads.
"""
- loss: Optional[torch.FloatTensor]
- prediction_logits: torch.FloatTensor
- seq_relationship_logits: torch.FloatTensor
+ loss: Optional[torch.FloatTensor] = None
+ prediction_logits: torch.FloatTensor = None
+ seq_relationship_logits: torch.FloatTensor = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None
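The `= None` defaults added to `MobileBertForPretrainingOutput` above are what allow these outputs to be built field by field, with anything that was not computed (for example `hidden_states` when `output_hidden_states=False`) simply staying `None`. A small sketch of how such an output class behaves, assuming the `ModelOutput` base referenced in the docstrings behaves as documented; the `ToyOutput` class and its tensors are made up:

from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.file_utils import ModelOutput

@dataclass
class ToyOutput(ModelOutput):
    # Every field needs a default so the dataclass can be created
    # with only the pieces that were actually computed.
    loss: Optional[torch.FloatTensor] = None
    logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None

out = ToyOutput(logits=torch.ones(2, 3))
print(out.logits.shape)     # named access: torch.Size([2, 3])
print(out.loss)             # fields that were not set stay None
print(len(out.to_tuple()))  # 1 -- None fields are dropped from the tuple view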
@@ -773,8 +773,9 @@ MOBILEBERT_INPUTS_DOCSTRING = r"""
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`None`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail.
- return_tuple (:obj:`bool`, `optional`, defaults to :obj:`None`):
- If set to ``True``, the output of the model will be a plain tuple instead of a ``dataclass``.
+ return_dict (:obj:`bool`, `optional`, defaults to :obj:`None`):
+ If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a
+ plain tuple.
"""
@@ -831,13 +832,13 @@ class MobileBertModel(MobileBertPreTrainedModel):
encoder_attention_mask=None,
output_hidden_states=None,
output_attentions=None,
- return_tuple=None,
+ return_dict=None,
):
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
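Every forward method above resolves the flag with `return_dict if return_dict is not None else self.config.use_return_dict`, so the behaviour can be chosen once at load time or overridden per call. A short caller-side sketch of both routes; the checkpoint name is only an example:

from transformers import MobileBertModel, MobileBertTokenizer

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Route 1: make ModelOutput the default for every call at load time.
model = MobileBertModel.from_pretrained("google/mobilebert-uncased", return_dict=True)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

# Route 2: keep the config default and opt in per call.
model = MobileBertModel.from_pretrained("google/mobilebert-uncased")
outputs = model(**inputs, return_dict=True)
print(outputs.pooler_output.shape)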
@@ -890,12 +891,12 @@ class MobileBertModel(MobileBertPreTrainedModel):
encoder_attention_mask=encoder_extended_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output = encoder_outputs[0]
pooled_output = self.pooler(sequence_output)
- if return_tuple:
+ if not return_dict:
return (sequence_output, pooled_output) + encoder_outputs[1:]
return BaseModelOutputWithPooling(
@@ -958,7 +959,7 @@ class MobileBertForPreTraining(MobileBertPreTrainedModel):
next_sentence_label=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
labels (``torch.LongTensor`` of shape ``(batch_size, sequence_length)``, `optional`, defaults to :obj:`None`):
@@ -979,7 +980,7 @@ class MobileBertForPreTraining(MobileBertPreTrainedModel):
>>> import torch
>>> tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
- >>> model = MobileBertForPreTraining.from_pretrained("google/mobilebert-uncased")
+ >>> model = MobileBertForPreTraining.from_pretrained("google/mobilebert-uncased", return_dict=True)
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
>>> outputs = model(input_ids)
@@ -988,7 +989,7 @@ class MobileBertForPreTraining(MobileBertPreTrainedModel):
>>> seq_relationship_logits = outputs.seq_relationship_logits
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -999,7 +1000,7 @@ class MobileBertForPreTraining(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output, pooled_output = outputs[:2]
prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)
@@ -1011,7 +1012,7 @@ class MobileBertForPreTraining(MobileBertPreTrainedModel):
next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
total_loss = masked_lm_loss + next_sentence_loss
- if return_tuple:
+ if not return_dict:
output = (prediction_scores, seq_relationship_score) + outputs[2:]
return ((total_loss,) + output) if total_loss is not None else output
@@ -1079,7 +1080,7 @@ class MobileBertForMaskedLM(MobileBertPreTrainedModel):
encoder_attention_mask=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
**kwargs
):
r"""
@@ -1097,7 +1098,7 @@ class MobileBertForMaskedLM(MobileBertPreTrainedModel):
FutureWarning,
)
labels = kwargs.pop("masked_lm_labels")
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -1110,7 +1111,7 @@ class MobileBertForMaskedLM(MobileBertPreTrainedModel):
encoder_attention_mask=encoder_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output = outputs[0]
@@ -1121,7 +1122,7 @@ class MobileBertForMaskedLM(MobileBertPreTrainedModel):
loss_fct = CrossEntropyLoss()  # -100 index = padding token
masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
- if return_tuple:
+ if not return_dict:
output = (prediction_scores,) + outputs[2:]
return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
@@ -1169,7 +1170,7 @@ class MobileBertForNextSentencePrediction(MobileBertPreTrainedModel):
next_sentence_label=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
next_sentence_label (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1186,7 +1187,7 @@ class MobileBertForNextSentencePrediction(MobileBertPreTrainedModel):
>>> import torch
>>> tokenizer = MobileBertTokenizer.from_pretrained('google/mobilebert-uncased')
- >>> model = MobileBertForNextSentencePrediction.from_pretrained('google/mobilebert-uncased')
+ >>> model = MobileBertForNextSentencePrediction.from_pretrained('google/mobilebert-uncased', return_dict=True)
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
@@ -1196,7 +1197,7 @@ class MobileBertForNextSentencePrediction(MobileBertPreTrainedModel):
>>> loss = outputs.loss
>>> logits = outputs.logits
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -1207,7 +1208,7 @@ class MobileBertForNextSentencePrediction(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
pooled_output = outputs[1]
@@ -1218,7 +1219,7 @@ class MobileBertForNextSentencePrediction(MobileBertPreTrainedModel):
loss_fct = CrossEntropyLoss()
next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
- if return_tuple:
+ if not return_dict:
output = (seq_relationship_score,) + outputs[2:]
return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output
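The next-sentence-prediction doctest above jumps from defining `prompt` and `next_sentence` straight to `outputs.loss`, because the unchanged lines in between are collapsed in this view. One plausible way to run the example end to end; the two lines building `encoding` and calling the model are a sketch, not necessarily the exact collapsed lines:

>>> encoding = tokenizer(prompt, next_sentence, return_tensors="pt")
>>> outputs = model(**encoding, next_sentence_label=torch.LongTensor([1]))
>>> loss = outputs.loss
>>> logits = outputs.logits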
@@ -1263,7 +1264,7 @@ class MobileBertForSequenceClassification(MobileBertPreTrainedModel):
labels=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1272,7 +1273,7 @@ class MobileBertForSequenceClassification(MobileBertPreTrainedModel):
If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -1283,7 +1284,7 @@ class MobileBertForSequenceClassification(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
pooled_output = outputs[1]
pooled_output = self.dropout(pooled_output)
@@ -1299,7 +1300,7 @@ class MobileBertForSequenceClassification(MobileBertPreTrainedModel):
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
- if return_tuple:
+ if not return_dict:
output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output
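The labels docstring above encodes a single switch: `config.num_labels == 1` means regression with a mean-squared error on the raw logits, anything larger means classification with a cross-entropy over label indices. A standalone sketch of that selection with dummy tensors; shapes and values are invented:

import torch
from torch.nn import CrossEntropyLoss, MSELoss

def sequence_classification_loss(logits, labels, num_labels):
    # num_labels == 1 -> regression against float targets
    if num_labels == 1:
        return MSELoss()(logits.view(-1), labels.view(-1))
    # num_labels > 1 -> classification against integer class indices
    return CrossEntropyLoss()(logits.view(-1, num_labels), labels.view(-1))

# Classification: batch of 4, 3 classes
print(sequence_classification_loss(torch.randn(4, 3), torch.tensor([0, 2, 1, 2]), num_labels=3))
# Regression: batch of 4, one score each
print(sequence_classification_loss(torch.randn(4, 1), torch.randn(4), num_labels=1))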
@@ -1342,7 +1343,7 @@ class MobileBertForQuestionAnswering(MobileBertPreTrainedModel):
end_positions=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1354,7 +1355,7 @@ class MobileBertForQuestionAnswering(MobileBertPreTrainedModel):
Positions are clamped to the length of the sequence (`sequence_length`).
Position outside of the sequence are not taken into account for computing the loss.
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -1365,7 +1366,7 @@ class MobileBertForQuestionAnswering(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output = outputs[0]
@@ -1392,7 +1393,7 @@ class MobileBertForQuestionAnswering(MobileBertPreTrainedModel):
end_loss = loss_fct(end_logits, end_positions)
total_loss = (start_loss + end_loss) / 2
- if return_tuple:
+ if not return_dict:
output = (start_logits, end_logits) + outputs[2:]
return ((total_loss,) + output) if total_loss is not None else output
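Together with the `(start_loss + end_loss) / 2` line, the `start_positions`/`end_positions` docstring above describes the usual extractive-QA loss: clamp the gold positions onto the sequence boundary, ignore anything that was clamped, and average the start and end cross-entropies. A self-contained sketch of that computation; tensor shapes and values are invented:

import torch
from torch.nn import CrossEntropyLoss

def qa_span_loss(start_logits, end_logits, start_positions, end_positions):
    seq_len = start_logits.size(1)
    # Clamp out-of-range gold positions onto the boundary index ...
    start_positions = start_positions.clamp(0, seq_len)
    end_positions = end_positions.clamp(0, seq_len)
    # ... and tell CrossEntropyLoss to ignore exactly that index.
    loss_fct = CrossEntropyLoss(ignore_index=seq_len)
    start_loss = loss_fct(start_logits, start_positions)
    end_loss = loss_fct(end_logits, end_positions)
    return (start_loss + end_loss) / 2

start_logits, end_logits = torch.randn(2, 16), torch.randn(2, 16)
print(qa_span_loss(start_logits, end_logits, torch.tensor([3, 40]), torch.tensor([5, 50])))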
@@ -1438,7 +1439,7 @@ class MobileBertForMultipleChoice(MobileBertPreTrainedModel):
labels=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
@@ -1446,7 +1447,7 @@ class MobileBertForMultipleChoice(MobileBertPreTrainedModel):
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension
of the input tensors. (see `input_ids` above)
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
@@ -1468,7 +1469,7 @@ class MobileBertForMultipleChoice(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
pooled_output = outputs[1]
@@ -1482,7 +1483,7 @@ class MobileBertForMultipleChoice(MobileBertPreTrainedModel):
loss_fct = CrossEntropyLoss()
loss = loss_fct(reshaped_logits, labels)
- if return_tuple:
+ if not return_dict:
output = (reshaped_logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output
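The multiple-choice head above works by folding the choice dimension into the batch before the encoder and restoring it before the loss: inputs of shape `(batch, num_choices, seq_len)` become `(batch * num_choices, seq_len)`, and the per-choice scores are viewed back as `(batch, num_choices)` logits. A shape-only sketch of that round trip with a stand-in for the encoder; every tensor here is dummy data:

import torch
from torch.nn import CrossEntropyLoss

batch, num_choices, seq_len, hidden = 2, 4, 16, 8
input_ids = torch.randint(0, 100, (batch, num_choices, seq_len))

# Flatten choices into the batch dimension before encoding.
flat_ids = input_ids.view(-1, input_ids.size(-1))       # (batch * num_choices, seq_len)
pooled = torch.randn(flat_ids.size(0), hidden)          # stand-in for the encoder's pooled output
logits = torch.nn.Linear(hidden, 1)(pooled)             # one score per (example, choice)

# Restore the choice dimension so each row holds one question's choices.
reshaped_logits = logits.view(-1, num_choices)          # (batch, num_choices)
labels = torch.tensor([1, 3])                           # index of the correct choice per example
print(CrossEntropyLoss()(reshaped_logits, labels))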
@@ -1525,14 +1526,14 @@ class MobileBertForTokenClassification(MobileBertPreTrainedModel):
labels=None,
output_attentions=None,
output_hidden_states=None,
- return_tuple=None,
+ return_dict=None,
):
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
Labels for computing the token classification loss.
Indices should be in ``[0, ..., config.num_labels - 1]``.
"""
- return_tuple = return_tuple if return_tuple is not None else self.config.use_return_tuple
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.mobilebert(
input_ids,
@@ -1543,7 +1544,7 @@ class MobileBertForTokenClassification(MobileBertPreTrainedModel):
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
- return_tuple=return_tuple,
+ return_dict=return_dict,
)
sequence_output = outputs[0]
@@ -1565,7 +1566,7 @@ class MobileBertForTokenClassification(MobileBertPreTrainedModel):
else:
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
- if return_tuple:
+ if not return_dict:
output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output
...