Commit 00df3d4d authored by Lysandre, committed by Lysandre Debut

ALBERT Modeling + required changes to utilities

parent f81b6c95
ALBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~

The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:

- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups
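
As a rough illustration of the first technique, factorizing a ``V x H`` embedding matrix
into ``V x E`` and ``E x H`` matrices shrinks the embedding parameter count dramatically
when ``E`` is much smaller than ``H`` (the sizes below are illustrative assumptions, not
values quoted from this documentation):

.. code-block:: python

    V, H, E = 30000, 4096, 128   # vocab size, hidden size, embedding size (assumed)

    untied = V * H               # single BERT-style embedding matrix: 122,880,000 params
    factorized = V * E + E * H   # ALBERT-style factorization: 4,364,288 params
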
The abstract from the paper is the following:
*Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream
tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE,
RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.*

Tips:

- ALBERT is a model with absolute position embeddings, so it's usually advised to pad the inputs on
  the right rather than the left.
- ALBERT uses repeating layers, which results in a small memory footprint; however, the computational cost remains
  similar to a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same
  number of (repeating) layers.
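
A minimal sketch of right-padding a batch by hand (the checkpoint name ``albert-base-v1``
and the manual padding loop are illustrative assumptions, not an API prescribed here):

.. code-block:: python

    from transformers import AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v1")  # assumed checkpoint

    sentences = ["Hello world", "A slightly longer second sentence"]
    encoded = [tokenizer.encode(s) for s in sentences]

    # Pad on the right, as advised for models with absolute position embeddings.
    longest = max(len(ids) for ids in encoded)
    padded = [ids + [tokenizer.pad_token_id] * (longest - len(ids)) for ids in encoded]
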

AlbertConfig
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertConfig
    :members:

AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertTokenizer
    :members:

AlbertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertModel
    :members:

AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForMaskedLM
    :members:

AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForSequenceClassification
    :members:

AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AlbertForQuestionAnswering
    :members:

TFAlbertModel
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertModel
    :members:

TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertForMaskedLM
    :members:

TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAlbertForSequenceClassification
......
@@ -105,7 +105,25 @@ def is_tf_available():
 def add_start_docstrings(*docstr):
     def docstring_decorator(fn):
-        fn.__doc__ = "".join(docstr) + fn.__doc__
+        fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
         return fn
     return docstring_decorator
+
+def add_start_docstrings_to_callable(*docstr):
+    def docstring_decorator(fn):
+        class_name = ":class:`~transformers.{}`".format(fn.__qualname__.split(".")[0])
+        intro = " The {} forward method, overrides the :func:`__call__` special method.".format(class_name)
+        note = r"""
+
+    .. note::
+        Although the recipe for forward pass needs to be defined within
+        this function, one should call the :class:`Module` instance afterwards
+        instead of this since the former takes care of running the
+        registered hooks while the latter silently ignores them.
+        """
+        fn.__doc__ = intro + note + "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
+        return fn
+    return docstring_decorator
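
A minimal standalone sketch (not part of the commit) of why the new None guard matters:
decorating a function that has no docstring of its own would otherwise raise a TypeError
when concatenating a string with None:

    def add_start_docstrings(*docstr):
        def docstring_decorator(fn):
            # Fall back to "" so undocumented functions don't break the decorator.
            fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
            return fn
        return docstring_decorator

    @add_start_docstrings("Shared intro for all model docstrings.\n\n")
    def forward(x):  # deliberately left without a docstring
        return x

    print(forward.__doc__)  # -> "Shared intro for all model docstrings.\n\n"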
......
This diff is collapsed.
@@ -114,7 +114,12 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
         return getattr(self, self.base_model_prefix, self)

     def get_input_embeddings(self):
-        """ Get model's input embeddings
+        """
+        Returns the model's input embeddings.
+
+        Returns:
+            :obj:`nn.Module`:
+                A torch module mapping vocabulary to hidden states.
         """
         base_model = getattr(self, self.base_model_prefix, self)
         if base_model is not self:
@@ -123,7 +128,12 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
         raise NotImplementedError

     def set_input_embeddings(self, value):
-        """ Set model's input embeddings
+        """
+        Set model's input embeddings
+
+        Args:
+            value (:obj:`nn.Module`):
+                A module mapping vocabulary to hidden states.
         """
         base_model = getattr(self, self.base_model_prefix, self)
         if base_model is not self:
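
As a hedged illustration of the contract these two methods define (the TinyEncoder class
below is hypothetical, not from the commit):

    import torch.nn as nn

    class TinyEncoder(nn.Module):
        """Hypothetical module exposing the same get/set contract."""

        def __init__(self, vocab_size=100, hidden_size=16):
            super().__init__()
            self.embeddings = nn.Embedding(vocab_size, hidden_size)

        def get_input_embeddings(self):
            return self.embeddings       # nn.Module mapping vocabulary to hidden states

        def set_input_embeddings(self, value):
            self.embeddings = value

    model = TinyEncoder()
    bigger = nn.Embedding(120, 16)       # e.g. after adding 20 tokens to the vocabulary
    model.set_input_embeddings(bigger)
    assert model.get_input_embeddings() is bigger
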
@@ -132,14 +142,20 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
         raise NotImplementedError

     def get_output_embeddings(self):
-        """ Get model's output embeddings
-            Return None if the model doesn't have output embeddings
+        """
+        Returns the model's output embeddings.
+
+        Returns:
+            :obj:`nn.Module`:
+                A torch module mapping hidden states to vocabulary.
         """
         return None  # Overwrite for models with output embeddings

     def tie_weights(self):
-        """ Make sure we are sharing the input and output embeddings.
-            Export to TorchScript can't handle parameter sharing so we are cloning them instead.
+        """
+        Tie the weights between the input embeddings and the output embeddings.
+        If the `torchscript` flag is set in the configuration, TorchScript can't handle parameter sharing,
+        so we clone the weights instead.
         """
         output_embeddings = self.get_output_embeddings()
         if output_embeddings is not None:
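
A simplified standalone sketch of the tie-versus-clone distinction described in the
docstring (minimal modules for illustration; this is not the library's actual helper):

    import torch.nn as nn

    emb = nn.Embedding(100, 16)                # input embeddings (vocab -> hidden)
    decoder = nn.Linear(16, 100, bias=False)   # output embeddings (hidden -> vocab)

    torchscript = False  # mirrors the `torchscript` flag in the configuration
    if torchscript:
        # TorchScript export can't share parameters, so copy the values instead.
        decoder.weight = nn.Parameter(emb.weight.clone())
    else:
        # Tied: both modules point at the very same Parameter object.
        decoder.weight = emb.weight

    assert torchscript or (decoder.weight is emb.weight)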
......