chenpangpang / transformers, commit 00df3d4d

ALBERT Modeling + required changes to utilities

Authored Jan 15, 2020 by Lysandre; committed by Lysandre Debut on Jan 23, 2020
Parent: f81b6c95

Showing 4 changed files with 259 additions and 166 deletions (+259, -166):

  docs/source/model_doc/albert.rst     (+38, -9)
  src/transformers/file_utils.py       (+19, -1)
  src/transformers/modeling_albert.py  (+180, -150)
  src/transformers/modeling_utils.py   (+22, -6)
docs/source/model_doc/albert.rst

 ALBERT
 ----------------------------------------------------
 
-``AlbertConfig``
+Overview
 ~~~~~~~~~~~~~~~~~~~~~
+
+The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
+by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. It presents
+two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
+
+- Splitting the embedding matrix into two smaller matrices
+- Using repeating layers split among groups
+
+The abstract from the paper is the following:
+
+*Increasing model size when pretraining natural language representations often results in improved performance on
+downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
+longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
+techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
+that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
+self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream
+tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE,
+RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.*
+
+Tips:
+
+- ALBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
+  the right rather than the left.
+- ALBERT uses repeating layers which results in a small memory footprint, however the computational cost remains
+  similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
+  number of (repeating) layers.
+
+AlbertConfig
+~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertConfig
     :members:
 
 
-``AlbertTokenizer``
+AlbertTokenizer
 ~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertTokenizer
     :members:
 
 
-``AlbertModel``
+AlbertModel
 ~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertModel
     :members:
 
 
-``AlbertForMaskedLM``
+AlbertForMaskedLM
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertForMaskedLM
     :members:
 
 
-``AlbertForSequenceClassification``
+AlbertForSequenceClassification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertForSequenceClassification
     :members:
 
 
-``AlbertForQuestionAnswering``
+AlbertForQuestionAnswering
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.AlbertForQuestionAnswering
     :members:
 
 
-``TFAlbertModel``
+TFAlbertModel
 ~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.TFAlbertModel
     :members:
 
 
-``TFAlbertForMaskedLM``
+TFAlbertForMaskedLM
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.TFAlbertForMaskedLM
     :members:
 
 
-``TFAlbertForSequenceClassification``
+TFAlbertForSequenceClassification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autoclass:: transformers.TFAlbertForSequenceClassification
...
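For orientation, a short usage sketch of the classes documented in this file. It is not part of the commit: the "albert-base-v2" checkpoint name and the tuple-style model outputs are assumptions based on the transformers API of that period.

    # Hedged usage sketch, not part of this diff. Assumes the "albert-base-v2"
    # checkpoint and the tuple-style outputs of transformers circa early 2020.
    import torch
    from transformers import AlbertModel, AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertModel.from_pretrained("albert-base-v2")
    model.eval()

    # Per the tip above, ALBERT uses absolute position embeddings,
    # so any padding should go on the right rather than the left.
    input_ids = torch.tensor([tokenizer.encode(
        "ALBERT shares parameters across its repeating layers.", add_special_tokens=True
    )])

    with torch.no_grad():
        last_hidden_state = model(input_ids)[0]  # (batch_size, sequence_length, hidden_size)
    print(last_hidden_state.shape)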
src/transformers/file_utils.py

...
@@ -105,7 +105,25 @@ def is_tf_available():
 
 def add_start_docstrings(*docstr):
     def docstring_decorator(fn):
-        fn.__doc__ = "".join(docstr) + fn.__doc__
+        fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
+        return fn
+
+    return docstring_decorator
+
+
+def add_start_docstrings_to_callable(*docstr):
+    def docstring_decorator(fn):
+        class_name = ":class:`~transformers.{}`".format(fn.__qualname__.split(".")[0])
+        intro = " The {} forward method, overrides the :func:`__call__` special method.".format(class_name)
+        note = r"""
+
+    .. note::
+        Although the recipe for forward pass needs to be defined within
+        this function, one should call the :class:`Module` instance afterwards
+        instead of this since the former takes care of running the
+        registered hooks while the latter silently ignores them.
+        """
+        fn.__doc__ = intro + note + "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
         return fn
 
     return docstring_decorator
...
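The guard added above matters because fn.__doc__ is None for a function defined without a docstring, and concatenating a str with None raises a TypeError. A minimal self-contained sketch: the decorator body is copied from the hunk above, while the two decorated functions are hypothetical examples.

    # The decorator body is copied from the hunk above; the decorated functions
    # below are hypothetical and exist only to illustrate the None guard.
    def add_start_docstrings(*docstr):
        def docstring_decorator(fn):
            fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
            return fn

        return docstring_decorator


    @add_start_docstrings("Shared introduction. ")
    def documented():
        """Function-specific details."""


    @add_start_docstrings("Shared introduction. ")
    def undocumented():  # __doc__ is None; the pre-change code raised TypeError here
        pass


    print(documented.__doc__)    # Shared introduction. Function-specific details.
    print(undocumented.__doc__)  # Shared introduction.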
src/transformers/modeling_albert.py

(diff collapsed; not shown)
src/transformers/modeling_utils.py

...
@@ -114,7 +114,12 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
         return getattr(self, self.base_model_prefix, self)
 
     def get_input_embeddings(self):
-        """ Get model's input embeddings
+        """
+        Returns the model's input embeddings.
+
+        Returns:
+            :obj:`nn.Module`:
+                A torch module mapping vocabulary to hidden states.
         """
         base_model = getattr(self, self.base_model_prefix, self)
         if base_model is not self:
...
@@ -123,7 +128,12 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
             raise NotImplementedError
 
     def set_input_embeddings(self, value):
-        """ Set model's input embeddings
+        """
+        Set model's input embeddings
+
+        Args:
+            value (:obj:`nn.Module`):
+                A module mapping vocabulary to hidden states.
         """
         base_model = getattr(self, self.base_model_prefix, self)
         if base_model is not self:
...
@@ -132,14 +142,20 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
             raise NotImplementedError
 
     def get_output_embeddings(self):
-        """ Get model's output embeddings
-            Return None if the model doesn't have output embeddings
+        """
+        Returns the model's output embeddings.
+
+        Returns:
+            :obj:`nn.Module`:
+                A torch module mapping hidden states to vocabulary.
         """
         return None  # Overwrite for models with output embeddings
 
     def tie_weights(self):
-        """ Make sure we are sharing the input and output embeddings.
-            Export to TorchScript can't handle parameter sharing so we are cloning them instead.
+        """
+        Tie the weights between the input embeddings and the output embeddings.
+        If the `torchscript` flag is set in the configuration, can't handle parameter sharing so we are cloning
+        the weights instead.
        """
         output_embeddings = self.get_output_embeddings()
         if output_embeddings is not None:
...
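To make the docstrings above concrete, here is a hedged sketch of the embedding accessors in use. It assumes that AlbertModel (whose diff is collapsed above) exposes its word-embedding matrix through these accessors, as the PreTrainedModel contract describes; the small config values are arbitrary, chosen only to keep the example light.

    # Illustrative sketch, not part of the commit. Config values are arbitrary,
    # and AlbertModel is assumed to implement the accessor contract shown above.
    import torch.nn as nn
    from transformers import AlbertConfig, AlbertModel

    config = AlbertConfig(
        vocab_size=30000, embedding_size=128, hidden_size=256,
        num_hidden_layers=2, num_attention_heads=4, intermediate_size=512,
    )
    model = AlbertModel(config)

    # get_input_embeddings(): the nn.Module mapping vocabulary ids to embeddings.
    # For ALBERT this matrix is (vocab_size, embedding_size) because the embedding
    # matrix is split in two, as described in the docs diff above.
    emb = model.get_input_embeddings()
    print(type(emb).__name__, tuple(emb.weight.shape))  # Embedding (30000, 128)

    # set_input_embeddings(): swap in a replacement module of the same shape.
    new_emb = nn.Embedding(config.vocab_size, config.embedding_size)
    model.set_input_embeddings(new_emb)
    assert model.get_input_embeddings() is new_emb

    # The bare encoder has no output head, so there is nothing for tie_weights to tie.
    print(model.get_output_embeddings())  # None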