Commit cd77c750 authored by Lysandre, committed by Lysandre Debut

BERT PyTorch models

BERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer
pre-trained with a combination of masked language modeling and next sentence prediction objectives
on a large corpus comprising the Toronto Book Corpus and Wikipedia.

The abstract from the paper is the following:
*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*
*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
Tips:
- BERT is a model with absolute position embeddings, so it is usually advisable to pad the inputs on
  the right rather than the left (a padding sketch follows below).
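
As a concrete sketch of the padding tip, assuming a release of the library where the tokenizer's batched ``__call__`` API and its ``padding`` argument are available (older releases expose the same behaviour through ``batch_encode_plus``):

.. code-block:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # BERT tokenizers pad on the right by default, which keeps every real
    # token aligned with the absolute position embeddings.
    batch = tokenizer(
        ["Hello, my dog is cute", "A noticeably longer second sentence that forces padding"],
        padding=True,          # pad to the longest sequence in the batch
        return_tensors="pt",
    )
    print(batch["input_ids"].shape)    # (2, longest_sequence_length)
    print(batch["attention_mask"][0])  # trailing zeros mark the right padding
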
BertConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertConfig
:members:
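
A minimal sketch of building a model from a configuration; the values shown are the bert-base defaults and are spelled out only for illustration:

.. code-block:: python

    from transformers import BertConfig, BertModel

    config = BertConfig(
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
    )
    # Builds the architecture only; the weights are randomly initialized,
    # not pre-trained.
    model = BertModel(config)
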
BertTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizer
:members:
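
A minimal usage sketch, assuming the standard ``bert-base-uncased`` checkpoint:

.. code-block:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # WordPiece tokenization; [CLS] and [SEP] are added automatically.
    ids = tokenizer.encode("Hello, my dog is cute")
    print(tokenizer.convert_ids_to_tokens(ids))
    # ['[CLS]', 'hello', ',', 'my', 'dog', 'is', 'cute', '[SEP]']
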
BertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertModel
:members:
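
A minimal sketch of a forward pass with pre-trained weights; in this version of the library the model returns a plain tuple, so the hidden states are recovered by indexing:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    # Encode a single sentence and add the batch dimension.
    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)

    outputs = model(input_ids)
    last_hidden_state = outputs[0]  # (1, sequence_length, hidden_size)
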
BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForPreTraining
:members:
BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMaskedLM
:members:
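
A minimal sketch of computing the masked language modeling loss. The ``masked_lm_labels`` argument matches this version of the library; later releases rename it to ``labels``:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)

    # Using the inputs as their own labels merely exercises the loss; real
    # training would mask a fraction of the tokens first.
    outputs = model(input_ids, masked_lm_labels=input_ids)
    loss, prediction_scores = outputs[:2]
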
BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForNextSentencePrediction
:members:
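
A minimal sketch of scoring a sentence pair with the next sentence prediction head; index 0 of the logits corresponds to "B follows A":

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForNextSentencePrediction

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

    # encode_plus builds the [CLS] A [SEP] B [SEP] pair with token type ids.
    encoding = tokenizer.encode_plus(
        "How old are you?", "The Eiffel Tower is in Paris.", return_tensors="pt"
    )
    seq_relationship_scores = model(**encoding)[0]  # shape (1, 2)
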
BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForSequenceClassification
:members:
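
A minimal fine-tuning-style sketch; the label value here is arbitrary and purely illustrative:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
    labels = torch.tensor([1])  # one hypothetical label for the batch of 1

    outputs = model(input_ids, labels=labels)
    loss, logits = outputs[:2]
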
BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMultipleChoice
:members:
BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForTokenClassification
:members:
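
A minimal sketch with per-token labels, again arbitrary and for illustration only:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForTokenClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForTokenClassification.from_pretrained("bert-base-uncased")

    input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
    labels = torch.ones_like(input_ids)  # one hypothetical class per token

    outputs = model(input_ids, labels=labels)
    loss, scores = outputs[:2]
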
BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForQuestionAnswering
:members:
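
A minimal extractive question answering sketch. It assumes the publicly available SQuAD fine-tuned checkpoint named below; the raw ``bert-base-uncased`` weights would return meaningless spans:

.. code-block:: python

    import torch
    from transformers import BertTokenizer, BertForQuestionAnswering

    name = "bert-large-uncased-whole-word-masking-finetuned-squad"
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)

    question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
    encoding = tokenizer.encode_plus(question, text, return_tensors="pt")

    start_scores, end_scores = model(**encoding)[:2]
    start = torch.argmax(start_scores)  # most likely start token
    end = torch.argmax(end_scores) + 1  # exclusive end index

    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0])
    print(" ".join(tokens[start:end]))
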
TFBertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertModel
:members:
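
The TensorFlow classes mirror their PyTorch counterparts; a minimal forward-pass sketch:

.. code-block:: python

    import tensorflow as tf
    from transformers import BertTokenizer, TFBertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = TFBertModel.from_pretrained("bert-base-uncased")

    # Encode one sentence and add the batch dimension.
    input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]

    outputs = model(input_ids)
    last_hidden_state = outputs[0]  # (1, sequence_length, hidden_size)
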
TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForPreTraining
:members:
TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMaskedLM
:members:
TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForNextSentencePrediction
:members:
TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForSequenceClassification
:members:
TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMultipleChoice
:members:
TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForTokenClassification
:members:
TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForQuestionAnswering
:members:
@@ -645,7 +645,7 @@ class AlbertForMaskedLM(AlbertPreTrainedModel):
 :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
 loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
     Masked language modeling loss.
-prediction_scores ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
+prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
     Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
     Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)