Commit 1d646bad authored by thomwolf
Transformers
================================================================================================================================================
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) is a library of state-of-the-art pre-trained models
for Natural Language Processing (NLP). It provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural
Language Understanding (NLU) and Natural Language Generation (NLG) with over 32 pretrained models in 100+ languages and deep interoperability
between TensorFlow 2.0 and PyTorch.

Features
---------------------------------------------------
- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (a minimal sketch follows this list)
- Seamlessly pick the right framework for training, evaluation, production
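
As a minimal sketch of this interoperability (assuming the ``bert-base-uncased`` weights can be downloaded), the same checkpoint can be loaded and run in either framework:

.. code-block:: python

    import numpy as np
    import tensorflow as tf
    import torch
    from transformers import BertTokenizer, BertModel, TFBertModel

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    # The same pretrained checkpoint, loaded once per framework.
    pt_model = BertModel.from_pretrained('bert-base-uncased')
    tf_model = TFBertModel.from_pretrained('bert-base-uncased')

    ids = tokenizer.encode("Hello, world!", add_special_tokens=True)

    # PyTorch forward pass.
    pt_hidden = pt_model(torch.tensor([ids]))[0]

    # TensorFlow 2.0 forward pass on the same input.
    tf_hidden = tf_model(tf.constant([ids]))[0]

    # Both frameworks produce (nearly) identical hidden states.
    assert np.allclose(pt_hidden.detach().numpy(), tf_hidden.numpy(), atol=1e-4)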
Contents
---------------------------------
The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. `GPT-2 <https://blog.openai.com/better-language-models/>`_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models/>`_ by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov.
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov and Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.
.. toctree::
    :maxdepth: 2
    :caption: Notes
    main_classes/model
    main_classes/tokenizer
    main_classes/optimizer_schedules
    main_classes/processors
.. toctree::
    :maxdepth: 2
The base class ``PreTrainedModel`` implements the common methods for loading/saving a model.
.. autoclass:: transformers.PreTrainedModel
    :members:
``TFPreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFPreTrainedModel
    :members:
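
As a minimal sketch of the inherited loading/saving methods (assuming the ``bert-base-uncased`` weights are available, and using a hypothetical local directory):

.. code-block:: python

    from transformers import TFBertModel

    # Any concrete subclass of TFPreTrainedModel inherits these methods.
    model = TFBertModel.from_pretrained('bert-base-uncased')

    # Serializes the configuration (config.json) and the weights (tf_model.h5).
    model.save_pretrained('./my-bert')

    # Reload from the local directory instead of the remote checkpoint.
    model = TFBertModel.from_pretrained('./my-bert')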
Processors
----------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture, that of the
:class:`~transformers.data.processors.utils.DataProcessor`. A processor returns a list
of :class:`~transformers.data.processors.utils.InputExample` objects. These
:class:`~transformers.data.processors.utils.InputExample` objects can be converted to
:class:`~transformers.data.processors.utils.InputFeatures` in order to be fed to the model.
.. autoclass:: transformers.data.processors.utils.DataProcessor
    :members:


.. autoclass:: transformers.data.processors.utils.InputExample
    :members:


.. autoclass:: transformers.data.processors.utils.InputFeatures
    :members:
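
To make this data flow concrete, here is a minimal sketch of building an ``InputExample`` by hand (the sentences and label are invented for illustration):

.. code-block:: python

    from transformers.data.processors.utils import InputExample

    # A single paraphrase-style example: two untokenized sentences and a string label.
    example = InputExample(
        guid='train-1',
        text_a='The company posted record profits.',
        text_b='Profits reached an all-time high.',
        label='1')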
GLUE
~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates
the performance of models across a diverse set of existing NLU tasks. It was released together with the paper
`GLUE: A multi-task benchmark and analysis platform for natural language understanding <https://openreview.net/pdf?id=rJ4km2R5t7>`__.

This library hosts a total of 10 processors for the following tasks: MRPC, MNLI, MNLI (mismatched),
CoLA, SST-2, STS-B, QQP, QNLI, RTE and WNLI.
Those processors are:
- :class:`~transformers.data.processors.utils.MrpcProcessor`
- :class:`~transformers.data.processors.utils.MnliProcessor`
- :class:`~transformers.data.processors.utils.MnliMismatchedProcessor`
- :class:`~transformers.data.processors.utils.ColaProcessor`
- :class:`~transformers.data.processors.utils.Sst2Processor`
- :class:`~transformers.data.processors.utils.StsbProcessor`
- :class:`~transformers.data.processors.utils.QqpProcessor`
- :class:`~transformers.data.processors.utils.QnliProcessor`
- :class:`~transformers.data.processors.utils.RteProcessor`
- :class:`~transformers.data.processors.utils.WnliProcessor`
Additionally, the following method can be used to load values from a data file and convert them to a list of
:class:`~transformers.data.processors.utils.InputExample`.

.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the
`run_glue.py <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`__ script.
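
As a minimal sketch (assuming the MRPC data files live under a hypothetical ``./glue_data/MRPC`` directory), a processor combines with ``glue_convert_examples_to_features`` roughly as follows:

.. code-block:: python

    from transformers import BertTokenizer
    from transformers.data.processors.glue import MrpcProcessor, glue_convert_examples_to_features

    data_dir = './glue_data/MRPC'  # hypothetical local path to the MRPC data

    processor = MrpcProcessor()
    examples = processor.get_train_examples(data_dir)  # list of InputExample
    label_list = processor.get_labels()                # ['0', '1'] for MRPC

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    features = glue_convert_examples_to_features(
        examples, tokenizer, max_length=128,
        label_list=label_list, output_mode='classification')

    # Each InputFeatures carries input_ids, attention_mask, token_type_ids and label.
    print(features[0].input_ids[:10])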
.. autoclass:: transformers.BertForQuestionAnswering
    :members:
``TFBertModel``
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertModel
    :members:


``TFBertForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForPreTraining
    :members:


``TFBertForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMaskedLM
    :members:


``TFBertForNextSentencePrediction``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForNextSentencePrediction
    :members:


``TFBertForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForSequenceClassification
    :members:


``TFBertForMultipleChoice``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMultipleChoice
    :members:


``TFBertForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForTokenClassification
    :members:


``TFBertForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForQuestionAnswering
    :members:
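
As a minimal sketch of how these TF2.0 classes are used (assuming the ``bert-base-uncased`` checkpoint; the freshly added classification head is randomly initialized until fine-tuned):

.. code-block:: python

    import tensorflow as tf
    from transformers import BertTokenizer, TFBertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

    ids = tf.constant([tokenizer.encode("Transformers are great!", add_special_tokens=True)])

    # The first output is the classification logits, one score per class.
    logits = model(ids)[0]
    print(logits.shape)  # (1, 2) with the default two labels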
.. autoclass:: transformers.DistilBertForQuestionAnswering
    :members:
``TFDistilBertModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertModel
    :members:


``TFDistilBertForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMaskedLM
    :members:


``TFDistilBertForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForSequenceClassification
    :members:


``TFDistilBertForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForQuestionAnswering
    :members:
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
    :members:
``TFOpenAIGPTModel``
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTModel
    :members:


``TFOpenAIGPTLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
    :members:


``TFOpenAIGPTDoubleHeadsModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
    :members:
.. autoclass:: transformers.GPT2DoubleHeadsModel
    :members:
``TFGPT2Model``
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2Model
    :members:


``TFGPT2LMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2LMHeadModel
    :members:


``TFGPT2DoubleHeadsModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2DoubleHeadsModel
    :members:
.. autoclass:: transformers.RobertaForSequenceClassification
    :members:
``TFRobertaModel``
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaModel
    :members:


``TFRobertaForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForMaskedLM
    :members:


``TFRobertaForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForSequenceClassification
    :members:
.. autoclass:: transformers.TransfoXLLMHeadModel
    :members:
``TFTransfoXLModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLModel
    :members:


``TFTransfoXLLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLLMHeadModel
    :members:
.. autoclass:: transformers.XLMForQuestionAnswering
    :members:
``TFXLMModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMModel
    :members:


``TFXLMWithLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMWithLMHeadModel
    :members:


``TFXLMForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMForSequenceClassification
    :members:


``TFXLMForQuestionAnsweringSimple``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMForQuestionAnsweringSimple
    :members:
.. autoclass:: transformers.XLNetForQuestionAnswering
    :members:
``TFXLNetModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetModel
    :members:


``TFXLNetLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetLMHeadModel
    :members:


``TFXLNetForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetForSequenceClassification
    :members:


``TFXLNetForQuestionAnsweringSimple``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetForQuestionAnsweringSimple
    :members:
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
logger = logging.getLogger(__name__)
def glue_convert_examples_to_features(examples, tokenizer,
                                      max_length=512,
                                      task=None,
                                      label_list=None,
                                      output_mode=None,
                                      pad_on_left=False,
                                      pad_token=0,
                                      pad_token_segment_id=0,
                                      mask_padding_with_zero=True):
""" """
Loads a data file into a list of `InputBatch`s Loads a data file into a list of ``InputFeatures``
Args:
examples: List of ``InputExamples`` or ``tf.data.Dataset`` containing the examples.
tokenizer: Instance of a tokenizer that will tokenize the examples
max_length: Maximum example length
task: GLUE task
label_list: List of labels. Can be obtained from the processor using the ``processor.get_labels()`` method
output_mode: String indicating the output mode. Either ``regression`` or ``classification``
pad_on_left: If set to ``True``, the examples will be padded on the left rather than on the right (default)
pad_token: Padding token
pad_token_segment_id: The segment ID for the padding token (It is usually 0, but can vary such as for XLNet where it is 4)
mask_padding_with_zero: If set to ``True``, the attention mask will be filled by ``1`` for actual values
and by ``0`` for padded values. If set to ``False``, inverts it (``1`` for padded values, ``0`` for
actual values)
Returns:
If the ``examples`` input is a ``tf.data.Dataset``, will return a ``tf.data.Dataset``
containing the task-specific features. If the input is a list of ``InputExamples``, will return
a list of task-specific ``InputFeatures`` which can be fed to the model.
""" """
    is_tf_dataset = False
    if is_tf_available() and isinstance(examples, tf.data.Dataset):
import copy
import json
class InputExample(object):
    """
    A single training/test example for simple sequence classification.

    Args:
        guid: Unique id for the example.
        text_a: string. The untokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
        text_b: (Optional) string. The untokenized text of the second sequence.
            Must only be specified for sequence pair tasks.
        label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
    """
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
class InputFeatures(object):
    """
    A single set of features of data.

    Args:
        input_ids: Indices of input sequence tokens in the vocabulary.
        attention_mask: Mask to avoid performing attention on padding token indices.
            Mask values selected in ``[0, 1]``: usually ``1`` for tokens that are NOT MASKED,
            ``0`` for MASKED (padded) tokens.
        token_type_ids: Segment token indices to indicate first and second portions of the inputs.
        label: Label corresponding to the input.
    """
    def __init__(self, input_ids, attention_mask, token_type_ids, label):
        self.input_ids = input_ids