Commit 1d646bad authored by thomwolf
Transformers
================================================================================================================================================
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) is a library of state-of-the-art pre-trained models
for Natural Language Processing (NLP). It provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural
Language Understanding (NLU) and Natural Language Generation (NLG) with over 32 pretrained models in 100+ languages and deep interoperability
between TensorFlow 2.0 and PyTorch.

Features
---------------------------------------------------
- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (a minimal sketch follows this list)
- Seamlessly pick the right framework for training, evaluation, production
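
As a minimal sketch of this interoperability (assuming the ``bert-base-uncased`` weights can be downloaded), the same checkpoint can be loaded and run in either framework:

.. code-block:: python

    import numpy as np
    import tensorflow as tf
    import torch
    from transformers import BertTokenizer, BertModel, TFBertModel

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    # The same pretrained checkpoint, loaded once per framework.
    pt_model = BertModel.from_pretrained('bert-base-uncased')
    tf_model = TFBertModel.from_pretrained('bert-base-uncased')

    ids = tokenizer.encode("Hello, world!", add_special_tokens=True)

    # PyTorch forward pass.
    pt_hidden = pt_model(torch.tensor([ids]))[0]

    # TensorFlow 2.0 forward pass on the same input.
    tf_hidden = tf_model(tf.constant([ids]))[0]

    # Both frameworks produce (nearly) identical hidden states.
    assert np.allclose(pt_hidden.detach().numpy(), tf_hidden.numpy(), atol=1e-4)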
Contents
---------------------------------
The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. `GPT-2 <https://blog.openai.com/better-language-models/>`_ (from OpenAI) released with the paper `Language Models are Unsupervised Multitask Learners <https://blog.openai.com/better-language-models/>`_ by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le and Ruslan Salakhutdinov.
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov and Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`_ by Victor Sanh, Lysandre Debut and Thomas Wolf.
.. toctree::
    :maxdepth: 2
    :caption: Notes
    main_classes/model
    main_classes/tokenizer
    main_classes/optimizer_schedules
    main_classes/processors
.. toctree::
    :maxdepth: 2
The base class ``PreTrainedModel`` implements the common methods for loading/saving a model.
.. autoclass:: transformers.PreTrainedModel
    :members:
``TFPreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFPreTrainedModel
    :members:
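
As a minimal sketch of the inherited loading/saving methods (assuming the ``bert-base-uncased`` weights are available, and using a hypothetical local directory):

.. code-block:: python

    from transformers import TFBertModel

    # Any concrete subclass of TFPreTrainedModel inherits these methods.
    model = TFBertModel.from_pretrained('bert-base-uncased')

    # Serializes the configuration (config.json) and the weights (tf_model.h5).
    model.save_pretrained('./my-bert')

    # Reload from the local directory instead of the remote checkpoint.
    model = TFBertModel.from_pretrained('./my-bert')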
Processors
----------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture, that of the
:class:`~transformers.data.processors.utils.DataProcessor`. A processor returns a list
of :class:`~transformers.data.processors.utils.InputExample` objects. These
:class:`~transformers.data.processors.utils.InputExample` objects can be converted to
:class:`~transformers.data.processors.utils.InputFeatures` in order to be fed to the model.
.. autoclass:: transformers.data.processors.utils.DataProcessor
    :members:


.. autoclass:: transformers.data.processors.utils.InputExample
    :members:


.. autoclass:: transformers.data.processors.utils.InputFeatures
    :members:
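
To make this data flow concrete, here is a minimal sketch of building an ``InputExample`` by hand (the sentences and label are invented for illustration):

.. code-block:: python

    from transformers.data.processors.utils import InputExample

    # A single paraphrase-style example: two untokenized sentences and a string label.
    example = InputExample(
        guid='train-1',
        text_a='The company posted record profits.',
        text_b='Profits reached an all-time high.',
        label='1')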
GLUE
~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates
the performance of models across a diverse set of existing NLU tasks. It was released together with the paper
`GLUE: A multi-task benchmark and analysis platform for natural language understanding <https://openreview.net/pdf?id=rJ4km2R5t7>`__.

This library hosts a total of 10 processors for the following tasks: MRPC, MNLI, MNLI (mismatched),
CoLA, SST-2, STS-B, QQP, QNLI, RTE and WNLI.
Those processors are:
- :class:`~transformers.data.processors.utils.MrpcProcessor`
- :class:`~transformers.data.processors.utils.MnliProcessor`
- :class:`~transformers.data.processors.utils.MnliMismatchedProcessor`
- :class:`~transformers.data.processors.utils.ColaProcessor`
- :class:`~transformers.data.processors.utils.Sst2Processor`
- :class:`~transformers.data.processors.utils.StsbProcessor`
- :class:`~transformers.data.processors.utils.QqpProcessor`
- :class:`~transformers.data.processors.utils.QnliProcessor`
- :class:`~transformers.data.processors.utils.RteProcessor`
- :class:`~transformers.data.processors.utils.WnliProcessor`
Additionally, the following method can be used to load values from a data file and convert them to a list of
:class:`~transformers.data.processors.utils.InputExample`.

.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the
`run_glue.py <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`__ script.
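
As a minimal sketch (assuming the MRPC data files live under a hypothetical ``./glue_data/MRPC`` directory), a processor combines with ``glue_convert_examples_to_features`` roughly as follows:

.. code-block:: python

    from transformers import BertTokenizer
    from transformers.data.processors.glue import MrpcProcessor, glue_convert_examples_to_features

    data_dir = './glue_data/MRPC'  # hypothetical local path to the MRPC data

    processor = MrpcProcessor()
    examples = processor.get_train_examples(data_dir)  # list of InputExample
    label_list = processor.get_labels()                # ['0', '1'] for MRPC

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    features = glue_convert_examples_to_features(
        examples, tokenizer, max_length=128,
        label_list=label_list, output_mode='classification')

    # Each InputFeatures carries input_ids, attention_mask, token_type_ids and label.
    print(features[0].input_ids[:10])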
.. autoclass:: transformers.BertForQuestionAnswering
    :members:
``TFBertModel``
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertModel
    :members:


``TFBertForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForPreTraining
    :members:


``TFBertForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMaskedLM
    :members:


``TFBertForNextSentencePrediction``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForNextSentencePrediction
    :members:


``TFBertForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForSequenceClassification
    :members:


``TFBertForMultipleChoice``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForMultipleChoice
    :members:


``TFBertForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForTokenClassification
    :members:


``TFBertForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBertForQuestionAnswering
    :members:
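
As a minimal sketch of how these TF2.0 classes are used (assuming the ``bert-base-uncased`` checkpoint; the freshly added classification head is randomly initialized until fine-tuned):

.. code-block:: python

    import tensorflow as tf
    from transformers import BertTokenizer, TFBertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

    ids = tf.constant([tokenizer.encode("Transformers are great!", add_special_tokens=True)])

    # The first output is the classification logits, one score per class.
    logits = model(ids)[0]
    print(logits.shape)  # (1, 2) with the default two labels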
.. autoclass:: transformers.DistilBertForQuestionAnswering
    :members:
``TFDistilBertModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertModel
    :members:


``TFDistilBertForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMaskedLM
    :members:


``TFDistilBertForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForSequenceClassification
    :members:


``TFDistilBertForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForQuestionAnswering
    :members:
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
    :members:
``TFOpenAIGPTModel``
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTModel
    :members:


``TFOpenAIGPTLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
    :members:


``TFOpenAIGPTDoubleHeadsModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
    :members:
.. autoclass:: transformers.GPT2DoubleHeadsModel
    :members:
``TFGPT2Model``
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2Model
    :members:


``TFGPT2LMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2LMHeadModel
    :members:


``TFGPT2DoubleHeadsModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFGPT2DoubleHeadsModel
    :members:
.. autoclass:: transformers.RobertaForSequenceClassification
    :members:
``TFRobertaModel``
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaModel
    :members:


``TFRobertaForMaskedLM``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForMaskedLM
    :members:


``TFRobertaForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFRobertaForSequenceClassification
    :members:
.. autoclass:: transformers.TransfoXLLMHeadModel
    :members:
``TFTransfoXLModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLModel
    :members:


``TFTransfoXLLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLLMHeadModel
    :members:
.. autoclass:: transformers.XLMForQuestionAnswering
    :members:
``TFXLMModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMModel
    :members:


``TFXLMWithLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMWithLMHeadModel
    :members:


``TFXLMForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMForSequenceClassification
    :members:


``TFXLMForQuestionAnsweringSimple``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLMForQuestionAnsweringSimple
    :members:
.. autoclass:: transformers.XLNetForQuestionAnswering
    :members:
``TFXLNetModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetModel
    :members:


``TFXLNetLMHeadModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetLMHeadModel
    :members:


``TFXLNetForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetForSequenceClassification
    :members:


``TFXLNetForQuestionAnsweringSimple``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFXLNetForQuestionAnsweringSimple
    :members:
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
logger = logging.getLogger(__name__)
def glue_convert_examples_to_features(examples, tokenizer,
                                      max_length=512,
                                      task=None,
                                      label_list=None,
                                      output_mode=None,
                                      pad_on_left=False,
                                      pad_token=0,
                                      pad_token_segment_id=0,
                                      mask_padding_with_zero=True):
""" """
Loads a data file into a list of `InputBatch`s Loads a data file into a list of ``InputFeatures``
Args:
examples: List of ``InputExamples`` or ``tf.data.Dataset`` containing the examples.
tokenizer: Instance of a tokenizer that will tokenize the examples
max_length: Maximum example length
task: GLUE task
label_list: List of labels. Can be obtained from the processor using the ``processor.get_labels()`` method
output_mode: String indicating the output mode. Either ``regression`` or ``classification``
pad_on_left: If set to ``True``, the examples will be padded on the left rather than on the right (default)
pad_token: Padding token
pad_token_segment_id: The segment ID for the padding token (It is usually 0, but can vary such as for XLNet where it is 4)
mask_padding_with_zero: If set to ``True``, the attention mask will be filled by ``1`` for actual values
and by ``0`` for padded values. If set to ``False``, inverts it (``1`` for padded values, ``0`` for
actual values)
Returns:
If the ``examples`` input is a ``tf.data.Dataset``, will return a ``tf.data.Dataset``
containing the task-specific features. If the input is a list of ``InputExamples``, will return
a list of task-specific ``InputFeatures`` which can be fed to the model.
""" """
    is_tf_dataset = False
    if is_tf_available() and isinstance(examples, tf.data.Dataset):
import copy
import json
class InputExample(object):
    """
    A single training/test example for simple sequence classification.

    Args:
        guid: Unique id for the example.
        text_a: string. The untokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
        text_b: (Optional) string. The untokenized text of the second sequence.
            Must only be specified for sequence pair tasks.
        label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
    """
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
class InputFeatures(object):
    """
    A single set of features of data.

    Args:
        input_ids: Indices of input sequence tokens in the vocabulary.
        attention_mask: Mask to avoid performing attention on padding token indices.
            Mask values selected in ``[0, 1]``: usually ``1`` for tokens that are NOT MASKED,
            ``0`` for MASKED (padded) tokens.
        token_type_ids: Segment token indices to indicate first and second portions of the inputs.
        label: Label corresponding to the input.
    """
    def __init__(self, input_ids, attention_mask, token_type_ids, label):
        self.input_ids = input_ids