"...git@developer.sourcefind.cn:chenpangpang/open-webui.git" did not exist on "9bcd4ce5c0a01af68c0d2aa44554a68bb741c61b"
Unverified Commit 37be3786 authored by Sylvain Gugger, committed by GitHub

Clean documentation (#4849)

* Clean documentation
parent 42860e92
@@ -67,6 +67,12 @@ AlbertForSequenceClassification
.. autoclass:: transformers.AlbertForSequenceClassification
:members:
AlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForTokenClassification
:members:
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -103,6 +109,13 @@ TFAlbertForMultipleChoice
:members:
TFAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForTokenClassification
:members:
TFAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -4,8 +4,9 @@ Bart
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Overview
~~~~~~~~~~~~~~~~~~~~~
The Bart model was `proposed <https://arxiv.org/abs/1910.13461>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
@@ -16,14 +17,26 @@ According to the abstract,
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_
Implementation Notes:
~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use BartTokenizer.encode to get the proper splitting.
- The forward pass of ``BartModel`` will create decoder inputs (using the helper function ``transformers.modeling_bart._prepare_bart_decoder_inputs``) if they are not passed. This is different from some other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the string you pass to ``fairseq.encode`` starts with a space.
- ``BartForConditionalGeneration.generate`` should be used for conditional generation tasks like summarization; see the example in that docstring and the sketch below.
- Models that load the ``"facebook/bart-large-cnn"`` weights will not have a ``mask_token_id``, or be able to perform mask-filling tasks.
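A minimal, non-authoritative sketch of the generation workflow mentioned above (the ``facebook/bart-large-cnn`` checkpoint and the generation parameters are illustrative assumptions, not part of this commit):

.. code-block:: python

    from transformers import BartForConditionalGeneration, BartTokenizer

    # "facebook/bart-large-cnn" is fine-tuned for summarization; any Bart checkpoint could be used instead
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    ARTICLE = "The tower is 324 metres tall, about the same height as an 81-storey building."
    input_ids = tokenizer.encode(ARTICLE, return_tensors="pt")

    # generate() runs beam search and feeds the encoder output to the decoder step by step
    summary_ids = model.generate(input_ids, num_beams=4, max_length=60, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))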
BartConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizer
:members:
BartModel
@@ -35,22 +48,17 @@ BartModel
.. autofunction:: transformers.modeling_bart._prepare_bart_decoder_inputs
BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForSequenceClassification
:members: forward
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: generate, forward
CamemBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__
by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019. It is a model
@@ -74,6 +77,13 @@ CamembertForTokenClassification
:members:
CamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForQuestionAnswering
:members:
TFCamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -100,3 +110,10 @@ TFCamembertForTokenClassification
.. autoclass:: transformers.TFCamembertForTokenClassification
:members:
TFCamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForQuestionAnswering
:members:
CTRL
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_
by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
...
DistilBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The DistilBERT model was proposed in the blog post
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__,
and the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
@@ -72,6 +75,13 @@ DistilBertForSequenceClassification
:members:
DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForTokenClassification
:members:
DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -99,6 +109,22 @@ TFDistilBertForSequenceClassification
:members:
TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMultipleChoice
:members:
TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForTokenClassification
:members:
TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
ELECTRA
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The ELECTRA model was proposed in the paper
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__.
ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The
@@ -89,6 +92,13 @@ ElectraForMaskedLM
:members:
ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForSequenceClassification
:members:
ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -122,3 +132,10 @@ TFElectraForTokenClassification
.. autoclass:: transformers.TFElectraForTokenClassification
:members:
TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members:
FlauBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The FlauBERT model was proposed in the paper
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le et al.
It's a transformer pre-trained using a masked language modeling (MLM) objective (BERT-like).
@@ -72,3 +75,43 @@ FlaubertForQuestionAnswering
:members:
TFFlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertModel
:members:
TFFlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
:members:
TFFlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForSequenceClassification
:members:
TFFlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForMultipleChoice
:members:
TFFlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForTokenClassification
:members:
TFFlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
:members:
@@ -4,7 +4,7 @@ Longformer
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
Overview
~~~~~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer <https://arxiv.org/pdf/2004.05150.pdf>`_ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Here is the abstract:
@@ -13,7 +13,7 @@ Here the abstract:
The Authors' code can be found `here <https://github.com/allenai/longformer>`_.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context.
Most tokens only attend "locally" to each other, meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and :math:`\frac{1}{2} w` succeeding tokens, with :math:`w` being the window length as defined in `config.attention_window`. Note that `config.attention_window` can be of type ``list`` to define a different :math:`w` for each layer.
A selected few tokens attend "globally" to all other tokens, as is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
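A small sketch of the window configuration described above (assuming the default 12-layer ``LongformerConfig``; the window sizes are arbitrary):

.. code-block:: python

    from transformers import LongformerConfig

    # a single integer applies the same window w to every layer ...
    config = LongformerConfig(attention_window=512)

    # ... while a list gives a different w per layer (one entry per hidden layer)
    config_per_layer = LongformerConfig(
        attention_window=[64, 64, 128, 128, 256, 256, 512, 512, 512, 512, 512, 512]
    )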
@@ -55,6 +55,13 @@ LongformerTokenizer
:members:
LongformerTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizerFast
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~
@@ -69,10 +76,10 @@ LongformerForMaskedLM
:members:
LongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForSequenceClassification
:members:
@@ -89,3 +96,9 @@ LongformerForTokenClassification
.. autoclass:: transformers.LongformerForTokenClassification
:members:
LongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForQuestionAnswering
:members:
@@ -6,11 +6,11 @@ file a `Github Issue <https://github.com/huggingface/transformers/issues/new?ass
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
- Each model is about 298 MB on disk; there are 1,000+ models.
- The list of supported language pairs can be found `here <https://huggingface.co/Helsinki-NLP>`__.
- The 1,000+ models were originally trained by `Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian <https://marian-nmt.github.io/>`_ C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as ``BartForConditionalGeneration`` with a few minor modifications:
- static (sinusoid) positional embeddings (``MarianConfig.static_position_embeddings=True``)
- a new final_logits_bias (``MarianConfig.add_bias_logits=True``)
@@ -86,6 +86,19 @@ Code to see available pretrained models:
suffix = [x.split('/')[1] for x in model_ids]
multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
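Building on the snippet above, a hedged usage sketch (``Helsinki-NLP/opus-mt-en-de`` is just one arbitrary entry from that list; ``prepare_translation_batch`` is the tokenizer method documented below):

.. code-block:: python

    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # prepare_translation_batch returns input_ids and attention_mask ready for generate()
    batch = tokenizer.prepare_translation_batch(["I am a small frog."])
    translated = model.generate(**batch)
    print(tokenizer.decode(translated[0], skip_special_tokens=True))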
MarianConfig
~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianConfig
:members:
MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_translation_batch
MarianMTModel
~~~~~~~~~~~~~
@@ -96,10 +109,3 @@ This class inherits all functionality from ``BartForConditionalGeneration``, see
.. autoclass:: transformers.MarianMTModel
:members:
MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_translation_batch
RoBERTa
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_
by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer,
Veselin Stoyanov. It is based on Google's BERT model released in 2018.
@@ -87,6 +90,14 @@ RobertaForTokenClassification
.. autoclass:: transformers.RobertaForTokenClassification
:members:
RobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForQuestionAnswering
:members:
TFRobertaModel
~~~~~~~~~~~~~~~~~~~~
@@ -108,8 +119,22 @@ TFRobertaForSequenceClassification
:members:
TFRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMultipleChoice
:members:
TFRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForTokenClassification
:members:
TFRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForQuestionAnswering
:members:
@@ -4,7 +4,8 @@ T5
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
Overview
~~~~~~~~~~~~~~~~~~~~~
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
Here is the abstract:
@@ -14,10 +15,20 @@ Our systematic study compares pre-training objectives, architectures, unlabeled
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
Tips:
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
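A minimal prefix-based generation sketch following the tips above (the ``t5-small`` checkpoint and the translation prefix are just illustrative choices):

.. code-block:: python

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # the task is selected purely through the text prefix
    input_ids = tokenizer.encode("translate English to German: The house is wonderful.", return_tensors="pt")
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))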
Training
~~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
This means that for training we always need an input sequence and a target sequence.
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* prepended by a start-sequence token and fed to the decoder using the `decoder_input_ids`. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the ``lm_labels``. The PAD token is hereby used as the start-sequence token.
@@ -50,17 +61,6 @@ T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
# the forward function automatically creates the correct decoder_input_ids
model(input_ids=input_ids, lm_labels=lm_labels)
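Expanding the snippet above into a hedged, self-contained training step (the sentences are made up; ``lm_labels`` is the argument name used in this version of the library):

.. code-block:: python

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    input_ids = tokenizer.encode("translate English to German: The house is wonderful.", return_tensors="pt")
    lm_labels = tokenizer.encode("Das Haus ist wunderbar.", return_tensors="pt")

    # the forward pass shifts lm_labels right internally to build decoder_input_ids,
    # using the PAD token as the start-sequence token (see above)
    loss = model(input_ids=input_ids, lm_labels=lm_labels)[0]
    loss.backward()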
Tips
~~~~~~~~~~~~~~~~~~~~
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
T5Config
~~~~~~~~~~~~~~~~~~~~~
@@ -99,7 +99,7 @@ TFT5Model
TFT5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5ForConditionalGeneration
:members:
@@ -102,6 +102,21 @@ TFXLMForSequenceClassification
:members:
TFXLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForMultipleChoice
:members:
TFXLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForTokenClassification
:members:
TFXLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
XLM-RoBERTa
------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__
by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán,
Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019.
@@ -102,8 +105,22 @@ TFXLMRobertaForSequenceClassification
:members:
TFXLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMultipleChoice
:members:
TFXLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForTokenClassification
:members:
TFXLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForQuestionAnswering
:members:
@@ -71,17 +71,17 @@ XLNetForSequenceClassification
:members:
XLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForMultipleChoice
:members:
XLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForTokenClassification
:members:
@@ -120,6 +120,20 @@ TFXLNetForSequenceClassification
:members:
TFXLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForMultipleChoice
:members:
TFXLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForTokenClassification
:members:
TFXLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -406,8 +406,8 @@ class ElectraForSequenceClassification(ElectraPreTrainedModel):
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = ElectraTokenizer.from_pretrained('bert-base-uncased')
model = ElectraForSequenceClassification.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
...
@@ -499,9 +499,10 @@ LONGFORMER_INPUTS_DOCSTRING = r"""
class LongformerModel(RobertaModel):
"""
This class overrides :class:`~transformers.RobertaModel` to provide the ability to process
long sequences following the selfattention approach described in `Longformer: the Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer selfattention
combines a local (sliding window) and global attention to extend to long documents without the O(n^2) increase in
memory and compute.
The selfattention module `LongformerSelfAttention` implemented here supports the combination of local and
global attention but it lacks support for autoregressive attention and dilated attention. Autoregressive
@@ -509,9 +510,6 @@ class LongformerModel(RobertaModel):
tasks. Future release will add support for autoregressive attention, but the support for dilated attention
requires a custom CUDA kernel to be memory and compute efficient.
.. _`Longformer: the Long-Document Transformer`:
https://arxiv.org/abs/2004.05150
""" """
config_class = LongformerConfig config_class = LongformerConfig
...@@ -1154,7 +1152,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel): ...@@ -1154,7 +1152,7 @@ class LongformerForMultipleChoice(BertPreTrainedModel):
Returns: Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.RobertaConfig`) and inputs: :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.RobertaConfig`) and inputs:
loss (:obj:`torch.FloatTensor`` of shape ``(1,)`, `optional`, returned when :obj:`labels` is provided): loss (:obj:`torch.FloatTensor`` of shape `(1,)`, `optional`, returned when :obj:`labels` is provided):
Classification loss. Classification loss.
classification_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, num_choices)`): classification_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, num_choices)`):
`num_choices` is the second dimension of the input tensors. (see `input_ids` above). `num_choices` is the second dimension of the input tensors. (see `input_ids` above).
......
@@ -407,7 +407,7 @@ class RobertaForMultipleChoice(BertPreTrainedModel):
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.RobertaConfig`) and inputs:
loss (:obj:`torch.FloatTensor`` of shape `(1,)`, `optional`, returned when :obj:`labels` is provided):
Classification loss.
classification_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, num_choices)`):
`num_choices` is the second dimension of the input tensors. (see `input_ids` above).
...
@@ -1328,7 +1328,7 @@ class XLNetForMultipleChoice(XLNetPreTrainedModel):
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.XLNetConfig`) and inputs:
loss (:obj:`torch.FloatTensor`` of shape `(1,)`, `optional`, returned when :obj:`labels` is provided):
Classification loss.
classification_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, num_choices)`):
`num_choices` is the second dimension of the input tensors. (see `input_ids` above).
...