Unverified Commit 96f4828a authored by Lysandre Debut, committed by GitHub

Respect the 119 line chars (#7928)

parent ef0ac063
@@ -500,8 +500,8 @@ BART
<https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.

Sequence-to-sequence model with an encoder and a decoder. The encoder is fed a corrupted version of the tokens, the
decoder is fed the original tokens (but has a mask to hide the future words like a regular transformer decoder). On the
pretraining tasks, a composition of the following transformations is applied to the encoder input:

* mask random tokens (like in BERT)
* delete random tokens
@@ -526,12 +526,17 @@ Pegasus
`PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
<https://arxiv.org/pdf/1912.08777.pdf>`_, Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

Sequence-to-sequence model with the same encoder-decoder model architecture as BART. Pegasus is pre-trained jointly on
two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization-specific pre-training
objective, called Gap Sentence Generation (GSG).

* MLM: encoder input tokens are randomly replaced by a mask token and have to be predicted by the encoder (like in
  BERT)
* GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, which has a causal
  mask to hide the future words like a regular auto-regressive transformer decoder.

In contrast to BART, Pegasus' pretraining task is intentionally similar to summarization: important sentences are
masked and are generated together as one output sequence from the remaining sentences, similar to an extractive
summary.

The library provides a version of this model for conditional generation, which should be used for summarization.
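
A minimal usage sketch of that summarization model is shown below; the ``google/pegasus-xsum`` checkpoint name and the
generation settings are illustrative assumptions rather than something prescribed by this summary.

.. code-block:: python

    # Hedged sketch: assumes the google/pegasus-xsum checkpoint and a recent transformers version.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

    text = "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
    inputs = tokenizer(text, truncation=True, return_tensors="pt")

    # The decoder generates the summary auto-regressively from the encoded article.
    summary_ids = model.generate(**inputs, num_beams=4, max_length=60)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))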
@@ -577,11 +582,12 @@ The pretraining includes both supervised and self-supervised training. Supervise
tasks provided by the GLUE and SuperGLUE benchmarks (converting them into text-to-text tasks as explained above).

Self-supervised training uses corrupted tokens: 15% of the tokens are randomly removed and replaced with individual
sentinel tokens (if several consecutive tokens are marked for removal, the whole group is replaced with a single
sentinel token). The input of the encoder is the corrupted sentence, the input of the decoder is the original sentence
and the target is then the dropped-out tokens delimited by their sentinel tokens.

For instance, if we have the sentence "My dog is very cute .", and we decide to remove the tokens "dog", "is" and
"cute", the encoder input becomes "My <x> very <y> ." and the target input becomes "<x> dog is <y> cute .<z>".

The library provides a version of this model for conditional generation.
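
The sketch below mirrors the corruption example above with runnable code; ``<extra_id_0>``, ``<extra_id_1>``, ... are
T5's actual sentinel tokens (standing in for ``<x>``, ``<y>``, ``<z>``), and ``t5-small`` is just a small illustrative
checkpoint choice.

.. code-block:: python

    # Hedged sketch of the denoising objective described above, using T5's sentinel tokens.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # Corrupted encoder input and the sentinel-delimited target from the example above.
    input_ids = tokenizer("My <extra_id_0> very <extra_id_1> .", return_tensors="pt").input_ids
    labels = tokenizer("<extra_id_0> dog is <extra_id_1> cute .<extra_id_2>", return_tensors="pt").input_ids

    # The model is trained to reconstruct the dropped-out spans; labels give the usual cross-entropy loss.
    outputs = model(input_ids=input_ids, labels=labels, return_dict=True)
    print(outputs.loss)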
@@ -597,7 +603,8 @@ MBart
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
</a>
`Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan
Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

The model architecture and pre-training objective are the same as BART's, but MBart is trained on 25 languages
@@ -606,11 +613,12 @@ for pre-training a complete sequence-to-sequence model by denoising full texts i
The library provides a version of this model for conditional generation.

The `mbart-large-en-ro checkpoint <https://huggingface.co/facebook/mbart-large-en-ro>`_ can be used for English ->
Romanian translation.

The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ checkpoint can be finetuned for other
translation and summarization tasks, using code in ``examples/seq2seq/``, but is not very useful without finetuning.
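
A minimal English -> Romanian sketch with the ``mbart-large-en-ro`` checkpoint mentioned above; the input sentence,
the language-code handling and the generation settings are only illustrative assumptions.

.. code-block:: python

    # Hedged sketch: translate English to Romanian with the facebook/mbart-large-en-ro checkpoint.
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

    inputs = tokenizer("UN Chief Says There Is No Military Solution in Syria", return_tensors="pt")

    # Start decoding with the Romanian language code so the output is generated in the target language.
    generated = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
    print(tokenizer.decode(generated[0], skip_special_tokens=True))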

ProphetNet
-----------------------------------------------------------------------------------------------------------------------
@@ -624,12 +632,18 @@ ProphetNet
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
</a>
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

ProphetNet introduces a novel *sequence-to-sequence* pre-training objective, called *future n-gram prediction*. In
future n-gram prediction, the model predicts the next n tokens simultaneously based on previous context tokens at each
time step instead of just the single next token. The future n-gram prediction explicitly encourages the model to plan
for the future tokens and prevents overfitting on strong local correlations.

The model architecture is based on the original Transformer, but replaces the "standard" self-attention mechanism in
the decoder by a main self-attention mechanism and a self and n-stream (predict) self-attention mechanism.

The library provides a pre-trained version of this model for conditional generation and a fine-tuned version for
summarization.
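
As a rough sketch of the conditional-generation usage; the ``microsoft/prophetnet-large-uncased-cnndm`` summarization
checkpoint name below is an assumption, not something stated in this summary.

.. code-block:: python

    # Hedged sketch: summarization with a ProphetNet checkpoint assumed to be fine-tuned on CNN/DailyMail.
    from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

    checkpoint = "microsoft/prophetnet-large-uncased-cnndm"  # assumed checkpoint name
    tokenizer = ProphetNetTokenizer.from_pretrained(checkpoint)
    model = ProphetNetForConditionalGeneration.from_pretrained(checkpoint)

    article = "the us state department said wednesday it had received no formal word from bolivia on the matter."
    inputs = tokenizer(article, truncation=True, return_tensors="pt")

    summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))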

XLM-ProphetNet
-----------------------------------------------------------------------------------------------------------------------
@@ -643,11 +657,16 @@ XLM-ProphetNet
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
</a>
`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by
Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.

XLM-ProphetNet's model architecture and pre-training objective are the same as ProphetNet's, but XLM-ProphetNet was
pre-trained on the cross-lingual dataset `XGLUE <https://arxiv.org/abs/2004.01401>`__.

The library provides a pre-trained version of this model for multi-lingual conditional generation and fine-tuned
versions for headline generation and question generation, respectively.
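
A short sketch of the multi-lingual generation usage; the ``microsoft/xprophetnet-large-wiki100-cased-xglue-ntg``
headline-generation checkpoint name is an assumed example, not prescribed by this summary.

.. code-block:: python

    # Hedged sketch: headline generation with an assumed XGLUE-fine-tuned XLM-ProphetNet checkpoint.
    from transformers import XLMProphetNetForConditionalGeneration, XLMProphetNetTokenizer

    checkpoint = "microsoft/xprophetnet-large-wiki100-cased-xglue-ntg"  # assumed checkpoint name
    tokenizer = XLMProphetNetTokenizer.from_pretrained(checkpoint)
    model = XLMProphetNetForConditionalGeneration.from_pretrained(checkpoint)

    article = "Microsoft plans to end free support for the Windows 7 operating system after January 14, 2020."
    inputs = tokenizer(article, truncation=True, return_tensors="pt")

    headline_ids = model.generate(**inputs, num_beams=4, max_length=20)
    print(tokenizer.decode(headline_ids[0], skip_special_tokens=True))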

.. _multimodal-models:

Multimodal models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^