Unverified Commit c9564f53 authored by Suraj Patil, committed by GitHub

[Doc] add more MBart and other doc (#6490)

* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
parent f68c8731
...@@ -167,8 +167,13 @@ At some point in the future, you'll be able to seamlessly move from pre-training
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. **[DPR](https://github.com/facebookresearch/DPR)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
26. Want to contribute a new model? We have added a **detailed guide and templates** to help you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
......
...@@ -126,7 +126,7 @@ conversion utilities for the following models:
    Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
    <https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov,
    Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. `Other community models <https://huggingface.co/models>`_, contributed by the `community
    <https://huggingface.co/users>`_.
......
...@@ -14,6 +14,45 @@ MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scal
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
Training
~~~~~~~~~~~~~~~~~~~~~
MBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for translation tasks.
Because the model is multilingual, it expects the sequences in a different format: a special language id token
is added to both the source and the target text. The source text format is ``X [eos, src_lang_code]``,
where ``X`` is the source text. The target text format is ``[tgt_lang_code] X [eos]``; ``bos`` is never used.
``MBartTokenizer.prepare_seq2seq_batch`` handles this automatically and should be used to encode
the sequences for seq2seq fine-tuning.
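A quick, unofficial way to see this layout is to decode a prepared batch back into tokens; the ``facebook/mbart-large-en-ro`` checkpoint is used here purely as an illustrative choice:

::

    from transformers import MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
    batch = tokenizer.prepare_seq2seq_batch("UN Chief Says There Is No Military Solution in Syria", src_lang="en_XX")
    # The encoded source should end with the eos token followed by the language code: [..., '</s>', 'en_XX']
    print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
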
- Supervised training
::
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    # "facebook/mbart-large-cc25" is the pretrained multilingual checkpoint; any MBart checkpoint can be used here
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
    example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
    expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
    batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian)
    input_ids = batch["input_ids"]
    target_ids = batch["decoder_input_ids"]
    decoder_input_ids = target_ids[:, :-1].contiguous()  # decoder input: [tgt_lang_code] X
    labels = target_ids[:, 1:].clone()                   # labels: X [eos]
    model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)  # forward pass; a minimal optimizer step is sketched after the generation example below
- Generation
While generating the target text, set ``decoder_start_token_id`` to the target language id.
The following example shows how to translate English to Romanian using the ``facebook/mbart-large-en-ro`` model.
::
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
    article = "UN Chief Says There Is No Military Solution in Syria"
    batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX")
    translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
    translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
    assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
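Turning the supervised batch above into an actual training step is straightforward. The snippet below is only a rough sketch, not part of the original documentation: the ``AdamW`` optimizer and learning rate are illustrative assumptions, and ``model``, ``input_ids``, ``decoder_input_ids`` and ``labels`` are the variables defined in the supervised training example:

::

    import torch

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # assumed optimizer and learning rate
    model.train()
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)
    loss = outputs[0]  # the LM loss is the first output when labels are passed
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
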
MBartConfig
~~~~~~~~~~~~~~~~~~~~~
......
...@@ -331,9 +331,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac ...@@ -331,9 +331,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) | | | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
| | | | bart-large base architecture finetuned on cnn summarization task | | | | | bart-large base architecture finetuned on cnn summarization task |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
...@@ -361,3 +358,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac ...@@ -361,3 +358,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters | | | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 | | | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MBart | ``facebook/mbart-large-cc25`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/mbart-large-en-ro`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mbart-large-cc25 model finetuned on WMT English-Romanian translation. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
...@@ -30,9 +30,19 @@ MBART_START_DOCSTRING = r""" ...@@ -30,9 +30,19 @@ MBART_START_DOCSTRING = r"""
"The BART Model with a language modeling head. Can be used for machine translation.", MBART_START_DOCSTRING "The BART Model with a language modeling head. Can be used for machine translation.", MBART_START_DOCSTRING
) )
class MBartForConditionalGeneration(BartForConditionalGeneration): class MBartForConditionalGeneration(BartForConditionalGeneration):
""" r"""
This class overrides :class:`~transformers.BartForConditionalGeneration`. Please check the This class overrides :class:`~transformers.BartForConditionalGeneration`. Please check the
superclass for the appropriate documentation alongside usage examples. superclass for the appropriate documentation alongside usage examples.
Examples::
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
>>> article = "UN Chief Says There Is No Military Solution in Syria"
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article])
>>> translated_tokens = model.generate(**batch)
>>> translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
>>> assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
""" """
config_class = MBartConfig config_class = MBartConfig