Unverified Commit c9564f53 authored by Suraj Patil, committed by GitHub

[Doc] add more MBart and other doc (#6490)

* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
parent f68c8731
...@@ -167,8 +167,13 @@ At some point in the future, you'll be able to seamlessly move from pre-training
19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. **[DPR](https://github.com/facebookresearch/DPR)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
26. Want to contribute a new model? We have added a **detailed guide and templates** to help you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
......
...@@ -126,7 +126,7 @@ conversion utilities for the following models:
    Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
    <https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov,
    Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. `Other community models <https://huggingface.co/models>`_, contributed by the `community
    <https://huggingface.co/users>`_.
......
...@@ -14,6 +14,45 @@ MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scal
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
Training
~~~~~~~~~~~~~~~~~~~~~
MBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for translation tasks.
Because the model is multilingual, it expects the sequences in a different format: a special language id token
is added to both the source and the target text. The source text format is ``X [eos, src_lang_code]``,
where ``X`` is the source text. The target text format is ``[tgt_lang_code] X [eos]``; ``bos`` is never used.
``MBartTokenizer.prepare_seq2seq_batch`` handles this automatically and should be used to encode
the sequences for seq2seq fine-tuning.
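A quick, unofficial way to see this layout is to decode a prepared batch back into tokens; the ``facebook/mbart-large-en-ro`` checkpoint is used here purely as an illustrative choice:

::

    from transformers import MBartTokenizer

    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
    batch = tokenizer.prepare_seq2seq_batch("UN Chief Says There Is No Military Solution in Syria", src_lang="en_XX")
    # The encoded source should end with the eos token followed by the language code: [..., '</s>', 'en_XX']
    print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
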
- Supervised training
::
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    # "facebook/mbart-large-cc25" is the pretrained multilingual checkpoint; any MBart checkpoint can be used here
    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
    example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
    expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
    batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian)
    input_ids = batch["input_ids"]
    target_ids = batch["decoder_input_ids"]
    decoder_input_ids = target_ids[:, :-1].contiguous()  # decoder input: [tgt_lang_code] X
    labels = target_ids[:, 1:].clone()                   # labels: X [eos]
    model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)  # forward pass; a minimal optimizer step is sketched after the generation example below
- Generation
While generating the target text, set ``decoder_start_token_id`` to the target language id.
The following example shows how to translate English to Romanian using the ``facebook/mbart-large-en-ro`` model.
::
    from transformers import MBartForConditionalGeneration, MBartTokenizer

    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
    article = "UN Chief Says There Is No Military Solution in Syria"
    batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX")
    translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
    translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
    assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
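Turning the supervised batch above into an actual training step is straightforward. The snippet below is only a rough sketch, not part of the original documentation: the ``AdamW`` optimizer and learning rate are illustrative assumptions, and ``model``, ``input_ids``, ``decoder_input_ids`` and ``labels`` are the variables defined in the supervised training example:

::

    import torch

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # assumed optimizer and learning rate
    model.train()
    outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)
    loss = outputs[0]  # the LM loss is the first output when labels are passed
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
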
MBartConfig
~~~~~~~~~~~~~~~~~~~~~
......
...@@ -331,9 +331,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac ...@@ -331,9 +331,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) | | | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
| | | | bart-large base architecture finetuned on cnn summarization task | | | | | bart-large base architecture finetuned on cnn summarization task |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
...@@ -361,3 +358,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac ...@@ -361,3 +358,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters | | | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 | | | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MBart | ``facebook/mbart-large-cc25`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/mbart-large-en-ro`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
| | | | mbart-large-cc25 model finetuned on WMT English-Romanian translation. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
...@@ -30,9 +30,19 @@ MBART_START_DOCSTRING = r""" ...@@ -30,9 +30,19 @@ MBART_START_DOCSTRING = r"""
"The BART Model with a language modeling head. Can be used for machine translation.", MBART_START_DOCSTRING "The BART Model with a language modeling head. Can be used for machine translation.", MBART_START_DOCSTRING
) )
class MBartForConditionalGeneration(BartForConditionalGeneration): class MBartForConditionalGeneration(BartForConditionalGeneration):
""" r"""
This class overrides :class:`~transformers.BartForConditionalGeneration`. Please check the This class overrides :class:`~transformers.BartForConditionalGeneration`. Please check the
superclass for the appropriate documentation alongside usage examples. superclass for the appropriate documentation alongside usage examples.
Examples::
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
>>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
>>> article = "UN Chief Says There Is No Military Solution in Syria"
>>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article])
>>> translated_tokens = model.generate(**batch)
>>> translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
>>> assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
""" """
config_class = MBartConfig config_class = MBartConfig