Convert model files from rst to mdx (#14865)

* First pass * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Convert model files from rst to mdx (#14865)
* First pass * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
ec3567fe · Lysandre Debut · GitHub · d0422de5 · ec3567fe · d0422de5
Unverified Commit ec3567fe authored Dec 22, 2021 by Lysandre Debut Committed by GitHub Dec 22, 2021
20 changed files
--- a/docs/source/model_doc/encoderdecoder.mdx
+++ b/docs/source/model_doc/encoderdecoder.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Encoder Decoder Models
+
+The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any
+pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
+
+The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
+was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by
+Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+
+After such an [`EncoderDecoderModel`] has been trained/fine-tuned, it can be saved/loaded just like
+any other models (see the examples for more information).
+
+An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
+and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) by Yang Liu and Mirella Lapata.
+
+The [`~TFEncoderDecoderModel.from_pretrained`] currently doesn't support initializing the model from a
+pytorch checkpoint. Passing `from_pt=True` to this method will throw an exception. If there are only pytorch
+checkpoints for a particular encoder-decoder model, a workaround is:
+
+```python
+>>> # a workaround to load from pytorch checkpoint
+>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
+>>> _model.encoder.save_pretrained("./encoder")
+>>> _model.decoder.save_pretrained("./decoder")
+>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
+...     "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
+... )
+>>> # This is only for copying some specific attributes of this particular model.
+>>> model.config = _model.config
+```
+
+This model was contributed by [thomwolf](https://github.com/thomwolf). This model's TensorFlow and Flax versions
+were contributed by [ydshieh](https://github.com/ydshieh).
+
+
+## EncoderDecoderConfig
+
+[[autodoc]] EncoderDecoderConfig
+
+## EncoderDecoderModel
+
+[[autodoc]] EncoderDecoderModel
+    - forward
+    - from_encoder_decoder_pretrained
+
+## TFEncoderDecoderModel
+
+[[autodoc]] TFEncoderDecoderModel
+    - call
+    - from_encoder_decoder_pretrained
+
+## FlaxEncoderDecoderModel
+
+[[autodoc]] FlaxEncoderDecoderModel
+    - __call__
+    - from_encoder_decoder_pretrained
--- a/docs/source/model_doc/encoderdecoder.rst
+++ b/docs/source/model_doc/encoderdecoder.rst
--- a/docs/source/model_doc/flaubert.mdx
+++ b/docs/source/model_doc/flaubert.mdx
--- a/docs/source/model_doc/flaubert.rst
+++ b/docs/source/model_doc/flaubert.rst
--- a/docs/source/model_doc/fnet.rst
+++ b/docs/source/model_doc/fnet.rst
--- a/docs/source/model_doc/fsmt.mdx
+++ b/docs/source/model_doc/fsmt.mdx
--- a/docs/source/model_doc/fsmt.rst
+++ b/docs/source/model_doc/fsmt.rst
--- a/docs/source/model_doc/funnel.mdx
+++ b/docs/source/model_doc/funnel.mdx
--- a/docs/source/model_doc/funnel.rst
+++ b/docs/source/model_doc/funnel.rst
--- a/docs/source/model_doc/gpt.mdx
+++ b/docs/source/model_doc/gpt.mdx
--- a/docs/source/model_doc/gpt.rst
+++ b/docs/source/model_doc/gpt.rst
--- a/docs/source/model_doc/gpt2.mdx
+++ b/docs/source/model_doc/gpt2.mdx
--- a/docs/source/model_doc/gpt2.rst
+++ b/docs/source/model_doc/gpt2.rst
--- a/docs/source/model_doc/gpt_neo.mdx
+++ b/docs/source/model_doc/gpt_neo.mdx
--- a/docs/source/model_doc/gpt_neo.rst
+++ b/docs/source/model_doc/gpt_neo.rst
--- a/docs/source/model_doc/gptj.mdx
+++ b/docs/source/model_doc/gptj.mdx
--- a/docs/source/model_doc/gptj.rst
+++ b/docs/source/model_doc/gptj.rst
--- a/docs/source/model_doc/herbert.rst
+++ b/docs/source/model_doc/herbert.rst
--- a/docs/source/model_doc/hubert.rst
+++ b/docs/source/model_doc/hubert.rst
--- a/docs/source/model_doc/ibert.rst
+++ b/docs/source/model_doc/ibert.rst