Unverified Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 29e45979
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-base-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
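For contrast, the same check can be run on the unaltered `sentence` defined above; a minimal sketch reusing the objects from the snippet (ideally every position comes back as 0, i.e. "original"):

```python
# Score the original sentence the same way; reuses discriminator/tokenizer from above.
real_tokens = tokenizer.tokenize(sentence, add_special_tokens=True)
real_inputs = tokenizer.encode(sentence, return_tensors="pt")
real_predictions = torch.round((torch.sign(discriminator(real_inputs)[0]) + 1) / 2)

print("".join("%7s" % token for token in real_tokens))
print("".join("%7s" % int(p) for p in real_predictions.squeeze().tolist()))
```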
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-base-generator",
    tokenizer="google/electra-base-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
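The pipeline returns a list of candidate fills sorted by score. A minimal sketch of inspecting the top candidate, assuming the `fill_mask` pipeline from above (`token_str` and `score` are standard fill-mask output fields):

```python
# Inspect the highest-scoring prediction returned by the pipeline above.
results = fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
best = results[0]  # results are sorted by score, highest first
print(best["token_str"], best["score"])
```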
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-large-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-large-generator",
    tokenizer="google/electra-large-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-small-generator",
    tokenizer="google/electra-small-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
MobileBERT is a thin version of BERT_LARGE, equipped with bottleneck structures and a carefully designed balance
between self-attention and feed-forward networks.
This checkpoint is the original MobileBERT Optimized Uncased English checkpoint:
[uncased_L-24_H-128_B-512_A-4_F-4_OPT](https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/uncased_L-24_H-128_B-512_A-4_F-4_OPT.tar.gz).
## How to use MobileBERT in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/mobilebert-uncased",
    tokenizer="google/mobilebert-uncased"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
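Beyond the fill-mask pipeline, the checkpoint can also be loaded directly to obtain contextual hidden states; a minimal sketch, assuming the standard `MobileBertModel`/`MobileBertTokenizer` classes from `transformers`:

```python
import torch
from transformers import MobileBertModel, MobileBertTokenizer

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertModel.from_pretrained("google/mobilebert-uncased")

inputs = tokenizer("HuggingFace is creating great tools.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```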
## Reformer Model trained on "Crime and Punishment"
*Crime and Punishment* is a novel by Fyodor Dostoevsky; the training data uses an English translation.
The training data was taken from `gs://trax-ml/reformer/crime-and-punishment-2554.txt` and contains
roughly 0.5M tokens.
The ReformerLM model was trained in Flax using the Colab notebook provided by the authors (https://colab.research.google.com/github/google/trax/blob/master/trax/models/reformer/text_generation.ipynb), and the weights were converted to Hugging Face's PyTorch `ReformerModelWithLMHead`.
The model is a language model that operates on small sub-word units. Text can be generated as follows:
```python
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")
tok = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")

tok.decode(model.generate(tok.encode("A few months later", return_tensors="pt"), do_sample=True, temperature=0.7, max_length=100)[0])

# gives: 'A few months later on was more than anything in the flat.
# “I have already.” “That’s not my notion that he had forgotten him.
# What does that matter? And why do you mean? It’s only another fellow,” he said as he went out, as though he want'
```
## Reformer language model on character level, trained on enwik8
*enwik8* is a dataset based on Wikipedia and is often used to measure a model's ability to *compress* data, *e.g.* in
the scope of the *Hutter Prize*: https://en.wikipedia.org/wiki/Hutter_Prize.
`reformer-enwik8` was pretrained on the first 90M characters of *enwik8*, where the text was chunked into batches of 65536 characters (=2^16).
The model's weights were taken from https://console.cloud.google.com/storage/browser/trax-ml/reformer/enwik8 and converted
to Hugging Face's PyTorch `ReformerModelWithLMHead`.
The model is a language model that operates on characters.
Therefore, it does not need a tokenizer. The following functions can instead be used for **encoding** and **decoding**:
```python
import torch

# Encoding
def encode(list_of_strings, pad_token_id=0):
    max_length = max([len(string) for string in list_of_strings])

    # create empty (padded) tensors for input ids and attention masks
    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)

    for idx, string in enumerate(list_of_strings):
        # make sure string is in byte format
        if not isinstance(string, bytes):
            string = str.encode(string)

        input_ids[idx, :len(string)] = torch.tensor([x + 2 for x in string])
        attention_masks[idx, :len(string)] = 1

    return input_ids, attention_masks

# Decoding
def decode(outputs_ids):
    decoded_outputs = []
    for output_ids in outputs_ids.tolist():
        # transform ids back to chars; ids < 2 are simply mapped to the empty string
        decoded_outputs.append("".join([chr(x - 2) if x > 1 else "" for x in output_ids]))
    return decoded_outputs
```
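As a quick sanity check, the two helpers round-trip plain text; a minimal sketch using only the functions defined above:

```python
# Hypothetical sanity check: encode() followed by decode() reproduces the input text.
input_ids, attention_masks = encode(["In 1965, Brooks left IBM"])
print(decode(input_ids))
# expected: ['In 1965, Brooks left IBM']
```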
Text can be generated as follows:
```python
from transformers import ReformerModelWithLMHead
model = ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8")
encoded, attention_masks = encode(["In 1965, Brooks left IBM to found the Department of"])
decode(model.generate(encoded, do_sample=True, max_length=150))
# gives:
# In 1965, Brooks left IBM to found the Department of Journalism in 1968. IBM had jurisdiction himself in 1980, while Brooks resolved, nevertheless thro
```
***Note***: Language generation using `ReformerModelWithLMHead` is not optimized yet and is rather slow.
---
language: en
license: apache-2.0
datasets:
- xsum
tags:
- summarization
---
# Roberta2Roberta_L-24_bbc EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_bbc/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for extreme summarization on the BBC XSum dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for extreme summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_bbc")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_bbc")
article = """The problem is affecting people using the older
versions of the PlayStation 3, called the "Fat"
model.The problem isn't affecting the newer PS3
Slim systems that have been on sale since
September last year.Sony have also said they are
aiming to have the problem fixed shortly but is
advising some users to avoid using their console
for the time being."We hope to resolve this
problem within the next 24 hours," a statement
reads. "In the meantime, if you have a model other
than the new slim PS3, we advise that you do not
use your PS3 system, as doing so may result in
errors in some functionality, such as recording
obtained trophies, and not being able to restore
certain data."We believe we have identified that
this problem is being caused by a bug in the clock
functionality incorporated in the system."The
PlayStation Network is used by millions of people
around the world.It allows users to play their
friends at games like Fifa over the internet and
also do things like download software or visit
online stores."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Some Sony PlayStation gamers are being advised to stay away from the network because of a problem with the PlayStation 3 network.
```
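The call above uses the default (greedy) decoding; beam search usually yields a more fluent summary. A minimal sketch reusing the objects from the snippet above (the beam size of 4 is an illustrative choice, not a value from the original card):

```python
# Beam-search variant of the generate() call above; num_beams=4 is illustrative only.
output_ids = model.generate(input_ids, num_beams=4, early_stopping=True)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```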
---
language: en
license: apache-2.0
datasets:
- cnn_dailymail
tags:
- summarization
---
# Roberta2Roberta_L-24_cnn_daily_mail EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_cnndm/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for summarization on the CNN/DailyMail dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_cnn_daily_mail")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_cnn_daily_mail")
article = """ (The Hollywood Reporter)"The Rocky Horror Picture
Show" is the latest musical getting the small-
screen treatment. Fox is developing a two-hour
remake of the 1975 cult classic to be directed,
executive-produced and choreographed by Kenneth
Ortega ("High School Musical"). The project,
tentatively titled "The Rocky Horror Picture Show
Event," is casting-contingent. The special will be
filmed in advance and not air live, but few
details beyond that are known. In addition to
Ortega, Gail Berman and Lou Adler, who produced
the original film, are also attached as executive
producers. The special will be produced by Fox 21
Television Studios, and Berman's The Jackal Group.
The special is timed to celebrate the 40th
anniversary of the film, which has grossed more
than $112 million and still plays in theaters
across the country. TV premiere dates: The
complete guide . This isn't the first stab at
adapting "The Rocky Horror Picture Show." In 2002,
Fox unveiled plans for an adaptation timed to the
30th anniversary that never came to fruition. The
faces of pilot season 2015 . Fox's "Glee" covered
several of the show's most popular songs for a
Season 2 episode and even released a special "The
Rocky Horror Glee Show" EP. There is no plan yet
for when the adaptation will air. Fox also has a
live musical production of "Grease", starring
Julianne Hough and Vanessa Hudgens, scheduled to
air on Jan. 31, 2016. Broadcast TV scorecard .
Following in the footsteps of "The Sound of Music"
and "Peter Pan," NBC recently announced plans to
air a live version of The Wiz later this year.
Ortega's credits include "Gilmore Girls," "This Is
It" and "Hocus Pocus." He is repped by Paradigm
and Hanson, Jacobson. ©2015 The Hollywood
Reporter. All rights reserved."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Fox is developing a two-hour remake of the 1975 cult classic. The special will be directed, executive-produced and choreographed by Kenneth Ortega.
# The special is timed to celebrate the 40th anniversary of the film, which has grossed more than $112 million.
```
---
language: en
license: apache-2.0
datasets:
- discofuse
---
# Roberta2Roberta_L-24_discofuse EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_discofuse/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for sentence fusion on the DiscoFuse dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for sentence fusion, *e.g.*
**IMPORTANT**: The model was not trained on the `"` (double quotation mark) character, so before tokenizing the text it is advised to replace all `"` (double quotation marks) with a single `` ` `` (backtick).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_discofuse")
discofuse = """As a run-blocker, Zeitler moves relatively well. Zeitler often struggles at the point of contact in space."""
input_ids = tokenizer(discofuse, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# As a run-blocker, Zeitler moves relatively well. However, Zeitler often struggles at the point of contact in space.
```
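Per the note above, inputs that contain double quotation marks should have them replaced with backticks before tokenization; a minimal sketch with a made-up sentence, reusing the tokenizer and model loaded above:

```python
# Hypothetical input containing double quotes; replace them with backticks as advised above.
raw = 'He was described as a "solid run-blocker". He often struggled in space.'
cleaned = raw.replace('"', "`")
input_ids = tokenizer(cleaned, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```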
---
language: en
license: apache-2.0
datasets:
- gigaword
tags:
- summarization
---
# Roberta2Roberta_L-24_gigaword EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_gigaword/1).
The model is an encoder-decoder model that was initialized on the `roberta-large` checkpoints for both the encoder
and decoder and fine-tuned on headline generation using the Gigaword dataset, which is linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for extreme summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_gigaword")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_gigaword")
article = """australian shares closed down #.# percent monday
following a weak lead from the united states and
lower commodity prices , dealers said ."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# australian shares close down #.# percent.
```
---
language: en
license: apache-2.0
---
# Roberta2Roberta_L-24_wikisplit EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_cnndm/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for sentence splitting on the [WikiSplit](https://github.com/google-research-datasets/wiki-split) dataset.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for sentence splitting, *e.g.*
**IMPORTANT**: The model was not trained on the `"` (double quotation mark) character, so before tokenizing the text
it is advised to replace all `"` (double quotation marks) with two `'` (single quotation marks).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_wikisplit")
long_sentence = """Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it and he decides to open Bob 's Burgers for customers who were planning on going to Lobsterfest."""
input_ids = tokenizer(tokenizer.bos_token + long_sentence + tokenizer.eos_token, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it. He decides to open Bob's Burgers for customers who were planning on going to Lobsterfest.
```
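As stated in the note above, any double quotation marks in the input should be replaced by two single quotation marks before tokenizing; a minimal sketch with a made-up sentence, reusing the tokenizer and model loaded above:

```python
# Hypothetical input with double quotes; each " is replaced with two ' characters as advised above.
raw = 'Bob said "Lobsterfest is canceled" and he decided to open the restaurant anyway.'
cleaned = raw.replace('"', "''")
input_ids = tokenizer(tokenizer.bos_token + cleaned + tokenizer.eos_token, return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(input_ids)[0], skip_special_tokens=True))
```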