Unverified Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 29e45979
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-base-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
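For contrast, the same check can be run on the unaltered `sentence` defined above; a minimal sketch reusing the objects from the snippet (ideally every position comes back as 0, i.e. "original"):

```python
# Score the original sentence the same way; reuses discriminator/tokenizer from above.
real_tokens = tokenizer.tokenize(sentence, add_special_tokens=True)
real_inputs = tokenizer.encode(sentence, return_tensors="pt")
real_predictions = torch.round((torch.sign(discriminator(real_inputs)[0]) + 1) / 2)

print("".join("%7s" % token for token in real_tokens))
print("".join("%7s" % int(p) for p in real_predictions.squeeze().tolist()))
```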
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-base-generator",
    tokenizer="google/electra-base-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
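The pipeline returns a list of candidate fills sorted by score. A minimal sketch of inspecting the top candidate, assuming the `fill_mask` pipeline from above (`token_str` and `score` are standard fill-mask output fields):

```python
# Inspect the highest-scoring prediction returned by the pipeline above.
results = fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
best = results[0]  # results are sorted by score, highest first
print(best["token_str"], best["score"])
```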
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-large-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-large-generator",
    tokenizer="google/electra-large-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the discriminator in `transformers`
```python
from transformers import ElectraForPreTraining, ElectraTokenizerFast
import torch

discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
fake_sentence = "The quick brown fox fake over the lazy dog"

# tokenize with special tokens so the printed columns line up with the predictions
fake_tokens = tokenizer.tokenize(fake_sentence, add_special_tokens=True)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)

# a positive logit means the discriminator flags the token as replaced ("fake")
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print("".join("%7s" % token for token in fake_tokens))
print("".join("%7s" % int(prediction) for prediction in predictions.squeeze().tolist()))
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
**ELECTRA** is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a [GAN](https://arxiv.org/pdf/1406.2661.pdf). At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the [SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/) dataset.
For a detailed description and experimental results, please refer to our paper [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB).
This repository contains code to pre-train ELECTRA, including small ELECTRA models on a single GPU. It also supports fine-tuning ELECTRA on downstream tasks, including classification tasks (e.g., [GLUE](https://gluebenchmark.com/)), QA tasks (e.g., [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)), and sequence tagging tasks (e.g., [text chunking](https://www.clips.uantwerpen.be/conll2000/chunking/)).
## How to use the generator in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/electra-small-generator",
    tokenizer="google/electra-small-generator"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
---
language: en
thumbnail: https://huggingface.co/front/thumbnails/google.png
license: apache-2.0
---
## MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
MobileBERT is a thin version of BERT_LARGE, equipped with bottleneck structures and a carefully designed balance
between self-attention and feed-forward networks.
This checkpoint is the original MobileBERT Optimized Uncased English checkpoint:
[uncased_L-24_H-128_B-512_A-4_F-4_OPT](https://storage.googleapis.com/cloud-tpu-checkpoints/mobilebert/uncased_L-24_H-128_B-512_A-4_F-4_OPT.tar.gz).
## How to use MobileBERT in `transformers`
```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="google/mobilebert-uncased",
    tokenizer="google/mobilebert-uncased"
)

print(
    fill_mask(f"HuggingFace is creating a {fill_mask.tokenizer.mask_token} that the community uses to solve NLP tasks.")
)
```
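Beyond the fill-mask pipeline, the checkpoint can also be loaded directly to obtain contextual hidden states; a minimal sketch, assuming the standard `MobileBertModel`/`MobileBertTokenizer` classes from `transformers`:

```python
import torch
from transformers import MobileBertModel, MobileBertTokenizer

tokenizer = MobileBertTokenizer.from_pretrained("google/mobilebert-uncased")
model = MobileBertModel.from_pretrained("google/mobilebert-uncased")

inputs = tokenizer("HuggingFace is creating great tools.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```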
## Reformer Model trained on "Crime and Punishment"
*Crime and Punishment* is a novel by Fyodor Dostoevsky; the training data uses an English translation.
The training data was taken from `gs://trax-ml/reformer/crime-and-punishment-2554.txt` and contains
roughly 0.5M tokens.
The ReformerLM model was trained in Flax using the Colab notebook provided by the authors (https://colab.research.google.com/github/google/trax/blob/master/trax/models/reformer/text_generation.ipynb), and the weights were converted to Hugging Face's PyTorch `ReformerModelWithLMHead`.
The model is a language model that operates on small sub-word units. Text can be generated as follows:
```python
from transformers import ReformerModelWithLMHead, ReformerTokenizer

model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")
tok = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")

tok.decode(model.generate(tok.encode("A few months later", return_tensors="pt"), do_sample=True, temperature=0.7, max_length=100)[0])

# gives: 'A few months later on was more than anything in the flat.
# “I have already.” “That’s not my notion that he had forgotten him.
# What does that matter? And why do you mean? It’s only another fellow,” he said as he went out, as though he want'
```
## Reformer language model on character level, trained on enwik8
*enwik8* is a dataset based on Wikipedia and is often used to measure a model's ability to *compress* data, *e.g.* in
the scope of the *Hutter Prize*: https://en.wikipedia.org/wiki/Hutter_Prize.
`reformer-enwik8` was pretrained on the first 90M characters of *enwik8*, where the text was chunked into batches of 65536 characters (=2^16).
The model's weights were taken from https://console.cloud.google.com/storage/browser/trax-ml/reformer/enwik8 and converted
to Hugging Face's PyTorch `ReformerModelWithLMHead`.
The model is a language model that operates on characters.
Therefore, it does not need a tokenizer. The following functions can instead be used for **encoding** and **decoding**:
```python
import torch

# Encoding
def encode(list_of_strings, pad_token_id=0):
    max_length = max([len(string) for string in list_of_strings])

    # create empty (padded) tensors for input ids and attention masks
    attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
    input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)

    for idx, string in enumerate(list_of_strings):
        # make sure string is in byte format
        if not isinstance(string, bytes):
            string = str.encode(string)

        input_ids[idx, :len(string)] = torch.tensor([x + 2 for x in string])
        attention_masks[idx, :len(string)] = 1

    return input_ids, attention_masks

# Decoding
def decode(outputs_ids):
    decoded_outputs = []
    for output_ids in outputs_ids.tolist():
        # transform ids back to chars; ids < 2 are simply mapped to the empty string
        decoded_outputs.append("".join([chr(x - 2) if x > 1 else "" for x in output_ids]))
    return decoded_outputs
```
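As a quick sanity check, the two helpers round-trip plain text; a minimal sketch using only the functions defined above:

```python
# Hypothetical sanity check: encode() followed by decode() reproduces the input text.
input_ids, attention_masks = encode(["In 1965, Brooks left IBM"])
print(decode(input_ids))
# expected: ['In 1965, Brooks left IBM']
```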
Text can be generated as follows:
```python
from transformers import ReformerModelWithLMHead
model = ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8")
encoded, attention_masks = encode(["In 1965, Brooks left IBM to found the Department of"])
decode(model.generate(encoded, do_sample=True, max_length=150))
# gives:
# In 1965, Brooks left IBM to found the Department of Journalism in 1968. IBM had jurisdiction himself in 1980, while Brooks resolved, nevertheless thro
```
***Note***: Language generation using `ReformerModelWithLMHead` is not optimized yet and is rather slow.
---
language: en
license: apache-2.0
datasets:
- xsum
tags:
- summarization
---
# Roberta2Roberta_L-24_bbc EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_bbc/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for extreme summarization on the BBC XSum dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for extreme summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_bbc")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_bbc")
article = """The problem is affecting people using the older
versions of the PlayStation 3, called the "Fat"
model.The problem isn't affecting the newer PS3
Slim systems that have been on sale since
September last year.Sony have also said they are
aiming to have the problem fixed shortly but is
advising some users to avoid using their console
for the time being."We hope to resolve this
problem within the next 24 hours," a statement
reads. "In the meantime, if you have a model other
than the new slim PS3, we advise that you do not
use your PS3 system, as doing so may result in
errors in some functionality, such as recording
obtained trophies, and not being able to restore
certain data."We believe we have identified that
this problem is being caused by a bug in the clock
functionality incorporated in the system."The
PlayStation Network is used by millions of people
around the world.It allows users to play their
friends at games like Fifa over the internet and
also do things like download software or visit
online stores."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Some Sony PlayStation gamers are being advised to stay away from the network because of a problem with the PlayStation 3 network.
```
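The call above uses the default (greedy) decoding; beam search usually yields a more fluent summary. A minimal sketch reusing the objects from the snippet above (the beam size of 4 is an illustrative choice, not a value from the original card):

```python
# Beam-search variant of the generate() call above; num_beams=4 is illustrative only.
output_ids = model.generate(input_ids, num_beams=4, early_stopping=True)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```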
---
language: en
license: apache-2.0
datasets:
- cnn_dailymail
tags:
- summarization
---
# Roberta2Roberta_L-24_cnn_daily_mail EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_cnndm/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for summarization on the CNN/DailyMail dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_cnn_daily_mail")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_cnn_daily_mail")
article = """ (The Hollywood Reporter)"The Rocky Horror Picture
Show" is the latest musical getting the small-
screen treatment. Fox is developing a two-hour
remake of the 1975 cult classic to be directed,
executive-produced and choreographed by Kenneth
Ortega ("High School Musical"). The project,
tentatively titled "The Rocky Horror Picture Show
Event," is casting-contingent. The special will be
filmed in advance and not air live, but few
details beyond that are known. In addition to
Ortega, Gail Berman and Lou Adler, who produced
the original film, are also attached as executive
producers. The special will be produced by Fox 21
Television Studios, and Berman's The Jackal Group.
The special is timed to celebrate the 40th
anniversary of the film, which has grossed more
than $112 million and still plays in theaters
across the country. TV premiere dates: The
complete guide . This isn't the first stab at
adapting "The Rocky Horror Picture Show." In 2002,
Fox unveiled plans for an adaptation timed to the
30th anniversary that never came to fruition. The
faces of pilot season 2015 . Fox's "Glee" covered
several of the show's most popular songs for a
Season 2 episode and even released a special "The
Rocky Horror Glee Show" EP. There is no plan yet
for when the adaptation will air. Fox also has a
live musical production of "Grease", starring
Julianne Hough and Vanessa Hudgens, scheduled to
air on Jan. 31, 2016. Broadcast TV scorecard .
Following in the footsteps of "The Sound of Music"
and "Peter Pan," NBC recently announced plans to
air a live version of The Wiz later this year.
Ortega's credits include "Gilmore Girls," "This Is
It" and "Hocus Pocus." He is repped by Paradigm
and Hanson, Jacobson. ©2015 The Hollywood
Reporter. All rights reserved."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Fox is developing a two-hour remake of the 1975 cult classic. The special will be directed, executive-produced and choreographed by Kenneth Ortega.
# The special is timed to celebrate the 40th anniversary of the film, which has grossed more than $112 million.
```
---
language: en
license: apache-2.0
datasets:
- discofuse
---
# Roberta2Roberta_L-24_discofuse EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_discofuse/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for sentence fusion on the DiscoFuse dataset linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for sentence fusion, *e.g.*
**IMPORTANT**: The model was not trained on the `"` (double quotation mark) character, so before tokenizing the text it is advised to replace all `"` (double quotation marks) with a single `` ` `` (backtick).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_discofuse")
discofuse = """As a run-blocker, Zeitler moves relatively well. Zeitler often struggles at the point of contact in space."""
input_ids = tokenizer(discofuse, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# As a run-blocker, Zeitler moves relatively well. However, Zeitler often struggles at the point of contact in space.
```
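Per the note above, inputs that contain double quotation marks should have them replaced with backticks before tokenization; a minimal sketch with a made-up sentence, reusing the tokenizer and model loaded above:

```python
# Hypothetical input containing double quotes; replace them with backticks as advised above.
raw = 'He was described as a "solid run-blocker". He often struggled in space.'
cleaned = raw.replace('"', "`")
input_ids = tokenizer(cleaned, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```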
---
language: en
license: apache-2.0
datasets:
- gigaword
tags:
- summarization
---
# Roberta2Roberta_L-24_gigaword EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_gigaword/1).
The model is an encoder-decoder model that was initialized on the `roberta-large` checkpoints for both the encoder
and decoder and fine-tuned on headline generation using the Gigaword dataset, which is linked above.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for extreme summarization, *e.g.*
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_gigaword")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_gigaword")
article = """australian shares closed down #.# percent monday
following a weak lead from the united states and
lower commodity prices , dealers said ."""
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# australian shares close down #.# percent.
```
---
language: en
license: apache-2.0
---
# Roberta2Roberta_L-24_wikisplit EncoderDecoder model
The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_cnndm/1).
The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder
and the decoder, and fine-tuned for sentence splitting on the [WikiSplit](https://github.com/google-research-datasets/wiki-split) dataset.
Disclaimer: The model card has been written by the Hugging Face team.
## How to use
You can use this model for sentence splitting, *e.g.*
**IMPORTANT**: The model was not trained on the `"` (double quotation mark) character, so before tokenizing the text
it is advised to replace all `"` (double quotation marks) with two `'` (single quotation marks).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_wikisplit")
long_sentence = """Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it and he decides to open Bob 's Burgers for customers who were planning on going to Lobsterfest."""
input_ids = tokenizer(tokenizer.bos_token + long_sentence + tokenizer.eos_token, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it. He decides to open Bob's Burgers for customers who were planning on going to Lobsterfest.
```
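As stated in the note above, any double quotation marks in the input should be replaced by two single quotation marks before tokenizing; a minimal sketch with a made-up sentence, reusing the tokenizer and model loaded above:

```python
# Hypothetical input with double quotes; each " is replaced with two ' characters as advised above.
raw = 'Bob said "Lobsterfest is canceled" and he decided to open the restaurant anyway.'
cleaned = raw.replace('"', "''")
input_ids = tokenizer(tokenizer.bos_token + cleaned + tokenizer.eos_token, return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(input_ids)[0], skip_special_tokens=True))
```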