<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Pegasus

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=pegasus">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-pegasus-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/pegasus_paraphrase">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>


## Overview

The Pegasus model was proposed in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

According to the abstract,

- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an
  input document and are generated together as one output sequence from the remaining sentences, similar to an
  extractive summary.
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.

This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/google-research/pegasus).

## Usage tips

- Sequence-to-sequence model with the same encoder-decoder architecture as BART. Pegasus is pre-trained jointly on two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization-specific pretraining objective, called Gap Sentence Generation (GSG).

  * MLM: encoder input tokens are randomly replaced by a mask token and have to be predicted by the encoder (like in BERT)
  * GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, which has a causal mask to hide future words like a regular auto-regressive transformer decoder (a toy illustration of GSG follows these tips)
- FP16 is not supported (help/ideas on this appreciated!).
- The Adafactor optimizer is recommended for Pegasus fine-tuning (a minimal setup is sketched after the GSG illustration below).

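The GSG objective can be made concrete with a toy, framework-free sketch. The example document and the sentence-selection rule below are invented for illustration only; the paper selects "important" sentences by ROUGE score against the rest of the document, and the real preprocessing lives in the authors' repo. The `<mask_1>` name mirrors the Pegasus tokenizer's sentence-mask token.

```python
# Toy illustration of Gap Sentence Generation (GSG): selected sentences are
# replaced by a sentence-mask token in the encoder input and concatenated to
# form the decoder target.
document = [
    "PG&E scheduled blackouts in response to forecasts for high winds.",
    "The aim is to reduce the risk of wildfires.",
    "Nearly 800 thousand customers were scheduled to be affected.",
]

MASK_SENT = "<mask_1>"
selected = [1]  # placeholder rule; the paper scores sentences by ROUGE

encoder_input = " ".join(
    MASK_SENT if i in selected else sentence for i, sentence in enumerate(document)
)
decoder_target = " ".join(document[i] for i in selected)

print(encoder_input)
# PG&E scheduled blackouts in response to forecasts for high winds. <mask_1> Nearly 800 thousand customers were scheduled to be affected.
print(decoder_target)
# The aim is to reduce the risk of wildfires.
```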

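A minimal sketch of an Adafactor setup for fine-tuning, using the `Adafactor` implementation shipped with Transformers. The constant-learning-rate hyperparameters below are a common choice for seq2seq fine-tuning, not values prescribed by the Pegasus authors:

```python
from transformers import PegasusForConditionalGeneration
from transformers.optimization import Adafactor

model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

# Constant-learning-rate Adafactor; setting relative_step=True and
# warmup_init=True instead lets Adafactor manage the schedule itself.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```

When fine-tuning with [`Trainer`], recent library versions also accept `optim="adafactor"` in the training arguments as a shortcut.
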
## Checkpoints

All the [checkpoints](https://huggingface.co/models?search=pegasus) are fine-tuned for summarization, except
*pegasus-large*, from which the other checkpoints are fine-tuned:

- Each checkpoint is 2.2 GB on disk and 568M parameters.
- FP16 is not supported (help/ideas on this appreciated!).
- Summarizing XSUM in FP32 takes about 400 ms/sample with default parameters on a V100 GPU.
- Full replication results and correctly pre-processed data can be found in this [Issue](https://github.com/huggingface/transformers/issues/6844#issue-689259666).
- [Distilled checkpoints](https://huggingface.co/models?search=distill-pegasus) are described in this [paper](https://arxiv.org/abs/2010.13002); loading one is sketched below.

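The distilled checkpoints load with the same classes as the full-size models. A short sketch, using one of the checkpoint names returned by the Hub search linked above (substitute any other `distill-pegasus` checkpoint):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

name = "sshleifer/distill-pegasus-xsum-16-4"  # example checkpoint from the Hub search above
tokenizer = PegasusTokenizer.from_pretrained(name)
model = PegasusForConditionalGeneration.from_pretrained(name)
print(model.num_parameters())  # noticeably smaller than the ~568M parameters of the full checkpoints
```
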
## Implementation Notes

- All models are transformer encoder-decoders with 16 layers in each component.
- The implementation is completely inherited from [`BartForConditionalGeneration`].
- Some key configuration differences:
  - static, sinusoidal position embeddings
  - the model starts generating with `pad_token_id` (which has a zero token embedding) as the prefix
  - more beams are used (`num_beams=8`)
- All pretrained Pegasus checkpoints are the same besides three attributes: `tokenizer.model_max_length` (maximum
  input size), `max_length` (the maximum number of tokens to generate) and `length_penalty` (these can be compared with the sketch after these notes).
- The code to convert checkpoints trained in the authors' [repo](https://github.com/google-research/pegasus) can be
  found in `convert_pegasus_tf_to_pytorch.py`.

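The per-checkpoint differences can be inspected directly. A small sketch (the two checkpoint names are examples from the Hub; in recent library versions the generation defaults may also be exposed via `model.generation_config`):

```python
from transformers import PegasusConfig, PegasusTokenizer

for name in ("google/pegasus-large", "google/pegasus-xsum"):
    config = PegasusConfig.from_pretrained(name)
    tokenizer = PegasusTokenizer.from_pretrained(name)
    # The architecture is identical; only the maximum input size and the
    # generation defaults differ between checkpoints.
    print(
        name,
        tokenizer.model_max_length,
        config.max_length,
        config.length_penalty,
        config.num_beams,
    )
```
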
## Usage Example

```python
>>> from transformers import PegasusForConditionalGeneration, PegasusTokenizer
>>> import torch

>>> src_text = [
...     """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
... ]

>>> model_name = "google/pegasus-xsum"
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> tokenizer = PegasusTokenizer.from_pretrained(model_name)
>>> model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
>>> batch = tokenizer(src_text, truncation=True, padding="longest", return_tensors="pt").to(device)
>>> translated = model.generate(**batch)
>>> tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
>>> assert (
...     tgt_text[0]
...     == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
... )
```

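The same checkpoint can also be run through the high-level `summarization` pipeline; a short, equivalent sketch:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="google/pegasus-xsum")
text = (
    "PG&E stated it scheduled the blackouts in response to forecasts for high "
    "winds amid dry conditions. The aim is to reduce the risk of wildfires."
)
# The pipeline handles tokenization, generation and decoding internally.
print(summarizer(text)[0]["summary_text"])
```
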
## Resources

- [Script](https://github.com/huggingface/transformers/tree/main/examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh) to fine-tune Pegasus
  on the XSUM dataset. Data download instructions at [examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).
- [Causal language modeling task guide](../tasks/language_modeling)
- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)

## PegasusConfig

[[autodoc]] PegasusConfig

## PegasusTokenizer

Warning: `add_tokens` does not work at the moment.

[[autodoc]] PegasusTokenizer

## PegasusTokenizerFast

[[autodoc]] PegasusTokenizerFast

<frameworkcontent>
<pt>

## PegasusModel

[[autodoc]] PegasusModel
    - forward

## PegasusForConditionalGeneration

[[autodoc]] PegasusForConditionalGeneration
    - forward

## PegasusForCausalLM

[[autodoc]] PegasusForCausalLM
    - forward

</pt>
<tf>

## TFPegasusModel

[[autodoc]] TFPegasusModel
    - call

## TFPegasusForConditionalGeneration

[[autodoc]] TFPegasusForConditionalGeneration
    - call

</tf>
<jax>

## FlaxPegasusModel

[[autodoc]] FlaxPegasusModel
    - __call__
    - encode
    - decode

## FlaxPegasusForConditionalGeneration

[[autodoc]] FlaxPegasusForConditionalGeneration
    - __call__
    - encode
    - decode

</jax>
</frameworkcontent>