.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Pegasus
-----------------------------------------------------------------------------------------------------------------------

**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__
and assign @patrickvonplaten.


Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Pegasus model was proposed in `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
<https://arxiv.org/pdf/1912.08777.pdf>`__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.

According to the abstract,

- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an
  input document and are generated together as one output sequence from the remaining sentences, similar to an
  extractive summary (a toy sketch of this masking follows the list below).
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
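
As a toy sketch of the gap-sentence objective described above: the sentence selection below is arbitrary rather than
the authors' ROUGE-based heuristic, and ``<mask_1>`` is assumed as the sentence-mask token (as in the Pegasus
tokenizer).

.. code-block:: python

    # Toy illustration of gap-sentence generation (GSG): selected sentences are
    # replaced by a mask token in the input and concatenated to form the target.
    sentences = [
        "PG&E scheduled the blackouts in response to forecasts for high winds.",
        "The aim is to reduce the risk of wildfires.",
        "Nearly 800 thousand customers were scheduled to be affected.",
    ]
    selected = {1}  # pretend sentence 1 was picked as a "gap" sentence

    model_input = " ".join("<mask_1>" if i in selected else s for i, s in enumerate(sentences))
    target = " ".join(s for i, s in enumerate(sentences) if i in selected)

    print(model_input)  # the selected sentence is replaced by <mask_1>
    print(target)       # "The aim is to reduce the risk of wildfires."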

The authors' code can be found `here <https://github.com/google-research/pegasus>`__.


Checkpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All the `checkpoints <https://huggingface.co/models?search=pegasus>`__ are fine-tuned for summarization, except
``pegasus-large``, the pretrained checkpoint from which the other checkpoints are fine-tuned:

- Each checkpoint is 2.2 GB on disk and 568M parameters.
- FP16 is not supported (help/ideas on this appreciated!).
- Summarizing XSUM in fp32 takes about 400 ms/sample, with default parameters on a V100 GPU.
- Full replication results and correctly pre-processed data can be found in this `Issue
  <https://github.com/huggingface/transformers/issues/6844#issue-689259666>`__.
- `Distilled checkpoints <https://huggingface.co/models?search=distill-pegasus>`__ are described in this `paper
  <https://arxiv.org/abs/2010.13002>`__.
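
Loading any of these checkpoints follows the usual pattern; a minimal sketch (the checkpoint names below are just
examples from the model hub):

.. code-block:: python

    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    # 'google/pegasus-xsum' is one of the fine-tuned checkpoints; use
    # 'google/pegasus-large' for the pretrained-only model.
    model_name = 'google/pegasus-xsum'
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    print(model.num_parameters())  # roughly 568M parameters, as noted above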

Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `Script <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune_pegasus_xsum.sh>`__ to
  fine-tune pegasus on the XSUM dataset. Data download instructions at `examples/seq2seq/
  <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/README.md>`__.
- FP16 is not supported (help/ideas on this appreciated!).
- The Adafactor optimizer is recommended for pegasus fine-tuning (a minimal setup sketch follows this list).
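
A minimal sketch of such a setup, assuming the :obj:`Adafactor` implementation in :obj:`transformers.optimization`;
the hyperparameters below are illustrative, not values prescribed by the Pegasus authors:

.. code-block:: python

    from transformers import PegasusForConditionalGeneration
    from transformers.optimization import Adafactor

    model = PegasusForConditionalGeneration.from_pretrained('google/pegasus-large')

    # Constant learning-rate variant of Adafactor; with an explicit lr,
    # relative_step must be disabled.
    optimizer = Adafactor(
        model.parameters(),
        lr=1e-3,
        scale_parameter=False,
        relative_step=False,
    )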


Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- All models are transformer encoder-decoders with 16 layers in each component.
- The implementation is completely inherited from :class:`~transformers.BartForConditionalGeneration`.
- Some key configuration differences:

    - static, sinusoidal position embeddings
    - no :obj:`layernorm_embedding` (:obj:`PegasusConfig.normalize_embedding=False`)
    - the model starts generating with :obj:`pad_token_id` (which has a 0 token embedding) as the prefix
    - more beams are used (:obj:`num_beams=8`)
- All pretrained pegasus checkpoints are the same besides three attributes: :obj:`tokenizer.model_max_length` (maximum
  input size), :obj:`max_length` (the maximum number of tokens to generate) and :obj:`length_penalty`.
- The code to convert checkpoints trained in the authors' `repo <https://github.com/google-research/pegasus>`_ can be
  found in ``convert_pegasus_tf_to_pytorch.py``.
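
The per-checkpoint attributes mentioned above can be inspected directly on a loaded tokenizer and config; a quick
sketch (the checkpoint name is just an example):

.. code-block:: python

    from transformers import PegasusConfig, PegasusTokenizer

    config = PegasusConfig.from_pretrained('google/pegasus-xsum')
    tokenizer = PegasusTokenizer.from_pretrained('google/pegasus-xsum')

    # Maximum input size for this checkpoint.
    print(tokenizer.model_max_length)
    # Generation defaults stored in the config (values differ across checkpoints).
    print(config.max_length, config.length_penalty, config.num_beams)
    # Embedding layer norm is disabled for Pegasus, unlike Bart.
    print(config.normalize_embedding)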


Usage Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from transformers import PegasusForConditionalGeneration, PegasusTokenizer
    import torch
    src_text = [
        """ PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."""
    ]

    model_name = 'google/pegasus-xsum'
    torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
    batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest', return_tensors="pt").to(torch_device)
    translated = model.generate(**batch)
    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
    assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."



PegasusConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.PegasusConfig


PegasusTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Warning: ``add_tokens`` does not work at the moment.

.. autoclass:: transformers.PegasusTokenizer
    :members: __call__, prepare_seq2seq_batch


PegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.PegasusForConditionalGeneration


TFPegasusForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFPegasusForConditionalGeneration