BertGeneration
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
:class:`~transformers.EncoderDecoderModel` as proposed in `Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

The abstract from the paper is the following:

*Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*

Usage:

- The model can be used in combination with the :class:`~transformers.EncoderDecoderModel` to leverage two pretrained
  BERT checkpoints for subsequent fine-tuning.

.. code-block::

  from transformers import BertGenerationDecoder, BertGenerationEncoder, BertTokenizer, EncoderDecoderModel

  # leverage checkpoints for Bert2Bert model...
  # use BERT's cls token as BOS token and sep token as EOS token
  encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
  # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
  decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
  bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)

  # create tokenizer...
  tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

  input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
  labels = tokenizer('This is a short summary', return_tensors="pt").input_ids

  # train...
  loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
  loss.backward()
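
After fine-tuning, the same ``bert2bert`` instance can also be used for generation. The following is a minimal sketch,
reusing the model and tokenizer from above; the explicit ``decoder_start_token_id`` passed to ``generate()`` mirrors
the BOS (``[CLS]``, id 101) token chosen when the decoder was created:

.. code-block::

  # generate a summary; decoding starts from the [CLS] token, matching the
  # bos_token_id set when the decoder was instantiated above
  outputs = bert2bert.generate(input_ids, decoder_start_token_id=101)

  print(tokenizer.decode(outputs[0], skip_special_tokens=True))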


- Pretrained :class:`~transformers.EncoderDecoderModel` checkpoints are also directly available in the model hub,
  e.g.:


.. code-block::

  from transformers import AutoTokenizer, EncoderDecoderModel

  # instantiate sentence fusion model
  sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
  tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")

  input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids

  outputs = sentence_fuser.generate(input_ids)

  print(tokenizer.decode(outputs[0]))
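
Because ``google/roberta2roberta_L-24_discofuse`` is a fully pretrained :class:`~transformers.EncoderDecoderModel`
checkpoint, both encoder and decoder weights are loaded in a single ``from_pretrained`` call, and the model can
generate fused sentences without any further fine-tuning.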


Tips:

- :class:`~transformers.BertGenerationEncoder` and :class:`~transformers.BertGenerationDecoder` should be used in
  combination with :class:`~transformers.EncoderDecoderModel`.
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
  Therefore, no EOS token should be added to the end of the input, as the sketch below illustrates.
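
A minimal sketch of the second tip, assuming ``bert-large-uncased`` as the checkpoint: with
``add_special_tokens=False`` the tokenizer appends neither ``[CLS]`` (id 101) nor ``[SEP]`` (id 102), so the encoder
input ends on its last word piece rather than on an EOS token.

.. code-block::

  from transformers import BertTokenizer

  tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

  # default tokenization wraps the input in [CLS] ... [SEP]
  print(tokenizer("Split this sentence.").input_ids)

  # with add_special_tokens=False, no EOS ([SEP]) token is appended
  print(tokenizer("Split this sentence.", add_special_tokens=False).input_ids)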

The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.

BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertGenerationConfig
    :members:


BertGenerationTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertGenerationTokenizer
    :members: save_vocabulary

BertGenerationEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertGenerationEncoder
    :members: forward


BertGenerationDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BertGenerationDecoder
    :members: forward