"tests/layoutlmv2/test_tokenization_layoutlmv2.py" did not exist on "6326aa4bf0f20c1f7825d096e64143343986cef2"
bart.md 10.3 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
11
12
13
14

鈿狅笍 Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

15
16
17
18
-->

# BART

Steven Liu's avatar
Steven Liu committed
19
20
21
22
23
24
25
26
27
<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=bart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bart-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/bart-large-mnli">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>

28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
## Overview

The Bart model was proposed in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.

According to the abstract,

- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
  left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
  where spans of text are replaced with a single mask token.
- BART is particularly effective when fine tuned for text generation but also works well for comprehension tasks. It
  matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new
  state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
  of up to 6 ROUGE.

45
46
47
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).

## Usage tips:
48
49
50

- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
  the left.
Steven Liu's avatar
Steven Liu committed
51
52
53
54
55
56
57
- Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of the following transformations are applied on the pretraining tasks for the encoder:

  * mask random tokens (like in BERT)
  * delete random tokens
  * mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token)
  * permute sentences
  * rotate the document to make it start at a specific token
58

59
60
61
62
63
64
65
## Implementation Notes

- Bart doesn't use `token_type_ids` for sequence classification. Use [`BartTokenizer`] or
  [`~BartTokenizer.encode`] to get the proper splitting.
- The forward pass of [`BartModel`] will create the `decoder_input_ids` if they are not passed.
  This is different than some other modeling APIs. A typical use case of this feature is mask filling.
- Model predictions are intended to be identical to the original implementation when
66
  `forced_bos_token_id=0`. This only works, however, if the string you pass to
67
  [`fairseq.encode`] starts with a space.
68
- [`~generation.GenerationMixin.generate`] should be used for conditional generation tasks like
69
70
71
72
73
74
75
76
77
78
  summarization, see the example in that docstrings.
- Models that load the *facebook/bart-large-cnn* weights will not have a `mask_token_id`, or be able to perform
  mask-filling tasks.

## Mask Filling

The `facebook/bart-base` and `facebook/bart-large` checkpoints can be used to fill multi-token masks.

```python
from transformers import BartForConditionalGeneration, BartTokenizer
Sylvain Gugger's avatar
Sylvain Gugger committed
79

80
81
82
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
Sylvain Gugger's avatar
Sylvain Gugger committed
83
84
85
86
87
batch = tok(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
]
88
89
```

Steven Liu's avatar
Steven Liu committed
90
91
92
93
94
95
96
## Resources

A list of official Hugging Face and community (indicated by 馃寧) resources to help you get started with BART. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="summarization"/>

- A blog post on [Distributed Training: Train BART/T5 for Summarization using 馃 Transformers and Amazon SageMaker](https://huggingface.co/blog/sagemaker-distributed-training-seq2seq).
97
- A notebook on how to [finetune BART for summarization with fastai using blurr](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/posts/2021-05-25-mbart-sequence-classification-with-blurr.ipynb). 馃寧
Steven Liu's avatar
Steven Liu committed
98
- A notebook on how to [finetune BART for summarization in two languages with Trainer class](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb). 馃寧
99
- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
Steven Liu's avatar
Steven Liu committed
100
101
- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization).
102
- An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets` object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904)
Steven Liu's avatar
Steven Liu committed
103
- [Summarization](https://huggingface.co/course/chapter7/5?fw=pt#summarization) chapter of the 馃 Hugging Face course.
104
- [Summarization task guide](../tasks/summarization)
Steven Liu's avatar
Steven Liu committed
105
106
107
108
109
110
111

<PipelineTag pipeline="fill-mask"/>

- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 馃 Hugging Face Course.
112
- [Masked language modeling task guide](../tasks/masked_language_modeling)
Steven Liu's avatar
Steven Liu committed
113
114
115
116
117
118

<PipelineTag pipeline="translation"/>

- A notebook on how to [finetune mBART using Seq2SeqTrainer for Hindi to English translation](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb). 馃寧
- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/translation) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb).
- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/translation) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).
119
- [Translation task guide](../tasks/translation)
120
121

See also:
122
123
124
- [Text classification task guide](../tasks/sequence_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
125
- [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).
Steven Liu's avatar
Steven Liu committed
126

127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
## BartConfig

[[autodoc]] BartConfig
    - all

## BartTokenizer

[[autodoc]] BartTokenizer
    - all

## BartTokenizerFast

[[autodoc]] BartTokenizerFast
    - all

142
143
144
145

<frameworkcontent>
<pt>

146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
## BartModel

[[autodoc]] BartModel
    - forward

## BartForConditionalGeneration

[[autodoc]] BartForConditionalGeneration
    - forward

## BartForSequenceClassification

[[autodoc]] BartForSequenceClassification
    - forward

## BartForQuestionAnswering

[[autodoc]] BartForQuestionAnswering
    - forward

## BartForCausalLM

[[autodoc]] BartForCausalLM
    - forward

171
172
173
</pt>
<tf>

174
175
176
177
178
179
180
181
182
183
## TFBartModel

[[autodoc]] TFBartModel
    - call

## TFBartForConditionalGeneration

[[autodoc]] TFBartForConditionalGeneration
    - call

184
185
186
187
188
## TFBartForSequenceClassification

[[autodoc]] TFBartForSequenceClassification
    - call

189
190
191
</tf>
<jax>

192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
## FlaxBartModel

[[autodoc]] FlaxBartModel
    - __call__
    - encode
    - decode

## FlaxBartForConditionalGeneration

[[autodoc]] FlaxBartForConditionalGeneration
    - __call__
    - encode
    - decode

## FlaxBartForSequenceClassification

[[autodoc]] FlaxBartForSequenceClassification
    - __call__
    - encode
    - decode

## FlaxBartForQuestionAnswering

[[autodoc]] FlaxBartForQuestionAnswering
    - __call__
    - encode
    - decode
219
220
221
222

## FlaxBartForCausalLM

[[autodoc]] FlaxBartForCausalLM
223
    - __call__
224
225
226
227
228
</jax>
</frameworkcontent>