<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Quick tour

[[open-in-colab]]

Get up and running with 🤗 Transformers! Whether you're a developer or an everyday user, this quick tour will help you get started and show you how to use the [`pipeline`] for inference, load a pretrained model and preprocessor with an [AutoClass](./model_doc/auto), and quickly train a model with PyTorch or TensorFlow. If you're a beginner, we recommend checking out our tutorials or [course](https://huggingface.co/course/chapter1/1) next for more in-depth explanations of the concepts introduced here.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install transformers datasets evaluate accelerate
```

You'll also need to install your preferred machine learning framework:

<frameworkcontent>
<pt>

```bash
pip install torch
```
</pt>
<tf>

```bash
pip install tensorflow
```
</tf>
</frameworkcontent>

## Pipeline

<Youtube id="tiZFewofSLM"/>

The [`pipeline`] is the easiest and fastest way to use a pretrained model for inference. You can use the [`pipeline`] out-of-the-box for many tasks across different modalities, some of which are shown in the table below:

<Tip>

For a complete list of available tasks, check out the [pipeline API reference](./main_classes/pipelines).

</Tip>

| **Task**                     | **Description**                                                                                              | **Modality**    | **Pipeline identifier**                       |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|-----------------------------------------------|
| Text classification          | assign a label to a given sequence of text                                                                   | NLP             | pipeline(task="sentiment-analysis")           |
| Text generation              | generate text given a prompt                                                                                 | NLP             | pipeline(task="text-generation")              |
| Summarization                | generate a summary of a sequence of text or document                                                         | NLP             | pipeline(task="summarization")                |
| Image classification         | assign a label to an image                                                                                   | Computer vision | pipeline(task="image-classification")         |
| Image segmentation           | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | pipeline(task="image-segmentation")           |
| Object detection             | predict the bounding boxes and classes of objects in an image                                                | Computer vision | pipeline(task="object-detection")             |
| Audio classification         | assign a label to some audio data                                                                            | Audio           | pipeline(task="audio-classification")         |
| Automatic speech recognition | transcribe speech into text                                                                                  | Audio           | pipeline(task="automatic-speech-recognition") |
| Visual question answering    | answer a question about the image, given an image and a question                                             | Multimodal      | pipeline(task="vqa")                          |
| Document question answering  | answer a question about the document, given a document and a question                                        | Multimodal      | pipeline(task="document-question-answering")  |
| Image captioning             | generate a caption for a given image                                                                         | Multimodal      | pipeline(task="image-to-text")                |

Start by creating an instance of [`pipeline`] and specifying a task you want to use it for. In this guide, you'll use the [`pipeline`] for sentiment analysis as an example:

```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis")
```

The [`pipeline`] downloads and caches a default [pretrained model](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english) and tokenizer for sentiment analysis. Now you can use the `classifier` on your target text:

```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```

If you have more than one input, pass your inputs as a list to the [`pipeline`] to return a list of dictionaries:

```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```

The [`pipeline`] can also iterate over an entire dataset for any task you like. For this example, let's choose automatic speech recognition as our task:

```py
>>> import torch
>>> from transformers import pipeline

>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```

Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) for more details) you'd like to iterate over. For example, load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:

```py
>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")  # doctest: +IGNORE_RESULT
```

You need to make sure the sampling rate of the dataset matches the sampling 
rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

```py
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
```

The audio files are automatically loaded and resampled when calling the `"audio"` column.
Extract the raw waveform arrays from the first 4 samples and pass them as a list to the pipeline:

```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I FURN A JOINA COUT']
```

For larger datasets where the inputs are big (like in speech or vision), you'll want to pass a generator instead of a list so you don't load all the inputs into memory at once. Take a look at the [pipeline API reference](./main_classes/pipelines) for more information.
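
As a minimal sketch of that pattern (reusing the `speech_recognizer` pipeline and the resampled `dataset` from above), you can yield one audio sample at a time and let the pipeline consume the generator lazily:

```py
>>> def audio_samples():
...     # Yield one decoded audio sample at a time instead of building a full list in memory.
...     for sample in dataset:
...         yield sample["audio"]

>>> for prediction in speech_recognizer(audio_samples()):  # doctest: +SKIP
...     print(prediction["text"])
```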

### Use another model and tokenizer in the pipeline

The [`pipeline`] can accommodate any model from the [Hub](https://huggingface.co/models), making it easy to adapt the [`pipeline`] for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Hub to filter for an appropriate model. The top filtered result returns a multilingual [BERT model](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) finetuned for sentiment analysis you can use for French text:

```py
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```

<frameworkcontent>
<pt>
Use [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on the `AutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on the `TFAutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>

Specify the model and tokenizer in the [`pipeline`], and now you can apply the `classifier` to French text:

```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```

If you can't find a model for your use-case, you'll need to finetune a pretrained model on your data. Take a look at our [finetuning tutorial](./training) to learn how. Finally, after you've finetuned your pretrained model, please consider [sharing](./model_sharing) the model with the community on the Hub to democratize machine learning for everyone! 🤗

## AutoClass

<Youtube id="AhChOFRegn4"/>

Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and its associated preprocessing class.

Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

### AutoTokenizer

A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model. There are multiple rules that govern the tokenization process, including how to split a word and at what level words should be split (learn more about tokenization in the [tokenizer summary](./tokenizer_summary)). The most important thing to remember is you need to instantiate a tokenizer with the same model name to ensure you're using the same tokenization rules a model was pretrained with.

Load a tokenizer with [`AutoTokenizer`]:

```py
>>> from transformers import AutoTokenizer

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Pass your text to the tokenizer:

```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
>>> print(encoding)
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```

The tokenizer returns a dictionary containing:

* [input_ids](./glossary#input-ids): numerical representations of your tokens.
* [attention_mask](./glossary#attention-mask): indicates which tokens should be attended to.
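
If you're curious what text those `input_ids` map back to, the tokenizer can reverse the encoding. This is just a quick sanity check (the exact tokens and string depend on the tokenizer's vocabulary and special tokens):

```py
>>> tokenizer.convert_ids_to_tokens(encoding["input_ids"])  # doctest: +SKIP
>>> tokenizer.decode(encoding["input_ids"])  # doctest: +SKIP
```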

A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:

<frameworkcontent>
<pt>

```py
>>> pt_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="pt",
... )
```
</pt>
<tf>

```py
>>> tf_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="tf",
... )
```
</tf>
</frameworkcontent>

<Tip>

Check out the [preprocess](./preprocessing) tutorial for more details about tokenization, and how to use an [`AutoImageProcessor`], [`AutoFeatureExtractor`] and [`AutoProcessor`] to preprocess image, audio, and multimodal inputs.

</Tip>

### AutoModel

<frameworkcontent>
<pt>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. For text (or sequence) classification, you should load [`AutoModelForSequenceClassification`]:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You just have to unpack the dictionary by adding `**`:

```py
>>> pt_outputs = pt_model(**pt_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> from torch import nn

>>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
>>> print(pt_predictions)
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
        [0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load a [`TFAutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`TFAutoModel`] for the task. For text (or sequence) classification, you should load [`TFAutoModelForSequenceClassification`]:

```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You can pass the tensors as-is:

```py
>>> tf_outputs = tf_model(tf_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> import tensorflow as tf

>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions  # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>

<Tip>

All 🤗 Transformers models (PyTorch or TensorFlow) output the tensors *before* the final activation
function (like softmax) because the final activation function is often fused with the loss. Model outputs are special dataclasses so their attributes are autocompleted in an IDE. The model outputs also behave like a tuple or a dictionary (you can index with an integer, a slice, or a string), in which case attributes that are `None` are ignored.

</Tip>
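
For example, using the PyTorch outputs from above, the same logits are reachable by attribute, key, or tuple index - a small illustration of the behavior described in the tip:

```py
>>> logits_by_attribute = pt_outputs.logits  # attribute access
>>> logits_by_key = pt_outputs["logits"]     # dictionary-style access
>>> logits_by_index = pt_outputs[0]          # tuple-style access; `None` attributes (like the loss here) are skipped
```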

### Save a model

<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:

```py
>>> pt_save_directory = "./pt_save_pretrained"
>>> tokenizer.save_pretrained(pt_save_directory)  # doctest: +IGNORE_RESULT
>>> pt_model.save_pretrained(pt_save_directory)
```

When you are ready to use the model again, reload it with [`PreTrainedModel.from_pretrained`]:

```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:

```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory)  # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```

When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:

```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>

One particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:

<frameworkcontent>
<pt>

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>

## Custom model builds

You can modify the model's configuration class to change how a model is built. The configuration specifies a model's attributes, such as the number of hidden layers or attention heads. You start from scratch when you initialize a model from a custom configuration class. The model attributes are randomly initialized, and you'll need to train the model before you can use it to get meaningful results.

Start by importing [`AutoConfig`], and then load the pretrained model you want to modify. Within [`AutoConfig.from_pretrained`], you can specify the attribute you want to change, such as the number of attention heads:

```py
>>> from transformers import AutoConfig

>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```

<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:

```py
>>> from transformers import AutoModel

>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:

```py
>>> from transformers import TFAutoModel

>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>

Take a look at the [Create a custom architecture](./create_a_model) guide for more information about building custom configurations.

## Trainer - a PyTorch optimized training loop

All models are a standard [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) so you can use them in any typical training loop. While you can write your own training loop, 🤗 Transformers provides a [`Trainer`] class for PyTorch, which contains the basic training loop and adds additional functionality for features like distributed training, mixed precision, and more.
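
If you ever do want to write that loop yourself, here is a minimal sketch of a single manual training step in plain PyTorch, reusing `pt_model` and `pt_batch` from the AutoModel section above (the labels below are purely illustrative):

```py
>>> import torch
>>> from torch.optim import AdamW

>>> optimizer = AdamW(pt_model.parameters(), lr=5e-5)  # doctest: +SKIP
>>> labels = torch.tensor([4, 0])  # hypothetical star-rating labels for the two example sentences
>>> loss = pt_model(**pt_batch, labels=labels).loss  # passing labels makes the model return a loss  # doctest: +SKIP
>>> loss.backward()  # doctest: +SKIP
>>> optimizer.step()  # doctest: +SKIP
```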

Depending on your task, you'll typically pass the following parameters to [`Trainer`]:

1. You'll start with a [`PreTrainedModel`] or a [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module):

   ```py
   >>> from transformers import AutoModelForSequenceClassification

   >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
   ```

2. [`TrainingArguments`] contains the model hyperparameters you can change like learning rate, batch size, and the number of epochs to train for. The default values are used if you don't specify any training arguments:

   ```py
   >>> from transformers import TrainingArguments

   >>> training_args = TrainingArguments(
   ...     output_dir="path/to/save/folder/",
   ...     learning_rate=2e-5,
   ...     per_device_train_batch_size=8,
   ...     per_device_eval_batch_size=8,
   ...     num_train_epochs=2,
   ... )
   ```

3. Load a preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
   ```

4. Load a dataset:

   ```py
   >>> from datasets import load_dataset

   >>> dataset = load_dataset("rotten_tomatoes")  # doctest: +IGNORE_RESULT
   ```

5. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])
   ```

   Then apply it over the entire dataset with [`~datasets.Dataset.map`]:

   ```py
   >>> dataset = dataset.map(tokenize_dataset, batched=True)
   ```

6. A [`DataCollatorWithPadding`] to create a batch of examples from your dataset:

   ```py
   >>> from transformers import DataCollatorWithPadding

   >>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
   ```

Now gather all these classes in [`Trainer`]:

```py
>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=dataset["train"],
...     eval_dataset=dataset["test"],
...     tokenizer=tokenizer,
...     data_collator=data_collator,
... )  # doctest: +SKIP
```

When you're ready, call [`~Trainer.train`] to start training:

```py
>>> trainer.train()  # doctest: +SKIP
```

<Tip>

For tasks - like translation or summarization - that use a sequence-to-sequence model, use the [`Seq2SeqTrainer`] and [`Seq2SeqTrainingArguments`] classes instead.

</Tip>

You can customize the training loop behavior by subclassing [`Trainer`] and overriding its methods. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be overridden.
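
For instance, a minimal sketch of overriding `compute_loss` to use a class-weighted loss might look like this (the weights and class count are illustrative, not part of the library):

```py
>>> import torch
>>> from torch import nn
>>> from transformers import Trainer

>>> class WeightedLossTrainer(Trainer):
...     def compute_loss(self, model, inputs, return_outputs=False):
...         labels = inputs.pop("labels")
...         outputs = model(**inputs)
...         # Illustrative class weights for an imbalanced two-class dataset.
...         weights = torch.tensor([1.0, 2.0], device=outputs.logits.device)
...         loss = nn.functional.cross_entropy(outputs.logits, labels, weight=weights)
...         return (loss, outputs) if return_outputs else loss
```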

The other way to customize the training loop is by using [Callbacks](./main_classes/callbacks). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.
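
A callback, by contrast, only observes the loop. As a small sketch, this hypothetical callback prints the loss whenever the [`Trainer`] logs metrics, and is passed in through the `callbacks` argument:

```py
>>> from transformers import TrainerCallback

>>> class PrintLossCallback(TrainerCallback):
...     def on_log(self, args, state, control, logs=None, **kwargs):
...         # Called each time the Trainer logs metrics; read-only inspection of progress.
...         if logs is not None and "loss" in logs:
...             print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=dataset["train"],
...     callbacks=[PrintLossCallback()],
... )  # doctest: +SKIP
```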

## Train with TensorFlow

All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so they can be trained in TensorFlow with the [Keras](https://keras.io/) API. 🤗 Transformers provides the [`~TFPreTrainedModel.prepare_tf_dataset`] method to easily load your dataset as a `tf.data.Dataset` so you can start training right away with Keras' [`compile`](https://keras.io/api/models/model_training_apis/#compile-method) and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) methods.

1. You'll start with a [`TFPreTrainedModel`] or a [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model):

   ```py
   >>> from transformers import TFAutoModelForSequenceClassification

   >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
   ```

2. Load a preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
   ```

3. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])  # doctest: +SKIP
   ```

4. Apply the tokenizer over the entire dataset with [`~datasets.Dataset.map`] and then pass the dataset and tokenizer to [`~TFPreTrainedModel.prepare_tf_dataset`]. You can also change the batch size and shuffle the dataset here if you'd like:

   ```py
   >>> dataset = dataset.map(tokenize_dataset)  # doctest: +SKIP
   >>> tf_dataset = model.prepare_tf_dataset(
   ...     dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer
   ... )  # doctest: +SKIP
   ```

5. When you're ready, you can call `compile` and `fit` to start training. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

   ```py
   >>> from tensorflow.keras.optimizers import Adam

   >>> model.compile(optimizer='adam')  # No loss argument!
   >>> model.fit(tf_dataset)  # doctest: +SKIP
   ```

## What's next?

Now that you've completed the 🤗 Transformers quick tour, check out our guides and learn how to do more specific things like writing a custom model, fine-tuning a model for a task, and training a model with a script. If you're interested in learning more about 🤗 Transformers core concepts, grab a cup of coffee and take a look at our Conceptual Guides!