<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Quick tour

[[open-in-colab]]

Get up and running with 🤗 Transformers! Whether you're a developer or an everyday user, this quick tour will help you get started and show you how to use the [`pipeline`] for inference, load a pretrained model and preprocessor with an [AutoClass](./model_doc/auto), and quickly train a model with PyTorch or TensorFlow. If you're a beginner, we recommend checking out our tutorials or [course](https://huggingface.co/course/chapter1/1) next for more in-depth explanations of the concepts introduced here.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install transformers datasets
```

You'll also need to install your preferred machine learning framework:

<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>

## Pipeline

<Youtube id="tiZFewofSLM"/>

The [`pipeline`] is the easiest and fastest way to use a pretrained model for inference. You can use the [`pipeline`] out-of-the-box for many tasks across different modalities, some of which are shown in the table below:

<Tip>

For a complete list of available tasks, check out the [pipeline API reference](./main_classes/pipelines).

</Tip>

| **Task**                     | **Description**                                                                                              | **Modality**    | **Pipeline identifier**                       |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|-----------------------------------------------|
| Text classification          | assign a label to a given sequence of text                                                                   | NLP             | pipeline(task="sentiment-analysis")           |
| Text generation              | generate text given a prompt                                                                                 | NLP             | pipeline(task="text-generation")              |
| Summarization                | generate a summary of a sequence of text or document                                                         | NLP             | pipeline(task="summarization")                |
| Image classification         | assign a label to an image                                                                                   | Computer vision | pipeline(task="image-classification")         |
| Image segmentation           | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | pipeline(task="image-segmentation")           |
| Object detection             | predict the bounding boxes and classes of objects in an image                                                | Computer vision | pipeline(task="object-detection")             |
| Audio classification         | assign a label to some audio data                                                                            | Audio           | pipeline(task="audio-classification")         |
| Automatic speech recognition | transcribe speech into text                                                                                  | Audio           | pipeline(task="automatic-speech-recognition") |
| Visual question answering    | answer a question about the image, given an image and a question                                             | Multimodal      | pipeline(task="vqa")                          |
| Document question answering  | answer a question about a document, given an image and a question                                            | Multimodal      | pipeline(task="document-question-answering")  |
| Image captioning             | generate a caption for a given image                                                                         | Multimodal      | pipeline(task="image-to-text")                |
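
The same one-liner pattern works for the other modalities in the table. As a hedged sketch (the image path is only a placeholder; the [`pipeline`] downloads a default model for the task):

```py
>>> from transformers import pipeline

>>> vision_classifier = pipeline(task="image-classification")
>>> # pass a local path or an image URL
>>> vision_classifier("path/to/image.jpg")  # doctest: +SKIP
```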

Start by creating an instance of [`pipeline`] and specifying a task you want to use it for. In this guide, you'll use the [`pipeline`] for sentiment analysis as an example:

```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis")
```

The [`pipeline`] downloads and caches a default [pretrained model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) and tokenizer for sentiment analysis. Now you can use the `classifier` on your target text:

```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```

If you have more than one input, pass your inputs as a list to the [`pipeline`] to return a list of dictionaries:

```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```

The [`pipeline`] can also iterate over an entire dataset for any task you like. For this example, let's choose automatic speech recognition as our task:

```py
>>> import torch
>>> from transformers import pipeline

>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```

Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) for more details) you'd like to iterate over. For example, load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:

```py
>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")  # doctest: +IGNORE_RESULT
```

You need to make sure the sampling rate of the dataset matches the sampling 
rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

```py
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
```

The audio files are automatically loaded and resampled when calling the `"audio"` column.
Extract the raw waveform arrays from the first 4 samples and pass them as a list to the pipeline:

```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I FURN A JOINA COUT']
```

For larger datasets where the inputs are big (like in speech or vision), you'll want to pass a generator instead of a list so you don't load all the inputs into memory at once. Take a look at the [pipeline API reference](./main_classes/pipelines) for more information.
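
For example, here is a minimal sketch that reuses the `speech_recognizer` and the resampled `dataset` from above (the `audio_samples` helper is only for illustration):

```py
>>> def audio_samples(dataset):
...     # yield one input at a time instead of materializing the whole list
...     for sample in dataset:
...         yield sample["audio"]

>>> for prediction in speech_recognizer(audio_samples(dataset)):  # doctest: +SKIP
...     print(prediction["text"])
```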

### Use another model and tokenizer in the pipeline

The [`pipeline`] can accommodate any model from the [Hub](https://huggingface.co/models), making it easy to adapt the [`pipeline`] for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Hub to filter for an appropriate model. The top filtered result returns a multilingual [BERT model](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) finetuned for sentiment analysis you can use for French text:

```py
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```

<frameworkcontent>
<pt>
Use [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `AutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on a `TFAutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>

Specify the model and tokenizer in the [`pipeline`], and now you can apply the `classifier` on French text:

```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```

If you can't find a model for your use-case, you'll need to finetune a pretrained model on your data. Take a look at our [finetuning tutorial](./training) to learn how. Finally, after you've finetuned your pretrained model, please consider [sharing](./model_sharing) the model with the community on the Hub to democratize machine learning for everyone! 🤗

## AutoClass

<Youtube id="AhChOFRegn4"/>

Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and its associated preprocessing class.

Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

### AutoTokenizer

A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model. There are multiple rules that govern the tokenization process, including how to split a word and at what level words should be split (learn more about tokenization in the [tokenizer summary](./tokenizer_summary)). The most important thing to remember is you need to instantiate a tokenizer with the same model name to ensure you're using the same tokenization rules a model was pretrained with.

Load a tokenizer with [`AutoTokenizer`]:

```py
>>> from transformers import AutoTokenizer

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Pass your text to the tokenizer:

```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
>>> print(encoding)
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```

The tokenizer returns a dictionary containing:

* [input_ids](./glossary#input-ids): numerical representations of your tokens.
* [attention_mask](./glossary#attention-mask): indicates which tokens should be attended to.
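
As a quick sanity check (not required for the rest of this guide), you can map the `input_ids` back to text; the special tokens the model expects, like `[CLS]` and `[SEP]`, show up in the result:

```py
>>> # decode maps the ids back to a string, including the special tokens added by the tokenizer
>>> tokenizer.decode(encoding["input_ids"])  # doctest: +SKIP
>>> # or inspect the individual tokens
>>> tokenizer.convert_ids_to_tokens(encoding["input_ids"])  # doctest: +SKIP
```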

A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:

<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="tf",
... )
```
</tf>
</frameworkcontent>

<Tip>

Check out the [preprocess](./preprocessing) tutorial for more details about tokenization, and how to use an [`AutoImageProcessor`], [`AutoFeatureExtractor`] and [`AutoProcessor`] to preprocess image, audio, and multimodal inputs.

</Tip>

### AutoModel

<frameworkcontent>
<pt>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. For text (or sequence) classification, you should load [`AutoModelForSequenceClassification`]:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You just have to unpack the dictionary by adding `**`:

```py
>>> pt_outputs = pt_model(**pt_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> from torch import nn

>>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
>>> print(pt_predictions)
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
        [0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load a [`TFAutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`TFAutoModel`] for the task. For text (or sequence) classification, you should load [`TFAutoModelForSequenceClassification`]:

```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You can pass the dictionary as-is, without unpacking it:

```py
>>> tf_outputs = tf_model(tf_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> import tensorflow as tf

>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions  # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>

<Tip>

All 🤗 Transformers models (PyTorch or TensorFlow) output the tensors *before* the final activation
function (like softmax) because the final activation function is often fused with the loss. Model outputs are special dataclasses so their attributes are autocompleted in an IDE. The model outputs behave like a tuple or a dictionary (you can index with an integer, a slice, or a string), in which case attributes that are `None` are ignored; see the short sketch after this tip.

</Tip>
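
To make that concrete, here is a minimal sketch using the PyTorch `pt_outputs` from the example above (all three accesses return the same `logits` tensor):

```py
>>> logits = pt_outputs.logits     # attribute access
>>> logits = pt_outputs["logits"]  # dictionary-style access by key
>>> logits = pt_outputs[0]         # tuple-style access; attributes that are None are skipped
```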

### Save a model

<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:

```py
>>> pt_save_directory = "./pt_save_pretrained"
>>> tokenizer.save_pretrained(pt_save_directory)  # doctest: +IGNORE_RESULT
>>> pt_model.save_pretrained(pt_save_directory)
```

When you are ready to use the model again, reload it with [`PreTrainedModel.from_pretrained`]:

```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:

```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory)  # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```

When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:

```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>

One particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:

<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>

## Custom model builds

You can modify the model's configuration class to change how a model is built. The configuration specifies a model's attributes, such as the number of hidden layers or attention heads. You start from scratch when you initialize a model from a custom configuration class. The model attributes are randomly initialized, and you'll need to train the model before you can use it to get meaningful results.

Start by importing [`AutoConfig`], and then load the pretrained model you want to modify. Within [`AutoConfig.from_pretrained`], you can specify the attribute you want to change, such as the number of attention heads:

```py
>>> from transformers import AutoConfig

>>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
```

<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:

```py
>>> from transformers import AutoModel

>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:

```py
>>> from transformers import TFAutoModel

>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>

Take a look at the [Create a custom architecture](./create_a_model) guide for more information about building custom configurations.

## Trainer - a PyTorch optimized training loop

All models are a standard [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) so you can use them in any typical training loop. While you can write your own training loop, 🤗 Transformers provides a [`Trainer`] class for PyTorch, which contains the basic training loop and adds functionality for features like distributed training, mixed precision, and more.

Depending on your task, you'll typically pass the following parameters to [`Trainer`]:

1. A [`PreTrainedModel`] or a [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module):

   ```py
   >>> from transformers import AutoModelForSequenceClassification

   >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. [`TrainingArguments`] contains the training hyperparameters you can change, like the learning rate, batch size, and number of epochs to train for. The default values are used if you don't specify any training arguments:

   ```py
   >>> from transformers import TrainingArguments

   >>> training_args = TrainingArguments(
   ...     output_dir="path/to/save/folder/",
   ...     learning_rate=2e-5,
   ...     per_device_train_batch_size=8,
   ...     per_device_eval_batch_size=8,
   ...     num_train_epochs=2,
   ... )
   ```

3. A preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

4. Load a dataset:

   ```py
   >>> from datasets import load_dataset

   >>> dataset = load_dataset("rotten_tomatoes")  # doctest: +IGNORE_RESULT
   ```

5. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])
   ```

   Then apply it over the entire dataset with [`~datasets.Dataset.map`]:

   ```py
   >>> dataset = dataset.map(tokenize_dataset, batched=True)
   ```

6. A [`DataCollatorWithPadding`] to create a batch of examples from your dataset:

   ```py
   >>> from transformers import DataCollatorWithPadding

   >>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
   ```

Now gather all these classes in [`Trainer`]:

```py
>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=dataset["train"],
...     eval_dataset=dataset["test"],
...     tokenizer=tokenizer,
...     data_collator=data_collator,
... )  # doctest: +SKIP
```

When you're ready, call [`~Trainer.train`] to start training:

```py
>>> trainer.train()  # doctest: +SKIP
```

<Tip>

For tasks - like translation or summarization - that use a sequence-to-sequence model, use the [`Seq2SeqTrainer`] and [`Seq2SeqTrainingArguments`] classes instead.

</Tip>

You can customize the training loop behavior by subclassing [`Trainer`] and overriding its methods. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be overridden, and see the sketch below for the general pattern.
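
As a minimal sketch of that pattern (the `WeightedLossTrainer` name and the class weights are only illustrative, and assume the two-label sequence classification setup from above):

```py
>>> import torch
>>> from torch import nn
>>> from transformers import Trainer

>>> class WeightedLossTrainer(Trainer):
...     def compute_loss(self, model, inputs, return_outputs=False):
...         # replace the default loss with a class-weighted cross-entropy
...         labels = inputs.pop("labels")
...         outputs = model(**inputs)
...         logits = outputs.logits
...         loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0], device=logits.device))
...         loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
...         return (loss, outputs) if return_outputs else loss
```

You would then use `WeightedLossTrainer` everywhere you used [`Trainer`] above; the rest of the setup stays the same.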

The other way to customize the training loop is by using [Callbacks](./main_classes/callbacks). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.
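
For instance, here is a hedged sketch of a callback that just inspects the logged metrics (the `LossLoggerCallback` name is hypothetical; [`TrainerCallback`] and [`~Trainer.add_callback`] are the extension points):

```py
>>> from transformers import TrainerCallback

>>> class LossLoggerCallback(TrainerCallback):
...     def on_log(self, args, state, control, logs=None, **kwargs):
...         # inspect the training loop without modifying it
...         if logs is not None and "loss" in logs:
...             print(f"step {state.global_step}: loss = {logs['loss']}")

>>> trainer.add_callback(LossLoggerCallback())  # doctest: +SKIP
```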

## Train with TensorFlow

All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so they can be trained in TensorFlow with the [Keras](https://keras.io/) API. 🤗 Transformers provides the [`~TFPreTrainedModel.prepare_tf_dataset`] method to easily load your dataset as a `tf.data.Dataset` so you can start training right away with Keras' [`compile`](https://keras.io/api/models/model_training_apis/#compile-method) and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) methods.

1. You'll start with a [`TFPreTrainedModel`] or a [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model):

   ```py
   >>> from transformers import TFAutoModelForSequenceClassification

   >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. A preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

3. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])  # doctest: +SKIP
   ```

4. Apply the tokenizer over the entire dataset with [`~datasets.Dataset.map`] and then pass the dataset and tokenizer to [`~TFPreTrainedModel.prepare_tf_dataset`]. You can also change the batch size and shuffle the dataset here if you'd like:

   ```py
   >>> dataset = dataset.map(tokenize_dataset)  # doctest: +SKIP
   >>> tf_dataset = model.prepare_tf_dataset(
   ...     dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer
   ... )  # doctest: +SKIP
   ```

5. When you're ready, you can call `compile` and `fit` to start training. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:

   ```py
   >>> from tensorflow.keras.optimizers import Adam

   >>> model.compile(optimizer=Adam(3e-5))  # No loss argument!
   >>> model.fit(tf_dataset)  # doctest: +SKIP
   ```

## What's next?

Now that you've completed the 🤗 Transformers quick tour, check out our guides and learn how to do more specific things like writing a custom model, fine-tuning a model for a task, and training a model with a script. If you're interested in learning more about 🤗 Transformers core concepts, grab a cup of coffee and take a look at our Conceptual Guides!