<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Quick tour

[[open-in-colab]]

Get up and running with 🤗 Transformers! Whether you're a developer or an everyday user, this quick tour will help you get started and show you how to use the [`pipeline`] for inference, load a pretrained model and preprocessor with an [AutoClass](./model_doc/auto), and quickly train a model with PyTorch or TensorFlow. If you're a beginner, we recommend checking out our tutorials or [course](https://huggingface.co/course/chapter1/1) next for more in-depth explanations of the concepts introduced here.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install transformers datasets
```

You'll also need to install your preferred machine learning framework:

<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>

## Pipeline

<Youtube id="tiZFewofSLM"/>

The [`pipeline`] is the easiest and fastest way to use a pretrained model for inference. You can use the [`pipeline`] out-of-the-box for many tasks across different modalities, some of which are shown in the table below:

<Tip>

For a complete list of available tasks, check out the [pipeline API reference](./main_classes/pipelines).

</Tip>

| **Task**                     | **Description**                                                                                              | **Modality**    | **Pipeline identifier**                       |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|-----------------------------------------------|
| Text classification          | assign a label to a given sequence of text                                                                   | NLP             | pipeline(task="sentiment-analysis")           |
| Text generation              | generate text given a prompt                                                                                 | NLP             | pipeline(task="text-generation")              |
| Summarization                | generate a summary of a sequence of text or document                                                         | NLP             | pipeline(task="summarization")                |
| Image classification         | assign a label to an image                                                                                   | Computer vision | pipeline(task="image-classification")         |
| Image segmentation           | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | pipeline(task="image-segmentation")           |
| Object detection             | predict the bounding boxes and classes of objects in an image                                                | Computer vision | pipeline(task="object-detection")             |
| Audio classification         | assign a label to some audio data                                                                            | Audio           | pipeline(task="audio-classification")         |
| Automatic speech recognition | transcribe speech into text                                                                                  | Audio           | pipeline(task="automatic-speech-recognition") |
| Visual question answering    | answer a question about the image, given an image and a question                                             | Multimodal      | pipeline(task="vqa")                          |
| Document question answering  | answer a question about a document, given an image and a question                                            | Multimodal      | pipeline(task="document-question-answering")  |
| Image captioning             | generate a caption for a given image                                                                         | Multimodal      | pipeline(task="image-to-text")                |

Start by creating an instance of [`pipeline`] and specifying a task you want to use it for. In this guide, you'll use the [`pipeline`] for sentiment analysis as an example:

```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis")
```

The [`pipeline`] downloads and caches a default [pretrained model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) and tokenizer for sentiment analysis. Now you can use the `classifier` on your target text:

```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```

If you have more than one input, pass your inputs as a list to the [`pipeline`] to return a list of dictionaries:

```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```

The [`pipeline`] can also iterate over an entire dataset for any task you like. For this example, let's choose automatic speech recognition as our task:

```py
>>> import torch
>>> from transformers import pipeline

>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```

Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) for more details) you'd like to iterate over. For example, load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:

```py
>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")  # doctest: +IGNORE_RESULT
```

You need to make sure the sampling rate of the dataset matches the sampling rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

```py
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
```

The audio files are automatically loaded and resampled when calling the `"audio"` column.
Extract the raw waveform arrays from the first 4 samples and pass them as a list to the pipeline:

```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FODING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE AP SO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I THURN A JOIN A COUNT']
```

For larger datasets where the inputs are big (like in speech or vision), you'll want to pass a generator instead of a list so you don't load all the inputs into memory at once. Take a look at the [pipeline API reference](./main_classes/pipelines) for more information.
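
As a rough sketch of what that might look like for the example above (the `audio_waveforms` generator here is illustrative and not part of the original example):

```py
>>> def audio_waveforms():
...     for sample in dataset:
...         yield sample["audio"]["array"]

>>> for prediction in speech_recognizer(audio_waveforms()):
...     print(prediction["text"])  # doctest: +SKIP
```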

### Use another model and tokenizer in the pipeline

The [`pipeline`] can accommodate any model from the [Hub](https://huggingface.co/models), making it easy to adapt the [`pipeline`] for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Hub to filter for an appropriate model. The top filtered result returns a multilingual [BERT model](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) finetuned for sentiment analysis you can use for French text:

```py
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```

<frameworkcontent>
<pt>
Use [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on `AutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on `TFAutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>

Specify the model and tokenizer in the [`pipeline`], and now you can apply the `classifier` on French text:

```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```

If you can't find a model for your use-case, you'll need to finetune a pretrained model on your data. Take a look at our [finetuning tutorial](./training) to learn how. Finally, after you've finetuned your pretrained model, please consider [sharing](./model_sharing) the model with the community on the Hub to democratize machine learning for everyone! 🤗

## AutoClass

<Youtube id="AhChOFRegn4"/>

Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and its associated preprocessing class.

Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

### AutoTokenizer

A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model. There are multiple rules that govern the tokenization process, including how to split a word and at what level words should be split (learn more about tokenization in the [tokenizer summary](./tokenizer_summary)). The most important thing to remember is that you need to instantiate a tokenizer with the same model name to ensure you're using the same tokenization rules the model was pretrained with.

Load a tokenizer with [`AutoTokenizer`]:

```py
>>> from transformers import AutoTokenizer

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Pass your text to the tokenizer:

```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
>>> print(encoding)
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```

The tokenizer returns a dictionary containing:

* [input_ids](./glossary#input-ids): numerical representations of your tokens.
* [attention_mask](./glossary#attention-mask): indicates which tokens should be attended to.
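
As a quick sanity check (not part of the original example), you can turn the `input_ids` back into text with `tokenizer.decode`, which also reveals the special tokens (such as `[CLS]` and `[SEP]`) the tokenizer added for the model:

```py
>>> tokenizer.decode(encoding["input_ids"])  # doctest: +SKIP
```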

A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:

<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
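
As a quick check (not part of the original example), you can inspect the attention mask of the PyTorch batch above: the shorter sentence is padded to the length of the longer one, and the padded positions are marked with `0` so the model ignores them:

```py
>>> print(pt_batch["attention_mask"])  # doctest: +SKIP
```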

<Tip>

Check out the [preprocess](./preprocessing) tutorial for more details about tokenization, and how to use an [`AutoImageProcessor`], [`AutoFeatureExtractor`] and [`AutoProcessor`] to preprocess image, audio, and multimodal inputs.

</Tip>

### AutoModel

<frameworkcontent>
<pt>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. For text (or sequence) classification, you should load [`AutoModelForSequenceClassification`]:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You just have to unpack the dictionary by adding `**`:

```py
>>> pt_outputs = pt_model(**pt_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> from torch import nn

>>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
>>> print(pt_predictions)
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
        [0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
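
If you'd rather see label names than raw probabilities, one option (a small sketch building on the example above, not part of the original) is to take the argmax and map it through the model's `id2label` config:

```py
>>> predicted_ids = pt_predictions.argmax(dim=-1)
>>> [pt_model.config.id2label[label_id] for label_id in predicted_ids.tolist()]  # doctest: +SKIP
```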
</pt>
<tf>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load a [`TFAutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`TFAutoModel`] for the task. For text (or sequence) classification, you should load [`TFAutoModelForSequenceClassification`]:

```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You can pass the dictionary of tensors as-is, without unpacking it:

```py
>>> tf_outputs = tf_model(tf_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> import tensorflow as tf

>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions  # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>

<Tip>

All 🤗 Transformers models (PyTorch or TensorFlow) output the tensors *before* the final activation function (like softmax) because the final activation function is often fused with the loss. Model outputs are special dataclasses, so their attributes are autocompleted in an IDE. The model outputs also behave like a tuple or a dictionary (you can index with an integer, a slice, or a string), in which case attributes that are `None` are ignored.

</Tip>
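
For example, using the PyTorch outputs from above, the same logits tensor can be reached by attribute, by string key, or by integer index (a quick illustration, not part of the original example):

```py
>>> logits = pt_outputs.logits       # attribute access
>>> logits = pt_outputs["logits"]    # dictionary-style access
>>> logits = pt_outputs[0]           # tuple-style access
```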

### Save a model

<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:

```py
>>> pt_save_directory = "./pt_save_pretrained"
>>> tokenizer.save_pretrained(pt_save_directory)  # doctest: +IGNORE_RESULT
>>> pt_model.save_pretrained(pt_save_directory)
```

When you are ready to use the model again, reload it with [`PreTrainedModel.from_pretrained`]:

```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:

```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory)  # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```

When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:

```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>

One particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:

<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>

## Custom model builds

You can modify the model's configuration class to change how a model is built. The configuration specifies a model's attributes, such as the number of hidden layers or attention heads. You start from scratch when you initialize a model from a custom configuration class. The model attributes are randomly initialized, and you'll need to train the model before you can use it to get meaningful results.

Start by importing [`AutoConfig`], and then load the pretrained model you want to modify. Within [`AutoConfig.from_pretrained`], you can specify the attribute you want to change, such as the number of attention heads:

```py
>>> from transformers import AutoConfig

>>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
```

<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:

```py
>>> from transformers import AutoModel

>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:

```py
>>> from transformers import TFAutoModel

>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>

Take a look at the [Create a custom architecture](./create_a_model) guide for more information about building custom configurations.

## Trainer - a PyTorch optimized training loop

All models are a standard [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) so you can use them in any typical training loop. While you can write your own training loop, 🤗 Transformers provides a [`Trainer`] class for PyTorch, which contains the basic training loop and adds additional functionality for features like distributed training, mixed precision, and more.

Depending on your task, you'll typically pass the following parameters to [`Trainer`]:

1. A [`PreTrainedModel`] or a [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module):

   ```py
   >>> from transformers import AutoModelForSequenceClassification

   >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. [`TrainingArguments`] contains the training hyperparameters you can change, like the learning rate, batch size, and number of epochs to train for. The default values are used if you don't specify any training arguments:

   ```py
   >>> from transformers import TrainingArguments

   >>> training_args = TrainingArguments(
   ...     output_dir="path/to/save/folder/",
   ...     learning_rate=2e-5,
   ...     per_device_train_batch_size=8,
   ...     per_device_eval_batch_size=8,
   ...     num_train_epochs=2,
   ... )
   ```

3. A preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

4. Load a dataset:

   ```py
   >>> from datasets import load_dataset

   >>> dataset = load_dataset("rotten_tomatoes")  # doctest: +IGNORE_RESULT
   ```

5. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])
   ```

   Then apply it over the entire dataset with [`~datasets.Dataset.map`]:

   ```py
   >>> dataset = dataset.map(tokenize_dataset, batched=True)
   ```

6. A [`DataCollatorWithPadding`] to create a batch of examples from your dataset:

   ```py
   >>> from transformers import DataCollatorWithPadding

   >>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
   ```

Now gather all these classes in [`Trainer`]:

```py
>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=dataset["train"],
...     eval_dataset=dataset["test"],
...     tokenizer=tokenizer,
...     data_collator=data_collator,
... )  # doctest: +SKIP
```

When you're ready, call [`~Trainer.train`] to start training:

```py
>>> trainer.train()  # doctest: +SKIP
```

<Tip>

For tasks - like translation or summarization - that use a sequence-to-sequence model, use the [`Seq2SeqTrainer`] and [`Seq2SeqTrainingArguments`] classes instead.

</Tip>
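
As a rough sketch of that setup (assuming a sequence-to-sequence model and a dataset preprocessed for it; the checkpoint name is illustrative):

```py
>>> from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainer, Seq2SeqTrainingArguments

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
>>> training_args = Seq2SeqTrainingArguments(output_dir="path/to/save/folder/", predict_with_generate=True)
>>> trainer = Seq2SeqTrainer(
...     model=model,
...     args=training_args,
...     train_dataset=dataset["train"],
...     eval_dataset=dataset["test"],
...     tokenizer=tokenizer,
... )  # doctest: +SKIP
```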

You can customize the training loop behavior by subclassing [`Trainer`] and overriding its methods. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be overridden.
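
For instance, a minimal sketch of subclassing [`Trainer`] to use a weighted loss might look like this (the class weights are illustrative and assume a 2-label classification model):

```py
>>> import torch
>>> from transformers import Trainer

>>> class WeightedLossTrainer(Trainer):
...     def compute_loss(self, model, inputs, return_outputs=False):
...         labels = inputs.pop("labels")
...         outputs = model(**inputs)
...         logits = outputs.logits
...         # illustrative per-class weights; adjust them to your dataset
...         loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0], device=model.device))
...         loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
...         return (loss, outputs) if return_outputs else loss
```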

The other way to customize the training loop is by using [Callbacks](./main_classes/callbacks). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.
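
For instance, a minimal sketch of a callback that stops training after an arbitrary number of steps:

```py
>>> from transformers import TrainerCallback

>>> class StopAfterStepsCallback(TrainerCallback):
...     def on_step_end(self, args, state, control, **kwargs):
...         # stop early once an arbitrary step budget is reached
...         if state.global_step >= 100:
...             control.should_training_stop = True

>>> trainer.add_callback(StopAfterStepsCallback())  # doctest: +SKIP
```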

## Train with TensorFlow

All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so they can be trained in TensorFlow with the [Keras](https://keras.io/) API. 🤗 Transformers provides the [`~TFPreTrainedModel.prepare_tf_dataset`] method to easily load your dataset as a `tf.data.Dataset` so you can start training right away with Keras' [`compile`](https://keras.io/api/models/model_training_apis/#compile-method) and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) methods.

1. You'll start with a [`TFPreTrainedModel`] or a [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model):

   ```py
   >>> from transformers import TFAutoModelForSequenceClassification

   >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. A preprocessing class like a tokenizer, image processor, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

3. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])  # doctest: +SKIP
   ```

4. Apply the tokenizer over the entire dataset with [`~datasets.Dataset.map`] and then pass the dataset and tokenizer to [`~TFPreTrainedModel.prepare_tf_dataset`]. You can also change the batch size and shuffle the dataset here if you'd like:

   ```py
   >>> dataset = dataset.map(tokenize_dataset)  # doctest: +SKIP
   >>> tf_dataset = model.prepare_tf_dataset(
   ...     dataset, batch_size=16, shuffle=True, tokenizer=tokenizer
   ... )  # doctest: +SKIP
   ```

5. When you're ready, you can call `compile` and `fit` to start training:

   ```py
   >>> from tensorflow.keras.optimizers import Adam

   >>> model.compile(optimizer=Adam(3e-5))
   >>> model.fit(tf_dataset)  # doctest: +SKIP
   ```

## What's next?

Now that you've completed the 🤗 Transformers quick tour, check out our guides and learn how to do more specific things like writing a custom model, fine-tuning a model for a task, and training a model with a script. If you're interested in learning more about 🤗 Transformers core concepts, grab a cup of coffee and take a look at our Conceptual Guides!