<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Quick tour

[[open-in-colab]]

Get up and running with 🤗 Transformers! Whether you're a developer or an everyday user, this quick tour will help you get started and show you how to use the [`pipeline`] for inference, load a pretrained model and preprocessor with an [AutoClass](./model_doc/auto), and quickly train a model with PyTorch or TensorFlow. If you're a beginner, we recommend checking out our tutorials or [course](https://huggingface.co/course/chapter1/1) next for more in-depth explanations of the concepts introduced here.

Before you begin, make sure you have all the necessary libraries installed:

```bash
!pip install transformers datasets
```

You'll also need to install your preferred machine learning framework:

<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>

## Pipeline

<Youtube id="tiZFewofSLM"/>

The [`pipeline`] is the easiest way to use a pretrained model for inference. You can use the [`pipeline`] out-of-the-box for many tasks across different modalities. Take a look at the table below for some supported tasks:

| **Task**                     | **Description**                                                                                              | **Modality**    | **Pipeline identifier**                       |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|-----------------------------------------------|
| Text classification          | assign a label to a given sequence of text                                                                   | NLP             | pipeline(task="sentiment-analysis")           |
| Text generation              | generate text that follows a given prompt                                                                    | NLP             | pipeline(task="text-generation")              |
| Named entity recognition     | assign a label to each token in a sequence (people, organization, location, etc.)                            | NLP             | pipeline(task="ner")                          |
| Question answering           | extract an answer from the text given some context and a question                                            | NLP             | pipeline(task="question-answering")           |
| Fill-mask                    | predict the correct masked token in a sequence                                                               | NLP             | pipeline(task="fill-mask")                    |
| Summarization                | generate a summary of a sequence of text or document                                                         | NLP             | pipeline(task="summarization")                |
| Translation                  | translate text from one language into another                                                                | NLP             | pipeline(task="translation")                  |
| Image classification         | assign a label to an image                                                                                   | Computer vision | pipeline(task="image-classification")         |
| Image segmentation           | assign a label to each individual pixel of an image (supports semantic, panoptic, and instance segmentation) | Computer vision | pipeline(task="image-segmentation")           |
| Object detection             | predict the bounding boxes and classes of objects in an image                                                | Computer vision | pipeline(task="object-detection")             |
| Audio classification         | assign a label to an audio file                                                                              | Audio           | pipeline(task="audio-classification")         |
| Automatic speech recognition | transcribe speech from an audio file into text                                                               | Audio           | pipeline(task="automatic-speech-recognition") |
| Visual question answering    | answer a question about an image, given the image and a question                                             | Multimodal      | pipeline(task="vqa")                          |

Start by creating an instance of [`pipeline`] and specifying a task you want to use it for. You can use the [`pipeline`] for any of the previously mentioned tasks, and for a complete list of supported tasks, take a look at the [pipeline API reference](./main_classes/pipelines). In this guide though, you'll use the [`pipeline`] for sentiment analysis as an example:

```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis")
```

The [`pipeline`] downloads and caches a default [pretrained model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) and tokenizer for sentiment analysis. Now you can use the `classifier` on your target text:

```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```

If you have more than one input, pass your inputs as a list to the [`pipeline`] to return a list of dictionaries:

```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```

The [`pipeline`] can also iterate over an entire dataset for any task you like. For this example, let's choose automatic speech recognition as our task:

```py
>>> import torch
>>> from transformers import pipeline

>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```

Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart#audio) for more details) you'd like to iterate over. For example, load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:

```py
>>> from datasets import load_dataset, Audio

>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")  # doctest: +IGNORE_RESULT
```

You need to make sure the sampling rate of the dataset matches the sampling rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

```py
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
```

The audio files are automatically loaded and resampled when calling the `"audio"` column. Extract the raw waveform arrays from the first 4 samples and pass them as a list to the pipeline:

```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FODING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE AP SO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I THURN A JOIN A COUNT']
```

For larger datasets where the inputs are big (like in speech or vision), you'll want to pass a generator instead of a list to avoid loading all the inputs into memory at once, as sketched below. Take a look at the [pipeline API reference](./main_classes/pipelines) for more information.
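
A minimal sketch of the generator pattern, reusing the `dataset` and `speech_recognizer` from above (the helper name is just for illustration):

```py
>>> def data(dataset):
...     for sample in dataset:
...         yield sample["audio"]

>>> # The pipeline consumes the generator lazily, one sample at a time
>>> for prediction in speech_recognizer(data(dataset)):
...     print(prediction["text"])  # doctest: +SKIP
```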

### Use another model and tokenizer in the pipeline

The [`pipeline`] can accommodate any model from the [Hub](https://huggingface.co/models), making it easy to adapt the [`pipeline`] for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Hub to filter for an appropriate model. The top filtered result returns a multilingual [BERT model](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) finetuned for sentiment analysis you can use for French text:

```py
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```

<frameworkcontent>
<pt>
Use [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `AutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on a `TFAutoClass` in the next section):

```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>

Specify the model and tokenizer in the [`pipeline`], and now you can apply the `classifier` to French text:

```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```

If you can't find a model for your use-case, you'll need to finetune a pretrained model on your data. Take a look at our [finetuning tutorial](./training) to learn how. Finally, after you've finetuned your pretrained model, please consider [sharing](./model_sharing) the model with the community on the Hub to democratize machine learning for everyone! 🤗
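
If you do share a model, the upload itself is a short call (a minimal sketch, assuming you've already authenticated with `huggingface-cli login`; the repository name here is hypothetical):

```py
>>> # Pushes the model and tokenizer weights/files to a repository under your account
>>> model.push_to_hub("my-awesome-model")  # doctest: +SKIP
>>> tokenizer.push_to_hub("my-awesome-model")  # doctest: +SKIP
```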

## AutoClass

<Youtube id="AhChOFRegn4"/>

Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and its associated preprocessing class.

Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

### AutoTokenizer

A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model. There are multiple rules that govern the tokenization process, including how to split a word and at what level words should be split (learn more about tokenization in the [tokenizer summary](./tokenizer_summary)). The most important thing to remember is you need to instantiate a tokenizer with the same model name to ensure you're using the same tokenization rules a model was pretrained with.

Load a tokenizer with [`AutoTokenizer`]:

```py
>>> from transformers import AutoTokenizer

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Pass your text to the tokenizer:

```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
>>> print(encoding)
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```

The tokenizer returns a dictionary containing:

* [input_ids](./glossary#input-ids): numerical representations of your tokens.
* [attention_mask](./glossary#attention-mask): indicates which tokens should be attended to.
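
You can verify the mapping by decoding the `input_ids` back into text (a quick check; the exact string depends on the tokenizer's special tokens and casing):

```py
>>> # Special tokens such as [CLS] and [SEP] were added automatically during tokenization
>>> tokenizer.decode(encoding["input_ids"])  # doctest: +SKIP
```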

A tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:

<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
...     padding=True,
...     truncation=True,
...     max_length=512,
...     return_tensors="tf",
... )
```
</tf>
</frameworkcontent>

<Tip>

Check out the [preprocess](./preprocessing) tutorial for more details about tokenization, and how to use an [`AutoFeatureExtractor`] and [`AutoProcessor`] to preprocess image, audio, and multimodal inputs.

</Tip>
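
For a taste of how this looks beyond text, here is a minimal sketch for audio, reusing the MInDS-14 `dataset` and the wav2vec2 checkpoint from the pipeline section (the variable names are just for illustration):

```py
>>> from transformers import AutoFeatureExtractor

>>> # A feature extractor plays the tokenizer's role for raw audio waveforms
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")  # doctest: +SKIP
```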

### AutoModel

<frameworkcontent>
<pt>
馃 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. For text (or sequence) classification, you should load [`AutoModelForSequenceClassification`]:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. You just have to unpack the dictionary by adding `**`:

```py
>>> pt_outputs = pt_model(**pt_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> from torch import nn

>>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
>>> print(pt_predictions)
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
        [0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
馃 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`TFAutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`TFAutoModel`] for the task. For text (or sequence) classification, you should load [`TFAutoModelForSequenceClassification`]:

```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```

<Tip>

See the [task summary](./task_summary) for tasks supported by an [`AutoModel`] class.

</Tip>

Now pass your preprocessed batch of inputs directly to the model. A Keras model accepts the dictionary as-is, so there's no need to unpack it:

```py
>>> tf_outputs = tf_model(tf_batch)
```

The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

```py
>>> import tensorflow as tf

>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions  # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>

<Tip>

All 馃 Transformers models (PyTorch or TensorFlow) output the tensors *before* the final activation
function (like softmax) because the final activation function is often fused with the loss. Model outputs are special dataclasses so their attributes are autocompleted in an IDE. The model outputs behave like a tuple or a dictionary (you can index with an integer, a slice or a string) in which case, attributes that are None are ignored.

</Tip>
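
As a quick illustration with the `pt_outputs` from the PyTorch example above, the following access patterns all reach the same tensor:

```py
>>> logits = pt_outputs.logits            # attribute access
>>> same_logits = pt_outputs["logits"]    # dictionary-style access
>>> also_logits = pt_outputs[0]           # tuple-style access; `None` attributes (like an absent loss) are skipped
```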

### Save a model

<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:

```py
>>> pt_save_directory = "./pt_save_pretrained"
>>> tokenizer.save_pretrained(pt_save_directory)  # doctest: +IGNORE_RESULT
>>> pt_model.save_pretrained(pt_save_directory)
```

When you are ready to use the model again, reload it with [`PreTrainedModel.from_pretrained`]:

```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:

```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory)  # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```

When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:

```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>

One particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:

<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>

## Custom model builds

You can modify the model's configuration class to change how a model is built. The configuration specifies a model's attributes, such as the number of hidden layers or attention heads. You start from scratch when you initialize a model from a custom configuration class. The model attributes are randomly initialized, and you'll need to train the model before you can use it to get meaningful results.

Start by importing [`AutoConfig`], and then load the pretrained model you want to modify. Within [`AutoConfig.from_pretrained`], you can specify the attribute you want to change, such as the number of attention heads:

```py
>>> from transformers import AutoConfig

>>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
```

<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:

```py
>>> from transformers import AutoModel

>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:

```py
>>> from transformers import TFAutoModel

>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>

Take a look at the [Create a custom architecture](./create_a_model) guide for more information about building custom configurations.

## Trainer - a PyTorch optimized training loop

All models are a standard [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) so you can use them in any typical training loop. While you can write your own training loop, 🤗 Transformers provides a [`Trainer`] class for PyTorch, which contains the basic training loop and adds additional functionality for features like distributed training, mixed precision, and more.

Depending on your task, you'll typically pass the following parameters to [`Trainer`]:

1. A [`PreTrainedModel`] or a [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module):

   ```py
   >>> from transformers import AutoModelForSequenceClassification

   >>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. [`TrainingArguments`] contains the training hyperparameters you can change, like the learning rate, batch size, and the number of epochs to train for. The default values are used if you don't specify any training arguments:

   ```py
   >>> from transformers import TrainingArguments

   >>> training_args = TrainingArguments(
   ...     output_dir="path/to/save/folder/",
   ...     learning_rate=2e-5,
   ...     per_device_train_batch_size=8,
   ...     per_device_eval_batch_size=8,
   ...     num_train_epochs=2,
   ... )
   ```

3. A preprocessing class like a tokenizer, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

4. Your preprocessed train and test datasets:

   ```py
   >>> train_dataset = dataset["train"]  # doctest: +SKIP
   >>> eval_dataset = dataset["test"]  # doctest: +SKIP
   ```

5. A [`DataCollator`] to create a batch of examples from your dataset:

   ```py
   >>> from transformers import DefaultDataCollator

   >>> data_collator = DefaultDataCollator()
   ```

Now gather all these classes in [`Trainer`]:

```py
>>> from transformers import Trainer

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=train_dataset,
...     eval_dataset=eval_dataset,
...     tokenizer=tokenizer,
...     data_collator=data_collator,
... )  # doctest: +SKIP
```

When you're ready, call [`~Trainer.train`] to start training:

```py
>>> trainer.train()  # doctest: +SKIP
```

<Tip>

For sequence-to-sequence tasks like translation or summarization, use the [`Seq2SeqTrainer`] and [`Seq2SeqTrainingArguments`] classes instead.

</Tip>

You can customize the training loop behavior by subclassing the methods inside [`Trainer`]. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be subclassed. 
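
For instance, a minimal sketch of overriding `compute_loss` with a weighted loss (the class weights here are hypothetical and only for illustration, assuming a two-class setup):

```py
>>> import torch
>>> from transformers import Trainer

>>> class WeightedLossTrainer(Trainer):
...     def compute_loss(self, model, inputs, return_outputs=False):
...         labels = inputs.pop("labels")
...         outputs = model(**inputs)
...         # Hypothetical per-class weights to counter class imbalance
...         weights = torch.tensor([1.0, 2.0], device=outputs.logits.device)
...         loss = torch.nn.functional.cross_entropy(outputs.logits, labels, weight=weights)
...         return (loss, outputs) if return_outputs else loss
```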

The other way to customize the training loop is by using [Callbacks](./main_classes/callbacks). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.
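
As an example, a minimal callback sketch that reports when each epoch finishes (subclass `TrainerCallback` and pass it to the [`Trainer`] via the `callbacks` argument):

```py
>>> from transformers import TrainerCallback

>>> class EpochEndCallback(TrainerCallback):
...     def on_epoch_end(self, args, state, control, **kwargs):
...         # `state.epoch` tracks progress through training as a float
...         print(f"Finished epoch {int(state.epoch)}")

>>> trainer = Trainer(model=model, args=training_args, callbacks=[EpochEndCallback()])  # doctest: +SKIP
```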

## Train with TensorFlow

All models are a standard [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so they can be trained in TensorFlow with the [Keras](https://keras.io/) API. 🤗 Transformers provides the [`~TFPreTrainedModel.prepare_tf_dataset`] method to easily load your dataset as a `tf.data.Dataset` so you can start training right away with Keras' [`compile`](https://keras.io/api/models/model_training_apis/#compile-method) and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) methods.

1. You'll start with a [`TFPreTrainedModel`] or a [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model):

   ```py
   >>> from transformers import TFAutoModelForSequenceClassification

   >>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
   ```

2. A preprocessing class like a tokenizer, feature extractor, or processor:

   ```py
   >>> from transformers import AutoTokenizer

   >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
   ```

3. Create a function to tokenize the dataset:

   ```py
   >>> def tokenize_dataset(dataset):
   ...     return tokenizer(dataset["text"])  # doctest: +SKIP
   ```

4. Apply the tokenizer over the entire dataset with [`~datasets.Dataset.map`] and then pass the dataset and tokenizer to [`~TFPreTrainedModel.prepare_tf_dataset`]. You can also change the batch size and shuffle the dataset here if you'd like:

   ```py
   >>> dataset = dataset.map(tokenize_dataset)  # doctest: +SKIP
   >>> tf_dataset = model.prepare_tf_dataset(
   ...     dataset, batch_size=16, shuffle=True, tokenizer=tokenizer
   ... )  # doctest: +SKIP
   ```

5. When you're ready, you can call `compile` and `fit` to start training:

   ```py
   >>> from tensorflow.keras.optimizers import Adam

   >>> model.compile(optimizer=Adam(3e-5))
   >>> model.fit(tf_dataset)  # doctest: +SKIP
   ```

## What's next?

Now that you've completed the 🤗 Transformers quick tour, check out our guides and learn how to do more specific things like writing a custom model, fine-tuning a model for a task, and how to train a model with a script. If you're interested in learning more about 🤗 Transformers core concepts, grab a cup of coffee and take a look at our Conceptual Guides!