.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Quick tour
=======================================================================================================================

Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for Natural
Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
such as completing a prompt with new text or translating into another language.

First we will see how to easily leverage the pipeline API to quickly use those pretrained models at inference. Then, we
will dig a little bit more and see how the library gives you access to those models and helps you preprocess your data.

.. note::

    All code examples presented in the documentation have a switch on the top left for PyTorch versus TensorFlow. If a
    code example doesn't have that switch, it is expected to work for both backends without any change needed.

Getting started on a task with a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`.

.. raw:: html

   <iframe width="560" height="315" src="https://www.youtube.com/embed/tiZFewofSLM" title="YouTube video player"
   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
   picture-in-picture" allowfullscreen></iframe>

馃 Transformers provides the following tasks out of the box:
40
41
42
43
44
45
46
47
48
49
50

- Sentiment analysis: is a text positive or negative?
- Text generation (in English): provide a prompt and the model will generate what follows.
- Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place,
  etc.)
- Question answering: provide the model with some context and a question, extract the answer from the context.
- Filling masked text: given a text with masked words (e.g., replaced by ``[MASK]``), fill the blanks.
- Summarization: generate a summary of a long text.
- Translation: translate a text into another language.
- Feature extraction: return a tensor representation of the text.

Let's see how this works for sentiment analysis (the other tasks are all covered in the :doc:`task summary
</task_summary>`):

Install the following dependencies (if not already installed):

.. code-block:: bash

    ## PYTORCH CODE
    pip install torch
    ## TENSORFLOW CODE
    pip install tensorflow

.. code-block::

    >>> from transformers import pipeline
    >>> classifier = pipeline('sentiment-analysis')

The first time you type this command, a pretrained model and its tokenizer are downloaded and cached. We will look at
both later on, but as an introduction the tokenizer's job is to preprocess the text for the model, which is then
responsible for making predictions. The pipeline groups all of that together and post-processes the predictions to
make them readable. For instance:


.. code-block::

    >>> classifier('We are very happy to show you the 🤗 Transformers library.')
    [{'label': 'POSITIVE', 'score': 0.9998}]

That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model, returning
a list of dictionaries like this one:

.. code-block::

    >>> results = classifier(["We are very happy to show you the 🤗 Transformers library.",
    ...            "We hope you don't hate it."])
    >>> for result in results:
    ...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
    label: POSITIVE, with score: 0.9998
    label: NEGATIVE, with score: 0.5309

To use this pipeline on a large dataset, look at :doc:`iterating over a pipeline <./main_classes/pipelines>`.

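If your texts do not fit comfortably in a single list, you can also feed the pipeline in chunks. Below is a minimal
sketch (the ``texts`` list and the batch size are made up for illustration):

.. code-block::

    >>> texts = ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."] * 1000
    >>> batch_size = 32
    >>> all_results = []
    >>> for i in range(0, len(texts), batch_size):
    ...     all_results.extend(classifier(texts[i: i + batch_size]))
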
You can see the second sentence has been classified as negative (it needs to be positive or negative) but its score is
fairly neutral.

By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can
look at its `model page <https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english>`__ to get more
information about it. It uses the :doc:`DistilBERT architecture </model_doc/distilbert>` and has been fine-tuned on a
dataset called SST-2 for the sentiment analysis task.

Let's say we want to use another model; for instance, one that has been trained on French data. We can search through
the `model hub <https://huggingface.co/models>`__ that gathers models pretrained on a lot of data by research labs, but
also community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags
"French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's
see how we can use it.

You can directly pass the name of the model to use to :func:`~transformers.pipeline`:

.. code-block::

    >>> classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")

This classifier can now deal with texts in English and French, but also Dutch, German, Italian, and Spanish! You can
also replace that name with the path to a local folder where you have saved a pretrained model (see below). You can
also pass a model object and its associated tokenizer.

We will need two classes for this. The first is :class:`~transformers.AutoTokenizer`, which we will use to download the
tokenizer associated with the model we picked and instantiate it. The second is
:class:`~transformers.AutoModelForSequenceClassification` (or
:class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow), which we will use to download
the model itself. Note that if we were using the library on another task, the class of the model would change. The
:doc:`task summary </task_summary>` tutorial summarizes which class is used for which task.

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
    >>> ## TENSORFLOW CODE
    >>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

Now, to download the model and tokenizer we found previously, we just have to use the
:func:`~transformers.AutoModelForSequenceClassification.from_pretrained` method (feel free to replace ``model_name``
with any other model from the model hub):

.. code-block::

    >>> ## PYTORCH CODE
    >>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
    >>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
    >>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
    >>> ## TENSORFLOW CODE
    >>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
    >>> # This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow.
    >>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
    >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
    >>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

If you don't find a model that has been pretrained on some data similar to yours, you will need to fine-tune a
pretrained model on your data. We provide :doc:`example scripts </examples>` to do so. Once you're done, don't forget
to share your fine-tuned model on the hub with the community, using :doc:`this tutorial </model_sharing>`.

.. _pretrained-model:

Under the hood: pretrained models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's now see what happens under the hood when using those pipelines.

.. raw:: html

   <iframe width="560" height="315" src="https://www.youtube.com/embed/AhChOFRegn4" title="YouTube video player"
   frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
   picture-in-picture" allowfullscreen></iframe>

As we saw, the model and tokenizer are created using the :obj:`from_pretrained` method:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
    >>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    >>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
    >>> ## TENSORFLOW CODE
    >>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
    >>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    >>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = AutoTokenizer.from_pretrained(model_name)

Using the tokenizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text into
words (or parts of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern
that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`), which is why we need
to instantiate the tokenizer using the name of the model, to make sure we use the same rules as when the model was
pretrained.

The second step is to convert those `tokens` into numbers, to be able to build a tensor out of them and feed them to
the model. To do this, the tokenizer has a `vocab`, which is the part we download when we instantiate it with the
:obj:`from_pretrained` method, since we need to use the same `vocab` as when the model was pretrained.
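
If you are curious, you can also run those two steps separately. This is only an illustrative sketch using the
tokenizer loaded above (the exact tokens and ids depend on the model you picked):

.. code-block::

    >>> tokens = tokenizer.tokenize("We are very happy to show you the 🤗 Transformers library.")
    >>> print(tokens[:5])
    ['we', 'are', 'very', 'happy', 'to']
    >>> print(tokenizer.convert_tokens_to_ids(tokens)[:5])
    [2057, 2024, 2200, 3407, 2000]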

To apply these steps on a given text, we can just feed it to our tokenizer:

.. code-block::

    >>> inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")

This returns a dictionary mapping strings to lists of integers. It contains the `ids of the tokens <glossary#input-ids>`__,
as mentioned before, but also additional arguments that will be useful to the model. Here, for instance, we also have
an `attention mask <glossary#attention-mask>`__ that the model will use to have a better understanding of the sequence:


.. code-block::

    >>> print(inputs)
    {'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102],
     'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

You can pass a list of sentences directly to your tokenizer. If your goal is to send them through your model as a
batch, you probably want to pad them all to the same length, truncate them to the maximum length the model can accept
and get tensors back. You can specify all of that to the tokenizer:

.. code-block::

    >>> ## PYTORCH CODE
    >>> pt_batch = tokenizer(
    ...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
    ...     padding=True,
    ...     truncation=True,
    ...     max_length=512,
    ...     return_tensors="pt"
    ... )
    >>> ## TENSORFLOW CODE
    >>> tf_batch = tokenizer(
    ...     ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
    ...     padding=True,
    ...     truncation=True,
    ...     max_length=512,
    ...     return_tensors="tf"
    ... )

The padding is automatically applied on the side expected by the model (in this case, on the right), with the padding
token the model was pretrained with. The attention mask is also adapted to take the padding into account:

.. code-block::

    >>> ## PYTORCH CODE
    >>> for key, value in pt_batch.items():
    ...     print(f"{key}: {value.numpy().tolist()}")
    input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
    attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
    >>> ## TENSORFLOW CODE
    >>> for key, value in tf_batch.items():
    ...     print(f"{key}: {value.numpy().tolist()}")
    input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
    attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
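
You can also go back from ids to text with the tokenizer's ``decode`` method. As a small sketch on the first sequence
of the batch above (the special ``[CLS]`` and ``[SEP]`` tokens were added by the tokenizer, and the emoji was mapped to
``[UNK]``):

.. code-block::

    >>> ## PYTORCH CODE
    >>> print(tokenizer.decode(pt_batch["input_ids"][0].tolist()))
    [CLS] we are very happy to show you the [UNK] transformers library. [SEP]
    >>> ## TENSORFLOW CODE
    >>> print(tokenizer.decode(tf_batch["input_ids"][0].numpy().tolist()))
    [CLS] we are very happy to show you the [UNK] transformers library. [SEP]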

You can learn more about tokenizers :doc:`here <preprocessing>`.

Using the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once your input has been preprocessed by the tokenizer, you can send it directly to the model. As we mentioned, it will
contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the dictionary
directly to the model; for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.

.. code-block::

    >>> ## PYTORCH CODE
    >>> pt_outputs = pt_model(**pt_batch)
    >>> ## TENSORFLOW CODE
    >>> tf_outputs = tf_model(tf_batch)

In 🤗 Transformers, all outputs are objects that contain the model's final activations along with other metadata. These
objects are described in greater detail :doc:`here <main_classes/output>`. For now, let's inspect the output ourselves:

.. code-block::

    >>> ## PYTORCH CODE
    >>> print(pt_outputs)
    SequenceClassifierOutput(loss=None, logits=tensor([[-4.0833,  4.3364],
            [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
    >>> ## TENSORFLOW CODE
    >>> print(tf_outputs)
    TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
    array([[-4.0833 ,  4.3364  ],
           [ 0.0818, -0.0418]], dtype=float32)>, hidden_states=None, attentions=None)

Notice how the output object has a ``logits`` attribute. You can use this to access the model's final activations.

.. note::

    All 馃 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final activation
    function (like SoftMax) since this final activation function is often fused with the loss.

Let's apply the SoftMax activation to get predictions.

.. code-block::

    >>> ## PYTORCH CODE
    >>> from torch import nn
    >>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
    >>> ## TENSORFLOW CODE
    >>> import tensorflow as tf
    >>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)

We can see we get the numbers from before:

.. code-block::

    >>> ## PYTORCH CODE
    >>> print(pt_predictions)
    tensor([[2.2043e-04, 9.9978e-01],
            [5.3086e-01, 4.6914e-01]], grad_fn=<SoftmaxBackward>)
    >>> ## TENSORFLOW CODE
    >>> print(tf_predictions)
    tf.Tensor(
    [[2.2043e-04 9.9978e-01]
     [5.3086e-01 4.6914e-01]], shape=(2, 2), dtype=float32)

If you provide the model with labels in addition to inputs, the model output object will also contain a ``loss``
attribute:

.. code-block::

    >>> ## PYTORCH CODE
    >>> import torch
    >>> pt_outputs = pt_model(**pt_batch, labels=torch.tensor([1, 0]))
    >>> print(pt_outputs)
    SequenceClassifierOutput(loss=tensor(0.3167, grad_fn=<NllLossBackward>), logits=tensor([[-4.0833,  4.3364],
            [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
    >>> ## TENSORFLOW CODE
    >>> import tensorflow as tf
    >>> tf_outputs = tf_model(tf_batch, labels=tf.constant([1, 0]))
    >>> print(tf_outputs)
    TFSequenceClassifierOutput(loss=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2.2051e-04, 6.3326e-01], dtype=float32)>, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
    array([[-4.0833 ,  4.3364  ],
           [ 0.0818, -0.0418]], dtype=float32)>, hidden_states=None, attentions=None)

Models are standard `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ or `tf.keras.Model
<https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ so you can use them in your usual training loop. 🤗
Transformers also provides a :class:`~transformers.Trainer` (or :class:`~transformers.TFTrainer` if you are using
TensorFlow) class to help with your training (taking care of things such as distributed training, mixed precision,
etc.). See the :doc:`training tutorial <training>` for more details.

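To make "your usual training loop" concrete, here is a rough sketch of a single manual optimization step in plain
PyTorch, reusing ``pt_model`` and ``pt_batch`` from above (the optimizer and learning rate are arbitrary choices for
illustration):

.. code-block::

    >>> ## PYTORCH CODE
    >>> import torch
    >>> # Any standard PyTorch optimizer works on the model parameters.
    >>> optimizer = torch.optim.AdamW(pt_model.parameters(), lr=5e-5)
    >>> # Passing labels makes the model return a loss we can backpropagate.
    >>> loss = pt_model(**pt_batch, labels=torch.tensor([1, 0])).loss
    >>> loss.backward()
    >>> optimizer.step()
    >>> optimizer.zero_grad()
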
.. note::

    PyTorch model outputs are special dataclasses so that you can get autocompletion for their attributes in an IDE.
    They also behave like a tuple or a dictionary (e.g., you can index with an integer, a slice or a string) in which
    case the attributes not set (that have :obj:`None` values) are ignored.

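For instance, here is a small sketch using the ``pt_outputs`` object from the previous code block; all three lines
below retrieve the same ``logits`` tensor:

.. code-block::

    >>> ## PYTORCH CODE
    >>> logits = pt_outputs.logits       # attribute access
    >>> logits = pt_outputs["logits"]    # dictionary-style access
    >>> logits = pt_outputs[-1]          # tuple-style access; fields left at None are skipped
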
Once your model is fine-tuned, you can save it with its tokenizer in the following way:

.. code-block::

    >>> ## PYTORCH CODE
    >>> pt_save_directory = './pt_save_pretrained'
    >>> tokenizer.save_pretrained(pt_save_directory)
    >>> pt_model.save_pretrained(pt_save_directory)
    >>> ## TENSORFLOW CODE
    >>> tf_save_directory = './tf_save_pretrained'
    >>> tokenizer.save_pretrained(tf_save_directory)
    >>> tf_model.save_pretrained(tf_save_directory)

You can then load this model back using the :func:`~transformers.AutoModel.from_pretrained` method by passing the
directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between
PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow.


If you would like to load your saved model in the other framework, first make sure it is installed:

.. code-block:: bash

    ## PYTORCH CODE
    pip install tensorflow
    ## TENSORFLOW CODE
    pip install torch

Then, use the corresponding Auto class to load it like this:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import TFAutoModel
    >>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
    >>> tf_model = TFAutoModel.from_pretrained(pt_save_directory, from_pt=True)
    >>> ## TENSORFLOW CODE
    >>> from transformers import AutoModel
    >>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
    >>> pt_model = AutoModel.from_pretrained(tf_save_directory, from_tf=True)

Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:


.. code-block::

    >>> ## PYTORCH CODE
    >>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
    >>> all_hidden_states = pt_outputs.hidden_states
    >>> all_attentions = pt_outputs.attentions
    >>> ## TENSORFLOW CODE
    >>> tf_outputs = tf_model(tf_batch, output_hidden_states=True, output_attentions=True)
    >>> all_hidden_states = tf_outputs.hidden_states
    >>> all_attentions = tf_outputs.attentions
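
These come back as tuples of tensors, one per layer (the hidden states also include the embedding output). As a rough
illustration with the DistilBERT model and the two-sentence batch used above (assuming 6 layers plus the embeddings
and a hidden size of 768):

.. code-block::

    >>> ## PYTORCH CODE
    >>> print(len(all_hidden_states), all_hidden_states[-1].shape)
    7 torch.Size([2, 14, 768])

Each element of ``all_attentions`` similarly has shape ``(batch_size, num_heads, sequence_length, sequence_length)``.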

Accessing the code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that will automatically work with any
pretrained model. Behind the scenes, the library has one model class per combination of architecture plus class, so the
code is easy to access and tweak if you need to.

In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's using
the :doc:`DistilBERT </model_doc/distilbert>` architecture. As
:class:`~transformers.AutoModelForSequenceClassification` (or
:class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow) was used, the model
automatically created is then a :class:`~transformers.DistilBertForSequenceClassification`. You can look at its
documentation for all details relevant to that specific model, or browse the source code. This is how you would
directly instantiate model and tokenizer without the auto magic:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
    >>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    >>> model = DistilBertForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    >>> ## TENSORFLOW CODE
    >>> from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
    >>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    >>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)

Customizing the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you want to change how the model itself is built, you can define a custom configuration class. Each architecture
comes with its own relevant configuration. For example, :class:`~transformers.DistilBertConfig` allows you to specify
parameters such as the hidden dimension, dropout rate, etc., for DistilBERT. If you do core modifications, like changing
the hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would
then instantiate the model directly from this configuration.

Below, we load a predefined vocabulary for a tokenizer with the
:func:`~transformers.DistilBertTokenizer.from_pretrained` method. However, unlike the tokenizer, we wish to initialize
the model from scratch. Therefore, we instantiate the model from a configuration instead of using the
:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method.

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
    >>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
    >>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    >>> model = DistilBertForSequenceClassification(config)
    >>> ## TENSORFLOW CODE
    >>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
    >>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
    >>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    >>> model = TFDistilBertForSequenceClassification(config)

For something that only changes the head of the model (for instance, the number of labels), you can still use a
pretrained model for the body. For example, let's define a classifier for 10 different labels using a pretrained body.
Instead of creating a new configuration with all the default values just to change the number of labels, we can instead
pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the default
configuration appropriately:

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
    >>> model_name = "distilbert-base-uncased"
    >>> model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
    >>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    >>> ## TENSORFLOW CODE
    >>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
    >>> model_name = "distilbert-base-uncased"
    >>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
    >>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)