..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Models
-----------------------------------------------------------------------------------------------------------------------

The base classes :class:`~transformers.PreTrainedModel`, :class:`~transformers.TFPreTrainedModel`, and
:class:`~transformers.FlaxPreTrainedModel` implement the common methods for loading/saving a model either from a local
file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS
S3 repository).
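
As a minimal sketch of the save/load round trip described above, the snippet below builds a tiny, randomly initialized
:class:`~transformers.BertModel`, saves it to a temporary directory with ``save_pretrained`` and reloads it with
``from_pretrained``. The configuration sizes are illustrative assumptions, chosen only so that nothing has to be
downloaded.

.. code-block:: python

    import tempfile

    import torch
    from transformers import BertConfig, BertModel

    # Tiny configuration with illustrative sizes; the weights are random, not pretrained.
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=64,
    )
    model = BertModel(config)

    # Save the weights and configuration to a local directory, then load them back.
    with tempfile.TemporaryDirectory() as save_dir:
        model.save_pretrained(save_dir)
        reloaded = BertModel.from_pretrained(save_dir)

    # The reloaded model should carry the same weights as the original one.
    original = model.get_input_embeddings().weight
    restored = reloaded.get_input_embeddings().weight
    print(torch.equal(original, restored))

The same two methods accept a model identifier from the library instead of a local path when loading a pretrained
checkpoint.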

:class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` also implement a few methods which
are common among all the models to:

- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
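
The first of these operations can be sketched as follows. This is a hedged example on a tiny, randomly initialized
:class:`~transformers.BertModel`; the configuration sizes and the number of added tokens are illustrative assumptions,
not values taken from a real checkpoint.

.. code-block:: python

    from transformers import BertConfig, BertModel

    # Tiny configuration with illustrative sizes; nothing is downloaded.
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=64,
    )
    model = BertModel(config)

    # Pretend 10 new tokens were added to the tokenizer's vocabulary.
    new_vocab_size = config.vocab_size + 10

    # Resize the input embedding matrix; the returned module is the new embedding layer.
    embeddings = model.resize_token_embeddings(new_vocab_size)
    print(embeddings.weight.shape[0])  # number of rows now matches the new vocabulary size

In practice the new size is usually ``len(tokenizer)`` after calling ``tokenizer.add_tokens(...)``.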

The other methods common to all models are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin` (for the
PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModelUtilsMixin` (for the TensorFlow models). The methods
for text generation are defined in :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models),
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models) and
:class:`~transformers.generation_flax_utils.FlaxGenerationMixin` (for the Flax/JAX models).


PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.PreTrainedModel
    :members:

.. _from_pretrained-torch-dtype:

Model Instantiation dtype
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Under PyTorch, a model is normally instantiated in ``torch.float32``. This can be an issue when loading a model whose
weights are stored in fp16, since the float32 copy requires twice as much memory. To overcome this limitation, you can
either explicitly pass the desired ``dtype`` using the ``torch_dtype`` argument:

.. code-block:: python

    model = T5ForConditionalGeneration.from_pretrained("t5", torch_dtype=torch.float16)

or, if you want the model to always load in the most memory-efficient way, you can use the special value ``"auto"``,
in which case the ``dtype`` is derived automatically from the model's weights:

.. code-block:: python

    model = T5ForConditionalGeneration.from_pretrained("t5", torch_dtype="auto")

Models instantiated from scratch can also be told which ``dtype`` to use with:

.. code-block:: python

    config = T5Config.from_pretrained("t5")
    model = AutoModel.from_config(config, torch_dtype=torch.float16)

Due to PyTorch's design, this functionality is only available for floating-point dtypes.
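
To make the "twice as much memory" statement concrete, here is a small back-of-the-envelope calculation in plain
Python. The parameter count is a hypothetical value chosen for illustration, and the helper ``weights_size_gib`` is not
part of the library; the estimate covers the weights only and ignores activations and any framework overhead.

.. code-block:: python

    # Bytes used by one parameter in each floating-point format.
    BYTES_PER_PARAM = {"float32": 4, "float16": 2}


    def weights_size_gib(num_params, dtype):
        """Return the memory taken by the weights alone, in GiB."""
        return num_params * BYTES_PER_PARAM[dtype] / 1024**3


    num_params = 3_000_000_000  # hypothetical model size, for illustration only

    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weights_size_gib(num_params, dtype):.2f} GiB")
    # float32: 11.18 GiB
    # float16: 5.59 GiB

Loading fp16 weights into a float32 model therefore doubles the footprint of the weights, which is what passing
``torch_dtype`` avoids.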



ModuleUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_utils.ModuleUtilsMixin
    :members:

TFPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFPreTrainedModel
    :members:


TFModelUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
    :members:


FlaxPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxPreTrainedModel
    :members:


Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.generation_utils.GenerationMixin
    :members:

.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
    :members:

.. autoclass:: transformers.generation_flax_utils.FlaxGenerationMixin
    :members:


Pushing to the Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.file_utils.PushToHubMixin
    :members: