.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Model sharing and uploading
=======================================================================================================================

On this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.

.. note::

    You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.

    Optionally, you can join an existing organization or create a new one.

Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have seen in the :doc:`training tutorial <training>` how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
`model hub <https://huggingface.co/models>`__.

Model versioning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since version v3.5.0, the model hub has built-in model versioning based on git and git-lfs. It is based on the paradigm
that one model *is* one repo.

This allows:

- built-in versioning
- access control
- scalability

This is built around *revisions*, which are a way to pin a specific version of a model, using a commit hash, tag or
branch.

For instance:

.. code-block::

    >>> from transformers import AutoModel
    >>> model = AutoModel.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )

Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to upload a model, you'll need to first create a git repo. This repo will live on the model hub, allowing
users to clone it and you (and your organization members) to push to it.

You can create a model repo directly from `the /new page on the website <https://huggingface.co/new>`__.

Alternatively, you can use the ``transformers-cli``. The next steps describe that process:

Open a terminal and run the following command in the virtual environment where you installed 🤗 Transformers, since
the :obj:`transformers-cli` command comes from the library.

.. code-block:: bash

    transformers-cli login


Once you are logged in with your model hub credentials, you can start building your repositories. To create a repo:

.. code-block:: bash

    transformers-cli repo create your-model-name

This creates a repo on the model hub, which can be cloned.

.. code-block:: bash

    # Make sure you have git-lfs installed
    # (https://git-lfs.github.com/)
    git lfs install

    git clone https://huggingface.co/username/your-model-name

Once you have a local clone of your repo and git-lfs installed, you can add files to or remove files from that clone as
you would with any other git repo.

.. code-block:: bash

    # Commit as usual
    cd your-model-name
    echo "hello" >> README.md
    git add . && git commit -m "Update from $USER"

We are intentionally not wrapping git too much, so that you can go on with the workflow you're used to and the tools
you already know.

The only learning curve you might have compared to regular git is the one for git-lfs. The documentation at
`git-lfs.github.com <https://git-lfs.github.com/>`__ is decent, but we'll work on a tutorial with some tips and tricks
in the coming weeks!

Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. 
    TODO Sylvain: make this automatic during the upload

You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch *and* TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's
super easy to do (and in a future version, it might all be automatic). You will need to install both PyTorch and
TensorFlow for this step, but you don't need to worry about the GPU, so it should be very easy. Check the `TensorFlow
installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__ and/or the `PyTorch
installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.

First check that your model class exists in the other framework, that is, try to import the same model class with the
``TF`` prefix added or removed. For instance, if you trained a
:class:`~transformers.DistilBertForSequenceClassification`, try to type

.. code-block::

    >>> from transformers import TFDistilBertForSequenceClassification

and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type

.. code-block::

    >>> from transformers import DistilBertForSequenceClassification

This will raise an error if your model does not exist in the other framework (which should be pretty rare, since we're
aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
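
As a rough sketch of the check described above, the helper below derives the class name to look for in the other
framework by adding or removing the ``TF`` prefix, then tests whether the installed library exposes that name. Both
``counterpart_name`` and ``exists_in_other_framework`` are hypothetical helpers written for this illustration, not part
of 🤗 Transformers, and the second one assumes the ``transformers`` package is installed.

```python
import importlib


def counterpart_name(class_name: str) -> str:
    """Return the class name in the other framework by toggling the ``TF`` prefix."""
    if class_name.startswith("TF"):
        return class_name[len("TF"):]
    return "TF" + class_name


def exists_in_other_framework(class_name: str) -> bool:
    """Check whether the counterpart class name can be found in the transformers package."""
    try:
        transformers = importlib.import_module("transformers")
    except ImportError:
        # transformers is not installed in this environment.
        return False
    return hasattr(transformers, counterpart_name(class_name))


print(counterpart_name("DistilBertForSequenceClassification"))
print(counterpart_name("TFDistilBertForSequenceClassification"))
```

This only checks that a name exists. Actually loading the converted model still requires both frameworks to be
installed, as explained above.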

Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:

.. code-block::

    >>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
    >>> tf_model.save_pretrained("path/to/awesome-name-you-picked")

and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:

.. code-block::

    >>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
    >>> pt_model.save_pretrained("path/to/awesome-name-you-picked")

That's all there is to it!

Check the directory before pushing to the model hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Make sure there are no garbage files in the directory you'll upload. It should only have:

- a ``config.json`` file, which saves the :doc:`configuration <main_classes/configuration>` of your model;
- a ``pytorch_model.bin`` file, which is the PyTorch checkpoint (unless you can't have it for some reason);
- a ``tf_model.h5`` file, which is the TensorFlow checkpoint (unless you can't have it for some reason);
- a ``special_tokens_map.json`` file, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a ``tokenizer_config.json`` file, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named ``vocab.json``, ``vocab.txt``, ``merges.txt``, or similar, which contain the vocabulary of your tokenizer,
  part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- possibly an ``added_tokens.json`` file, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.

Other files can safely be deleted.
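
If you would rather not inspect the directory by hand, the list above can be turned into a small check. This is a
minimal sketch: ``unexpected_files`` is a hypothetical helper, and ``ALLOWED_EXTRAS`` is an assumption about repository
files you will usually want to keep. Because the vocabulary files are described as "or similar", a tokenizer that saves
its vocabulary under another name will be reported here even though it belongs in the upload.

```python
import tempfile
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = {
    "config.json",
    "pytorch_model.bin",
    "tf_model.h5",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
    "added_tokens.json",
}
# Assumption: git metadata and the model card should stay in the clone.
ALLOWED_EXTRAS = {".git", ".gitattributes", "README.md"}


def unexpected_files(directory):
    """Return the sorted entry names in `directory` that are on neither list."""
    names = {path.name for path in Path(directory).iterdir()}
    return sorted(names - EXPECTED_FILES - ALLOWED_EXTRAS)


# Demonstration on a throwaway directory containing one leftover file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("config.json", "pytorch_model.bin", "leftover.tmp"):
        (Path(tmp) / name).touch()
    print(unexpected_files(tmp))  # ['leftover.tmp']
```

Review the reported names before deleting anything, since the check only compares file names.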


Uploading your files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the repo is cloned, you can add the model, configuration and tokenizer files. For instance, saving the model and
tokenizer files:

.. code-block::

    >>> model.save_pretrained("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

Or, if you're using the Trainer API:

.. code-block::

    >>> trainer.save_model("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

You can then add these files to the staging area and verify that they have been correctly staged with the
``git status`` command:

.. code-block:: bash

    git add --all
    git status
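
If you want to script that verification, one option is to parse the short output of ``git status --porcelain``, where
each line starts with two status characters (index, then working tree) followed by a space and the path. The sketch
below assumes that format. ``staged_paths`` and ``read_status`` are hypothetical helpers, and the sample text stands in
for real command output so the example runs without a repository.

```python
import subprocess


def read_status(repo_dir):
    """Return the raw output of `git status --porcelain` for `repo_dir` (requires git)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def staged_paths(porcelain_output):
    """Return the paths whose index status shows a staged change."""
    staged = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        # First character: index status. " " is unstaged, "?" untracked, "!" ignored.
        if line[0] not in " ?!":
            staged.append(line[3:])
    return staged


# Sample output: two staged files, one unstaged edit, one untracked file.
sample = "A  config.json\nA  pytorch_model.bin\n M README.md\n?? notes.txt\n"
print(staged_paths(sample))  # ['config.json', 'pytorch_model.bin']
```

In a real clone you would call ``staged_paths(read_status("path/to/repo/clone/your-model-name"))`` instead of using the
sample text.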

Finally, the files should be committed:

.. code-block:: bash

    git commit -m "First version of the your-model-name model and tokenizer."

And pushed to the remote:

.. code-block:: bash

    git push

This will upload the folder containing the weights, tokenizer and configuration we have just prepared.


Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make sure everyone knows what your model can do, what its limitations, potential bias or ethical considerations are,
please add a README.md model card to your model repo. You can just create it, or there's also a convenient button
titled "Add a README.md" on your model page. A model card template can be found `here
<https://github.com/huggingface/model_card>`__ (meta-suggestions are welcome).

.. note::

    Model cards used to live in the 🤗 Transformers repo under ``model_cards/``, but for consistency and scalability we
    migrated every model card from the repo to its corresponding huggingface.co model repo.

If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.


Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:

.. code-block::

    >>> from transformers import AutoModel, AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
    >>> model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")


You may specify a revision by using the ``revision`` argument of the ``from_pretrained`` method:

.. code-block::

    >>> tokenizer = AutoTokenizer.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )

Workflow in a Colab notebook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're in a Colab notebook (or similar) with no direct access to a terminal, here is the workflow you can use to
upload your model. You can execute each command in a cell by adding a ``!`` at the beginning.

First you need to install ``git-lfs`` in the environment used by the notebook:

.. code-block:: bash

    sudo apt-get install git-lfs

Then you can either create a repo directly from `huggingface.co <https://huggingface.co/>`__, or use the
:obj:`transformers-cli` to create it:


.. code-block:: bash

    transformers-cli login
    transformers-cli repo create your-model-name

Once it's created, you can clone it and configure it (replace ``username`` with your username on huggingface.co):

.. code-block:: bash

    git lfs install

    git clone https://username:password@huggingface.co/username/your-model-name
    # Alternatively if you have a token,
    # you can use it instead of your password
    git clone https://username:token@huggingface.co/username/your-model-name

    cd your-model-name
    git config --global user.email "email@example.com"
    # Tip: using the same email as for your huggingface.co account will link your commits to your profile
    git config --global user.name "Your name"
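
A password or token containing characters such as ``@`` or ``/`` has to be percent-encoded before it is embedded in
the clone URL shown above. The following sketch builds such a URL with the standard library. ``clone_url`` is a
hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import quote


def clone_url(username, secret, repo_name):
    """Build an HTTPS clone URL with percent-encoded credentials.

    `secret` is either the account password or a token.
    """
    credentials = f"{quote(username, safe='')}:{quote(secret, safe='')}"
    return f"https://{credentials}@huggingface.co/{username}/{repo_name}"


print(clone_url("username", "p@ss/word", "your-model-name"))
# https://username:p%40ss%2Fword@huggingface.co/username/your-model-name
```

Keep in mind that git stores the remote URL, including any embedded credentials, in the clone's ``.git/config`` file,
so avoid sharing the notebook environment or that directory.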

Once you've saved your model inside and your clone is set up with the right remote URL, you can add it and push it
with the usual git commands.

.. code-block:: bash

    git add .
    git commit -m "Initial commit"
    git push