.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Model sharing and uploading
=======================================================================================================================

On this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.

.. note::

    You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.

    Optionally, you can join an existing organization or create a new one.


We have seen in the :doc:`training tutorial <training>` how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
`model hub <https://huggingface.co/models>`__.

Model versioning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since version v3.5.0, the model hub has built-in model versioning based on git and git-lfs. It is based on the paradigm
that one model *is* one repo.

This allows:

- built-in versioning
- access control
- scalability

This is built around *revisions*, which are a way to pin a specific version of a model, using a commit hash, tag or
branch.

For instance:

.. code-block::

    >>> from transformers import AutoModel
    >>> model = AutoModel.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )
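To make the idea of a revision more concrete, here is a minimal sketch of how the URL of a single file could be pinned
to a revision. The ``/resolve/<revision>/<filename>`` layout, the ``main`` default branch and the helper name
``pinned_file_url`` are assumptions made for illustration; they are not described on this page.

```python
from urllib.parse import quote

HUB_ENDPOINT = "https://huggingface.co"


def pinned_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a URL for one file of a model repo at a given revision.

    The "/resolve/<revision>/" layout is an assumption made for illustration.
    """
    # A revision may be a tag, a branch name or a commit hash. Quote it so
    # that a branch name containing "/" stays a single path segment.
    safe_revision = quote(revision, safe="")
    return f"{HUB_ENDPOINT}/{repo_id}/resolve/{safe_revision}/{filename}"


print(pinned_file_url("julien-c/EsperBERTo-small", "config.json", revision="v2.0.1"))
# https://huggingface.co/julien-c/EsperBERTo-small/resolve/v2.0.1/config.json
```

Pinning a tag or commit hash rather than a branch means the files you load do not change when the repo is updated.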


Push your model from Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Preparation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to make sure your credentials to the hub are stored somewhere. This can be done in two ways. If you
have access to a terminal, you can just run the following command in the virtual environment where you installed 🤗
Transformers:

.. code-block:: bash

    transformers-cli login

It will store your access token in the Hugging Face cache folder (by default :obj:`~/.cache/`).

If you don't have easy access to a terminal (for instance in a Colab session), you can find a token linked to your
account by going to `huggingface.co <https://huggingface.co/>`__, clicking on your avatar in the top left corner, then
on `Edit profile` on the left, just beneath your profile picture. In the submenu `API Tokens`, you will find your API
token that you can just copy.

Directly push your model to the hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you have an API token (either stored in the cache or copied and pasted in your notebook), you can directly push a
finetuned model you saved in :obj:`save_directory` by calling:

.. code-block:: python

    finetuned_model.push_to_hub("my-awesome-model")

If your API token is not stored in the cache, you will need to pass it with :obj:`use_auth_token=your_token`.
This will also be the case for all the examples below, so we won't mention it again.

This will create a repository in your namespace named :obj:`my-awesome-model`, so anyone can now run:

.. code-block:: python

    from transformers import AutoModel

    model = AutoModel.from_pretrained("your_username/my-awesome-model")

Even better, you can combine this push to the hub with the call to :obj:`save_pretrained`:

.. code-block:: python

    finetuned_model.save_pretrained(save_directory, push_to_hub=True, repo_name="my-awesome-model")

If you are a premium user and want your model to be private, just add :obj:`private=True` to this call.

If you are a member of an organization and want to push it inside the namespace of the organization instead of yours,
just add :obj:`organization=my_amazing_org`.

Add new files to your model repo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you have pushed your model to the hub, you might want to add the tokenizer, or a version of your model for another
framework (TensorFlow, PyTorch, Flax). This is super easy to do! Let's begin with the tokenizer. You can add it to the
repo you created before like this:

.. code-block:: python

    tokenizer.push_to_hub("my-awesome-model")

If you know its URL (it should be :obj:`https://huggingface.co/username/repo_name`), you can also do:

.. code-block:: python

    tokenizer.push_to_hub(repo_url=my_repo_url)

And that's all there is to it! It's also a very easy way to fix a mistake if one of the files online had a bug.
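As a small illustration of the URL pattern mentioned above, the following sketch builds the value passed as
``my_repo_url`` from a repo name and a namespace. The helper ``repo_url`` is hypothetical (it is not part of 🤗
Transformers), and the user and organization names are placeholders.

```python
from typing import Optional


def repo_url(repo_name: str, username: str, organization: Optional[str] = None) -> str:
    """Return the hub URL of a model repo.

    The namespace is the organization when one is given, otherwise the user.
    """
    namespace = organization if organization else username
    return f"https://huggingface.co/{namespace}/{repo_name}"


my_repo_url = repo_url("my-awesome-model", username="your_username")
print(my_repo_url)  # https://huggingface.co/your_username/my-awesome-model
```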

To add a model for another backend, it's also super easy. Let's say you have fine-tuned a TensorFlow model and want to
add the PyTorch model files to your model repo, so that anyone in the community can use it. The following allows you to
directly create a PyTorch version of your TensorFlow model:

.. code-block:: python

    from transformers import AutoModel

    model = AutoModel.from_pretrained(save_directory, from_tf=True)

You can also replace :obj:`save_directory` with the identifier of your model (:obj:`username/repo_name`) if you don't
have a local save of it anymore. Then, just do the same as before:

.. code-block:: python

    model.push_to_hub("my-awesome-model")

or

.. code-block:: python

    model.push_to_hub(repo_url=my_repo_url)


Use your terminal and git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to upload a model, you'll need to first create a git repo. This repo will live on the model hub, allowing
users to clone it and you (and your organization members) to push to it.

You can create a model repo directly from `the /new page on the website <https://huggingface.co/new>`__.

Alternatively, you can use the ``transformers-cli``. The next steps describe that process:

Go to a terminal and run the following command. It should be in the virtual environment where you installed 🤗
Transformers, since that command :obj:`transformers-cli` comes from the library.

.. code-block:: bash

    transformers-cli login


Once you are logged in with your model hub credentials, you can start building your repositories. To create a repo:

.. code-block:: bash

    transformers-cli repo create your-model-name

If you want to create a repo under a specific organization, you should add a `--organization` flag:

.. code-block:: bash

    transformers-cli repo create your-model-name --organization your-org-name

This creates a repo on the model hub, which can be cloned.

.. code-block:: bash

    # Make sure you have git-lfs installed
    # (https://git-lfs.github.com/)
    git lfs install

    git clone https://huggingface.co/username/your-model-name

When you have your local clone of your repo and git-lfs installed, you can then add files to or remove files from that
clone as you would with any other git repo.

.. code-block:: bash

    # Commit as usual
    cd your-model-name
    echo "hello" >> README.md
    git add . && git commit -m "Update from $USER"

We are intentionally not wrapping git too much, so that you can go on with the workflow you're used to and the tools
you already know.

The only learning curve you might have compared to regular git is the one for git-lfs. The documentation at
`git-lfs.github.com <https://git-lfs.github.com/>`__ is decent, but we'll work on a tutorial with some tips and tricks
in the coming weeks!
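Since the clone step depends on git-lfs, a quick check from Python can save a failed clone. The sketch below only
looks for the ``git`` and ``git-lfs`` executables on the ``PATH``; the helper name ``git_lfs_available`` is
hypothetical, and the check does not verify that ``git lfs install`` has been run.

```python
import shutil


def git_lfs_available() -> bool:
    """Return True if both the `git` and `git-lfs` executables are on the PATH."""
    return shutil.which("git") is not None and shutil.which("git-lfs") is not None


if not git_lfs_available():
    print("git-lfs not found: install it and run `git lfs install` before cloning.")
```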

Additionally, if you want to change multiple repos at once, the `change_config.py script
<https://github.com/huggingface/efficient_scripts/blob/main/change_config.py>`__ can probably save you some time.

Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. 
    TODO Sylvain: make this automatic during the upload

You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's
super easy to do (and in a future version, it might all be automatic). You will need to install both PyTorch and
TensorFlow for this step, but you don't need to worry about the GPU, so it should be very easy. Check the `TensorFlow
installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__ and/or the `PyTorch
installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.

First check that your model class exists in the other framework, that is try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to type:

.. code-block::

    >>> from transformers import TFDistilBertForSequenceClassification

and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type:

.. code-block::

    >>> from transformers import DistilBertForSequenceClassification

This will raise an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
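The naming rule used in the import check above (adding or removing the ``TF`` prefix) can be sketched as a small
helper. The function name ``other_framework_class_name`` is hypothetical and only manipulates the string; it does
not check that the class actually exists in 🤗 Transformers.

```python
def other_framework_class_name(class_name: str) -> str:
    """Map a PyTorch class name to its TensorFlow counterpart, and back."""
    if class_name.startswith("TF"):
        # TensorFlow class: drop the prefix to get the PyTorch name.
        return class_name[len("TF"):]
    # PyTorch class: add the prefix to get the TensorFlow name.
    return "TF" + class_name


print(other_framework_class_name("DistilBertForSequenceClassification"))
# TFDistilBertForSequenceClassification
print(other_framework_class_name("TFDistilBertForSequenceClassification"))
# DistilBertForSequenceClassification
```

You still need to try the import itself, since only that tells you whether the class is available.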

Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:

.. code-block::

    >>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
    >>> tf_model.save_pretrained("path/to/awesome-name-you-picked")

and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:

.. code-block::

    >>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
    >>> pt_model.save_pretrained("path/to/awesome-name-you-picked")

That's all there is to it!

Check the directory before pushing to the model hub.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Make sure there are no garbage files in the directory you'll upload. It should only have:

- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason);
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason);
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the vocabulary of your tokenizer, part
  of your :doc:`tokenizer <main_classes/tokenizer>` save;
- possibly an `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.

Other files can safely be deleted.
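The check above can be sketched in a few lines of Python. The file names come from the list above; the helper
``unexpected_files`` is hypothetical, and because the list says "or similar" for vocabulary files, the set below is an
assumption you may need to extend for your tokenizer.

```python
from pathlib import Path
from typing import List

# File names taken from the list above. Vocabulary files with other names
# ("or similar") are not covered, so extend this set if needed.
EXPECTED_FILES = {
    "config.json",
    "pytorch_model.bin",
    "tf_model.h5",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
    "added_tokens.json",
}


def unexpected_files(directory: str) -> List[str]:
    """Return the sorted names of entries in `directory` not in EXPECTED_FILES.

    Both files and sub-directories (for instance leftover checkpoints) are
    reported; nothing is deleted.
    """
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.name not in EXPECTED_FILES
    )
```

Calling ``unexpected_files("path/to/awesome-name-you-picked")`` returns the names to review before pushing. If you
run it inside a git clone, entries such as ``.git``, ``.gitattributes`` and ``README.md`` will also be listed, and
those are legitimate.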


Uploading your files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the repo is cloned, you can add the model, configuration and tokenizer files. For instance, saving the model and
tokenizer files:

.. code-block::

    >>> model.save_pretrained("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

Or, if you're using the Trainer API:

.. code-block::

    >>> trainer.save_model("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

You can then add these files to the staging area and verify that they have been correctly staged with the ``git
status`` command:

.. code-block:: bash

    git add --all
    git status

Finally, the files should be committed:

.. code-block:: bash

    git commit -m "First version of the your-model-name model and tokenizer."

And pushed to the remote:

.. code-block:: bash

    git push

This will upload the folder containing the weights, tokenizer and configuration we have just prepared.
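If you would rather script the add, commit and push steps above (for instance at the end of a training script), a
minimal sketch with ``subprocess`` could look like the following. The helper names ``upload_commands`` and
``run_upload`` are hypothetical; the git commands are the ones shown above, and the ``runner`` parameter exists only
so the function can be tried without touching a real repo.

```python
import subprocess
from typing import Callable, List


def upload_commands(message: str) -> List[List[str]]:
    """Return the git commands from the steps above, in order."""
    return [
        ["git", "add", "--all"],
        ["git", "status"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_upload(repo_dir: str, message: str, runner: Callable = subprocess.run) -> None:
    """Run each command inside the local clone, stopping at the first failure."""
    for command in upload_commands(message):
        # With subprocess.run, check=True raises CalledProcessError
        # as soon as one git command fails.
        runner(command, cwd=repo_dir, check=True)
```

A call such as ``run_upload("path/to/repo/clone/your-model-name", "First version of the your-model-name model and
tokenizer.")`` then mirrors the manual workflow.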


Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make sure everyone knows what your model can do, what its limitations, potential bias or ethical considerations are,
please add a README.md model card to your model repo. You can just create it, or there's also a convenient button
titled "Add a README.md" on your model page. A model card template can be found `here
<https://github.com/huggingface/model_card>`__ (meta-suggestions are welcome).

.. note::

    Model cards used to live in the 🤗 Transformers repo under `model_cards/`, but for consistency and scalability we
    migrated every model card from the repo to its corresponding huggingface.co model repo.

If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.
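If you create the model card yourself, a small script can lay down a skeleton in your local clone. This is only a
sketch: the helper ``write_model_card`` and the section titles are hypothetical, chosen from the topics mentioned
above, and they do not reproduce the official template.

```python
from pathlib import Path

# Section titles are illustrative, derived from the topics listed above.
# They are not the official model card template.
CARD_SECTIONS = [
    "Model description",
    "Intended uses and limitations",
    "Bias and ethical considerations",
    "Training data and base model",
]


def write_model_card(repo_dir: str, model_name: str) -> Path:
    """Create a README.md skeleton in `repo_dir` unless one already exists."""
    card_path = Path(repo_dir) / "README.md"
    if card_path.exists():
        return card_path  # never overwrite an existing model card
    lines = [f"# {model_name}", ""]
    for title in CARD_SECTIONS:
        lines += [f"## {title}", "", "TODO", ""]
    card_path.write_text("\n".join(lines), encoding="utf-8")
    return card_path
```

For example, ``write_model_card("path/to/repo/clone/your-model-name", "your-model-name")`` writes the skeleton, which
you then fill in and commit like any other file. The last section is a natural place for the link to the base model's
card.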


Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:

.. code-block::

    >>> tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
    >>> model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")


You may specify a revision by using the ``revision`` argument of the ``from_pretrained`` method:

.. code-block::

    >>> from transformers import AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )

Workflow in a Colab notebook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're in a Colab notebook (or similar) with no direct access to a terminal, here is the workflow you can use to
upload your model. You can execute each of the commands below in a cell by adding a ``!`` at the beginning.

First you need to install `git-lfs` in the environment used by the notebook:

.. code-block:: bash

    sudo apt-get install git-lfs

Then you can either create a repo directly from `huggingface.co <https://huggingface.co/>`__, or use the
:obj:`transformers-cli` to create it:


.. code-block:: bash

    transformers-cli login
    transformers-cli repo create your-model-name

Once it's created, you can clone it and configure it (replace ``username`` with your username on huggingface.co):

.. code-block:: bash

    git lfs install

    git clone https://username:password@huggingface.co/username/your-model-name
    # Alternatively if you have a token,
    # you can use it instead of your password
    git clone https://username:token@huggingface.co/username/your-model-name

    cd your-model-name
    git config --global user.email "email@example.com"
    # Tip: using the same email as for your huggingface.co account will link your commits to your profile
    git config --global user.name "Your name"
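Passwords and tokens can contain characters such as ``@`` or ``:`` that break the clone URL shown above. The sketch
below builds the same ``https://username:token@huggingface.co/username/your-model-name`` URL with the credentials
percent-encoded. The helper name ``authenticated_clone_url`` is hypothetical.

```python
from urllib.parse import quote


def authenticated_clone_url(username: str, secret: str, repo_name: str) -> str:
    """Build https://username:secret@huggingface.co/username/repo_name.

    `secret` is either the account password or an API token.
    """
    # Percent-encode the credentials so that characters such as "@" or ":"
    # in a password do not change the meaning of the URL.
    user = quote(username, safe="")
    password = quote(secret, safe="")
    return f"https://{user}:{password}@huggingface.co/{username}/{repo_name}"
```

Avoid printing the resulting URL in a notebook you intend to share, since it contains your credentials.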

Once you've saved your model inside, and your clone is set up with the right remote URL, you can add it and push it
with the usual git commands.

.. code-block:: bash

    git add .
    git commit -m "Initial commit"
    git push