Model sharing and uploading
=======================================================================================================================

On this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.

.. note::

    You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.

    Optionally, you can join an existing organization or create a new one.

Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have seen in the :doc:`training tutorial <training>` how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
`model hub <https://huggingface.co/models>`__.

Model versioning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since version v3.5.0, the model hub has built-in model versioning based on git and git-lfs. It is based on the paradigm
that one model *is* one repo.

This allows:

- built-in versioning
- access control
- scalability

This is built around *revisions*, which are a way to pin a specific version of a model using a commit hash, tag or
branch.

For instance:

.. code-block::

    >>> from transformers import AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )

Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to upload a model, you'll first need to create a git repo. This repo will live on the model hub, allowing
users to clone it and you (and your organization members) to push to it. First, make sure you are logged in with the
``transformers-cli``.

Open a terminal and run the following command. Run it in the virtual environment where you installed 🤗 Transformers,
since the :obj:`transformers-cli` command comes from the library.

.. code-block:: bash

    transformers-cli login

Once you are logged in with your model hub credentials, you can start building your repositories. To create a repo:

.. code-block:: bash

    transformers-cli repo create your-model-name

This creates a repo on the model hub, which can be cloned. You can then add files to or remove files from that repo as
you would with any other git repo.

.. code-block:: bash

    git clone https://huggingface.co/username/your-model-name

    # Then commit as usual
    cd your-model-name
    echo "hello" >> README.md
    git add . && git commit -m "Update from $USER"

We are intentionally not wrapping git too much, so as to stay intuitive and easy-to-use.


Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. 
    TODO Sylvain: make this automatic during the upload

You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's
super easy to do (and in a future version, it will all be automatic). You will need to install both PyTorch and
TensorFlow for this step, but you don't need to worry about the GPU, so it should be very easy. Check the `TensorFlow
installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__ and/or the `PyTorch
installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.

First check that your model class exists in the other framework, that is, try to import the same model by either adding
or removing the ``TF`` prefix. For instance, if you trained a
:class:`~transformers.DistilBertForSequenceClassification`, try to type:

.. code-block::

    >>> from transformers import TFDistilBertForSequenceClassification

and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type:

.. code-block::

    >>> from transformers import DistilBertForSequenceClassification

This will raise an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In that case, skip the conversion and go to the next
step.

Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:

.. code-block::

    >>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
    >>> tf_model.save_pretrained("path/to/awesome-name-you-picked")

and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:

.. code-block::

    >>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
    >>> pt_model.save_pretrained("path/to/awesome-name-you-picked")

That's all there is to it!
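To see at a glance whether a conversion is still needed, a short script can report which checkpoint files are missing
from the save directory. This is only a sketch: the helper name ``missing_checkpoints`` is hypothetical, and it assumes
the checkpoints use the default ``pytorch_model.bin`` and ``tf_model.h5`` file names.

```python
from pathlib import Path

# Default checkpoint file names (assumed to match what save_pretrained writes).
CHECKPOINT_FILES = {"PyTorch": "pytorch_model.bin", "TensorFlow": "tf_model.h5"}


def missing_checkpoints(model_dir):
    """Return the frameworks whose checkpoint file is absent from model_dir."""
    model_dir = Path(model_dir)
    return [
        framework
        for framework, filename in CHECKPOINT_FILES.items()
        if not (model_dir / filename).is_file()
    ]
```

An empty list means both checkpoints are present. A result of ``["TensorFlow"]`` suggests the ``from_pt=True``
conversion still has to be run, and ``["PyTorch"]`` suggests the ``from_tf=True`` one.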

Check the directory before pushing to the model hub.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Make sure there are no garbage files in the directory you'll upload. It should only have:

- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason);
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason);
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the vocabulary of your tokenizer, part
  of your :doc:`tokenizer <main_classes/tokenizer>` save;
- possibly an `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.

Other files can safely be deleted.
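The check above can be roughly automated. The sketch below lists entries in a directory that are not among the file
names mentioned in this guide. The helper name ``unexpected_files`` is hypothetical, and treating ``README.md``,
``.git`` and ``.gitattributes`` as expected is an assumption made for a cloned repo. Since vocabulary files may have
other names ("or similar"), review the result by hand rather than deleting files automatically.

```python
from pathlib import Path

# File names listed in this guide, plus a few assumed extras for a cloned repo
# (README.md for a model card, .git and .gitattributes for git and git-lfs).
EXPECTED_FILES = {
    "config.json",
    "pytorch_model.bin",
    "tf_model.h5",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
    "added_tokens.json",
    "README.md",
    ".git",
    ".gitattributes",
}


def unexpected_files(model_dir):
    """Return the sorted names of entries in model_dir that are not expected."""
    return sorted(
        entry.name
        for entry in Path(model_dir).iterdir()
        if entry.name not in EXPECTED_FILES
    )
```

For instance, ``unexpected_files("path/to/repo/clone/your-model-name")`` would list leftover training logs or
temporary files, if there are any.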


Uploading your files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the repo is cloned, you can add the model, configuration and tokenizer files. For instance, saving the model and
tokenizer files:

.. code-block::

    >>> model.save_pretrained("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

Or, if you're using the Trainer API:

.. code-block::

    >>> trainer.save_model("path/to/repo/clone/your-model-name")
    >>> tokenizer.save_pretrained("path/to/repo/clone/your-model-name")

You can then add these files to the staging area and verify that they have been correctly staged with the
``git status`` command:

.. code-block:: bash

    git add --all
    git status

Finally, the files should be committed:

.. code-block:: bash

    git commit -m "First version of the your-model-name model and tokenizer."

And pushed to the remote:

.. code-block:: bash

    git push

This will upload the folder containing the weights, tokenizer and configuration we have just prepared.
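If you would rather drive these steps from a Python script, the same git commands can be run with the standard
``subprocess`` module. The following is a minimal sketch, not part of the library: ``push_model_repo`` and its ``run``
parameter are hypothetical names, and ``run`` exists only so the commands can be swapped for a stub when testing.

```python
import subprocess


def push_model_repo(repo_dir, message, run=subprocess.run):
    """Stage, commit and push everything in repo_dir, stopping at the first failure."""
    commands = [
        ["git", "add", "--all"],
        ["git", "status"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]
    for command in commands:
        # With check=True, subprocess.run raises CalledProcessError on a non-zero exit status.
        run(command, cwd=repo_dir, check=True)
    return commands
```

Calling ``push_model_repo("path/to/repo/clone/your-model-name", "First version")`` would run the four commands in
order inside the cloned repo. It assumes git is installed and that you are already authenticated.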


Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make sure everyone knows what your model can do and what its limitations, potential biases or ethical
considerations are, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should be
placed in a subfolder with your username or organization, then another subfolder named like your model
(`awesome-name-you-picked`). Or just click on the "Create a model card on GitHub" button on the model page, which will
take you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a
model card template (meta-suggestions are welcome).

If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.

If you have never made a pull request to the 🤗 Transformers repo, look at the :doc:`contributing guide <contributing>`
to see the steps to follow.

.. note::

    You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
    inside `path/to/awesome-name-you-picked/`.

Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:

.. code-block::

    >>> from transformers import AutoModel, AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
    >>> model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")


You may specify a revision by using the ``revision`` argument of the ``from_pretrained`` method:

.. code-block::

    >>> tokenizer = AutoTokenizer.from_pretrained(
    ...     "julien-c/EsperBERTo-small",
    ...     revision="v2.0.1",  # tag name, or branch name, or commit hash
    ... )

Workflow in a Colab notebook
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're in a Colab notebook (or similar) with no direct access to a terminal, here is the workflow you can use to
upload your model. You can execute each of the commands below in a cell by adding a ``!`` at the beginning.

First you need to install `git-lfs` in the environment used by the notebook:

.. code-block:: bash

    sudo apt-get install git-lfs

Then you can use the :obj:`transformers-cli` to create your new repo:


.. code-block:: bash

    transformers-cli login
    transformers-cli repo create your-model-name

Once it's created, you can clone it and configure it (replace ``username`` with your username on huggingface.co):

.. code-block:: bash

    git clone https://huggingface.co/username/your-model-name
    cd your-model-name
    git lfs install
    git config --global user.email "email@example.com"

Once you've saved your model inside, you can add it and push it with usual git commands. Note that you have to replace
`username:password` with your username and password to huggingface.co.

.. code-block:: bash

    git add .
    git commit -m "Initial commit"
    git push https://username:password@huggingface.co/username/your-model-name
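If your password contains characters such as ``@``, ``/`` or ``:``, pasting it raw into the URL will likely produce an
invalid remote. Percent-encoding the credentials is the usual remedy. The sketch below builds the push URL in a
notebook cell with the standard library only. The helper name ``authenticated_remote`` is hypothetical, and the URL
layout is simply the one shown in the command above.

```python
from urllib.parse import quote


def authenticated_remote(username, password, model_name):
    """Build the push URL from the command above, percent-encoding the credentials."""
    # safe="" also encodes "/", "@" and ":", which would otherwise break the URL.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"https://{user}:{secret}@huggingface.co/{username}/{model_name}"
```

The resulting string can then be passed to ``git push`` in place of the hand-written URL. Keep in mind that a URL
built this way contains your password, so avoid leaving it in a saved or shared notebook.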