Model sharing and uploading
=======================================================================================================================

In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.

.. note::

    You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.

    Optionally, you can join an existing organization or create a new one.

Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We have seen in the :doc:`training tutorial <training>` how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~.transformers.Trainer`/:class:`~.transformers.TFTrainer` class. Let's see how you can share the result on the
`model hub <https://huggingface.co/models>`__.

Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. 
    When #5258 is merged, we can remove the need to create the directory.

First, pick a directory with the name you want your model to have on the model hub (its full name will then be
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either

.. code-block::

    mkdir path/to/awesome-name-you-picked

or in Python

.. code-block::

    import os
    # exist_ok=True avoids an error if the directory already exists
    os.makedirs("path/to/awesome-name-you-picked", exist_ok=True)

then you can save your model and tokenizer with:

.. code-block::

    model.save_pretrained("path/to/awesome-name-you-picked")
    tokenizer.save_pretrained("path/to/awesome-name-you-picked")

Or, if you're using the Trainer API:

.. code-block::

    trainer.save_model("path/to/awesome-name-you-picked")
    tokenizer.save_pretrained("path/to/awesome-name-you-picked")

Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. 
    TODO Sylvain: make this automatic during the upload

You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's
super easy to do (and in a future version, it will all be automatic). You will need to install both PyTorch and
TensorFlow for this step, but you don't need to worry about the GPU, so it should be very easy. Check the `TensorFlow
installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__ and/or the `PyTorch
installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.

First, check that your model class exists in the other framework, that is, try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to type

.. code-block::

    from transformers import TFDistilBertForSequenceClassification

and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to type

.. code-block::

    from transformers import DistilBertForSequenceClassification

This will raise an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
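
The naming rule above (add or remove the `TF` prefix) can be sketched as a small helper. This is only an illustration:
`other_framework_class_name` is a hypothetical name, not part of the library, and it assumes that every TensorFlow
class name is the PyTorch class name with `TF` in front.

```python
def other_framework_class_name(class_name):
    """Return the counterpart class name by adding or removing the ``TF`` prefix.

    Hypothetical helper: it only manipulates the string and does not check
    that the resulting class actually exists in the library.
    """
    if class_name.startswith("TF"):
        # TensorFlow class: drop the prefix to get the PyTorch name.
        return class_name[2:]
    # PyTorch class: add the prefix to get the TensorFlow name.
    return "TF" + class_name


print(other_framework_class_name("DistilBertForSequenceClassification"))
print(other_framework_class_name("TFDistilBertForSequenceClassification"))
```

You would then try to import the returned name from `transformers` to see whether the class really exists.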

Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:

.. code-block::

    tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
    tf_model.save_pretrained("path/to/awesome-name-you-picked")

and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:

.. code-block::

    pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
    pt_model.save_pretrained("path/to/awesome-name-you-picked")

That's all there is to it!

Check the directory before uploading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Make sure there are no garbage files in the directory you'll upload. It should only have:

- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason);
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason);
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- files named `vocab.json`, `vocab.txt`, `merges.txt`, or similar, which contain the vocabulary of your tokenizer, part
  of your :doc:`tokenizer <main_classes/tokenizer>` save;
- maybe an `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.

Other files can safely be deleted.
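
If you prefer to check the directory from Python, a minimal sketch could look like the following. The
`EXPECTED_FILES` set and the `unexpected_files` function are hypothetical names introduced for illustration. The set
only contains the file names listed above, so tokenizer vocabulary files with other names will be flagged too.

```python
from pathlib import Path

# File names taken from the list above. Vocabulary files with "similar"
# names are not covered, so treat a flagged file as a prompt to look at it,
# not as proof that it is garbage.
EXPECTED_FILES = {
    "config.json",
    "pytorch_model.bin",
    "tf_model.h5",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.json",
    "vocab.txt",
    "merges.txt",
    "added_tokens.json",
}


def unexpected_files(directory):
    """Return the sorted names of entries in `directory` not in EXPECTED_FILES."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.name not in EXPECTED_FILES
    )
```

Calling `unexpected_files("path/to/awesome-name-you-picked")` returns the names to review before uploading.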

Upload your model with the CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now open a terminal and run the following command. Run it in the virtual environment where you installed 🤗
Transformers, since the :obj:`transformers-cli` command comes from the library.

.. code-block::

    transformers-cli login

Then log in using the same credentials as on huggingface.co. To upload your model, just type

.. code-block::

    transformers-cli upload path/to/awesome-name-you-picked/

This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.

By default you will be prompted to confirm that you want these files to be uploaded. If you are uploading multiple
models and need to script that process, you can add `-y` to bypass the prompt. For example:

.. code-block::

    transformers-cli upload -y path/to/awesome-name-you-picked/
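
As a rough sketch of scripting that process from Python, the helper below builds the same command for several
directories. The function names `build_upload_command` and `upload_all` are hypothetical, and the sketch assumes
`transformers-cli` is on your `PATH` and that you are already logged in. By default it only prints the commands.

```python
import subprocess


def build_upload_command(model_dir):
    """Build the argument list for a non-interactive `transformers-cli upload`."""
    return ["transformers-cli", "upload", "-y", model_dir]


def upload_all(model_dirs, dry_run=True):
    """Upload each directory in turn; with dry_run=True, only print the commands."""
    for model_dir in model_dirs:
        command = build_upload_command(model_dir)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop at the first failed upload.
            subprocess.run(command, check=True)


upload_all(["path/to/model-a/", "path/to/model-b/"])
```

Pass `dry_run=False` once the printed commands look right.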


If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
just type:

.. code-block::

    transformers-cli upload path/to/awesome-name-you-picked/that-file 

or

.. code-block::

    transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name

if you want to change its filename.

This uploads the model to your personal account. If you want your model to be namespaced by your organization name
rather than your username, add the following flag to any command:

.. code-block::

    --organization organization_name

so for instance:

.. code-block::

    transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name

Your model will then be accessible through its identifier, which is, as we saw above,
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
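
To make the naming rule concrete, here is a small sketch that derives the identifier from the local directory and a
namespace. `hub_identifier` is a hypothetical helper written for this example, not a function of the library.

```python
from pathlib import Path


def hub_identifier(model_dir, namespace):
    """Combine a namespace (username or organization) with the directory name."""
    # Path(...).name ignores a trailing slash, so "a/b/" and "a/b" both give "b".
    return f"{namespace}/{Path(model_dir).name}"


print(hub_identifier("path/to/awesome-name-you-picked/", "username"))
```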

Add a model card
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make sure everyone knows what your model can do and what its limitations, potential biases, or ethical
considerations are, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should then
be placed in a subfolder with your username or organization, then another subfolder named like your model
(`awesome-name-you-picked`). Alternatively, click the "Create a model card on GitHub" button on the model page, which
takes you directly to the right location. If you need one, `here <https://github.com/huggingface/model_card>`__ is a
model card template (meta-suggestions are welcome).

If your model is fine-tuned from another model coming from the model hub (all 🤗 Transformers pretrained models do),
don't forget to link to its model card so that people can fully trace how your model was built.

If you have never made a pull request to the 🤗 Transformers repo, look at the :doc:`contributing guide <contributing>`
to see the steps to follow.

.. Note::

    You can also send your model card in the folder you uploaded with the CLI by placing it in a `README.md` file
    inside `path/to/awesome-name-you-picked/`.

Using your model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your model now has a page on huggingface.co/models 🔥

Anyone can load it from code:

.. code-block::

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
    model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")

Additional commands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can list all the files you uploaded on the hub like this:

.. code-block::

    transformers-cli s3 ls

You can also delete unneeded files with

.. code-block::

    transformers-cli s3 rm awesome-name-you-picked/filename