<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# How to contribute to transformers?

Everyone is welcome to contribute, and we value everybody's contribution. Code
is thus not the only way to help the community. Answering questions, helping
others, reaching out and improving the documentation are immensely valuable to
the community.

It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".

Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md).

## You can contribute in so many ways!

There are 4 ways you can contribute to transformers:
* Fixing outstanding issues with the existing code;
* Implementing new models;
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.

*All are equally valuable to the community.*

## Submitting a new issue or feature request

Do your best to follow these guidelines when submitting an issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.

### Did you find a bug?

The `transformers` library is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.

First, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues).

Did not find it? :( So we can act quickly on it, please follow these steps:

* Include your **OS type and version**, the versions of **Python**, **PyTorch** and
  **TensorFlow** when applicable;
* Provide a short, self-contained code snippet that allows us to reproduce the bug in
  less than 30s;
* Provide the *full* traceback if an exception is raised.

To get the OS and software versions automatically, you can run the following command:

```bash
transformers-cli env
```

or from the root of the repository the following command:

```bash
python src/transformers/commands/transformers_cli.py env
```
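
If you cannot run either command, the same details can be collected by hand. The sketch below is only an illustration: `collect_env_info` is a hypothetical helper (not part of `transformers`) that reads installed package versions with the standard library, and the package names it checks are assumptions you can adjust.

```python
import platform
from importlib import metadata  # standard library since Python 3.8


def collect_env_info(packages=("transformers", "torch", "tensorflow")):
    """Return the OS, Python and package versions to paste into a bug report.

    `collect_env_info` is a hypothetical helper, not part of `transformers`.
    """
    info = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Read the installed version without importing the package itself.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"- {key}: {value}")
```

Running the script prints one Markdown bullet per entry, which can be pasted directly into the issue.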


### Do you want to implement a new model?

Awesome! Please provide the following information:

* Short description of the model and link to the paper;
* Link to the implementation if it is open-source;
* Link to the model weights if they are available.

If you are willing to contribute the model yourself, let us know so we can best
guide you.

We have added a **detailed guide and templates** to help you through the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.

### Do you want a new feature (that is not a model)?

A world-class feature request addresses the following points:

1. Motivation first:
  * Is it related to a problem/frustration with the library? If so, please explain
    why. Providing a code snippet that demonstrates the problem is best.
  * Is it related to something you would need for a project? We'd love to hear
    about it!
  * Is it something you worked on and think could benefit the community?
    Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

If your issue is well written we're already 80% of the way there by the time you
post it.

We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.

## Start contributing! (Pull Requests)

Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to be able to contribute to
`transformers`. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.

Follow these steps to start contributing:

1. Fork the [repository](https://github.com/huggingface/transformers) by
   clicking on the 'Fork' button on the repository's page. This creates a copy of the code
   under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

   ```bash
   $ git clone git@github.com:<your GitHub handle>/transformers.git
   $ cd transformers
   $ git remote add upstream https://github.com/huggingface/transformers.git
   ```

3. Create a new branch to hold your development changes:

   ```bash
   $ git checkout -b a-descriptive-name-for-my-changes
   ```

   **Do not** work on the `master` branch.

4. Set up a development environment by running the following command in a virtual environment:

   ```bash
   $ pip install -e ".[dev]"
   ```

   (If transformers was already installed in the virtual environment, remove
   it with `pip uninstall transformers` before reinstalling it in editable
   mode with the `-e` flag.)

   To run the full test suite, you might need the additional dependency on `datasets` which requires a separate source
   install:

   ```bash
   $ git clone https://github.com/huggingface/datasets
   $ cd datasets
   $ pip install -e .
   ```

   If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
   library.

5. Develop the features on your branch.

   As you work on the features, you should make sure that the test suite
   passes:

   ```bash
   $ make test
   ```

   Note that this command uses the `-n auto` pytest flag, so it starts as many parallel `pytest` processes as your
   computer has CPU cores. If you have many cores, a few GPUs and a limited amount of RAM, this is likely to overload
   your computer. In that case, consider running the test suite with this command instead:

   ```bash
   $ python -m pytest -n 3 --dist=loadfile -s -v ./tests/
   ```

   Adjust the value of `-n` to fit the load your hardware can support.

   `transformers` relies on `black` and `isort` to format its source code
   consistently. After you make changes, format them with:

   ```bash
   $ make style
   ```

   `transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
   control runs in CI, but you can also run the same checks with:

   ```bash
   $ make quality
   ```

   You can apply the automatic style corrections and run the checks that cannot be fixed automatically in one go:

   ```bash
   $ make fixup
   ```

   This target is also optimized to only work with files modified by the PR you're working on.

   If you're modifying documents under `docs/source`, make sure to validate that
   they can still be built. This check also runs in CI. To run the check locally,
   first install the documentation builder requirements by running
   `pip install ".[tf,torch,docs]"` once from the root of this repository,
   then run:

   ```bash
   $ make docs
   ```

   Once you're happy with your changes, add changed files using `git add` and
   make a commit with `git commit` to record your changes locally:

   ```bash
   $ git add modified_file.py
   $ git commit
   ```

   Please write [good commit
   messages](https://chris.beams.io/posts/git-commit/).

   It is a good idea to sync your copy of the code with the original
   repository regularly. This way you can quickly account for changes:

   ```bash
   $ git fetch upstream
   $ git rebase upstream/master
   ```

   Push the changes to your account using:

   ```bash
   $ git push -u origin a-descriptive-name-for-my-changes
   ```

6. Once you are satisfied (**and the checklist below is happy too**), go to the
   webpage of your fork on GitHub. Click on 'Pull request' to send your changes
   to the project maintainers for review.

7. It's ok if maintainers ask you for changes. It happens to core contributors
   too! So everyone can see the changes in the Pull request, work in your local
   branch and push the changes to your fork. They will automatically appear in
   the pull request.


### Checklist

1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in
   the pull request description to make sure they are linked (and people
   consulting the issue know you are working on it);
3. To indicate a work in progress, please prefix the title with `[WIP]`. This
   is useful to avoid duplicated work, and to differentiate it from PRs ready
   to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
   - If you are adding a new model, make sure that you use
     `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
   - If you are adding new `@slow` tests, make sure they pass using
     `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
   - If you are adding a new tokenizer, write tests, and make sure
     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.

   CircleCI does not run the slow tests, but GitHub Actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
   example.
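
To picture what `all_model_classes` does in item 5, here is a minimal sketch of the general pattern: a shared mixin loops over every class listed in `all_model_classes` and applies the same checks to each one. Everything in it (`CommonTestsMixin`, the toy `MyModel` classes and their `forward` method) is hypothetical and far simpler than the library's real test helpers.

```python
import unittest


class CommonTestsMixin:
    """Hypothetical mixin: runs the same checks on every class in `all_model_classes`."""

    all_model_classes = ()

    def test_forward_keeps_sequence_length(self):
        for model_class in self.all_model_classes:
            # subTest reports a failure per model class instead of stopping at the first one.
            with self.subTest(model_class=model_class.__name__):
                model = model_class()
                self.assertEqual(len(model.forward([1, 2, 3])), 3)


class MyModel:
    """Toy stand-in for a model, for illustration only."""

    def forward(self, inputs):
        return [x * 2 for x in inputs]


class MyModelWithLMHead(MyModel):
    """Toy stand-in for a model with a head on top."""

    def forward(self, inputs):
        return [x + 1 for x in super().forward(inputs)]


class MyModelTest(CommonTestsMixin, unittest.TestCase):
    # Listing a class here is enough for the shared checks to cover it.
    all_model_classes = (MyModel, MyModelWithLMHead)
```

The mixin does not inherit from `unittest.TestCase`, so the shared checks only run through concrete test classes such as `MyModelTest`.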

### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).

We like `pytest` and `pytest-xdist` because it's faster. From the root of the
repository, here's how to run tests with `pytest` for the library:

```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/
```

and for the examples:

```bash
$ pip install -r examples/requirements.txt  # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
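
As noted in the pull request steps, `-n auto` can overload a machine with many cores and little RAM. One way to keep the worker count bounded is to build the command programmatically. This is a sketch under assumptions: `pytest_command` and its `max_workers` default are hypothetical, and the right cap depends on your hardware.

```python
import os


def pytest_command(max_workers=3, test_dir="./tests/"):
    """Build the pytest invocation from this guide with a capped worker count.

    `pytest_command` is a hypothetical helper, not part of `transformers`.
    """
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    # Never exceed the number of cores, and always keep at least one worker.
    workers = max(1, min(cpu_count, max_workers))
    return [
        "python", "-m", "pytest",
        "-n", str(workers),
        "--dist=loadfile", "-s", "-v",
        test_dir,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the root of the repository.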

You can specify a smaller set of tests in order to test only the feature
you're working on.

By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to
`yes` to run them. This will download many gigabytes of models — make sure you
have enough disk space and a good Internet connection, or a lot of patience!

```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/
```

Likewise, set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run
tests for custom tokenizers, which don't run by default either.

🤗 Transformers uses `pytest` as a test runner only. It doesn't use any
`pytest`-specific features in the test suite itself.

This means `unittest` is fully supported. Here's how to run tests with
`unittest`:

```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```
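
Because the suite is plain `unittest`, gating a test on an environment variable needs nothing beyond the standard library. The sketch below shows one way such a switch could work. `env_flag`, this `slow` decorator and the accepted truthy values are assumptions for illustration, not the library's actual helpers.

```python
import os
import unittest


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean flag (illustrative helper)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y", "on"}


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    The flag is read once, when the decorator is applied.
    """
    return unittest.skipUnless(env_flag("RUN_SLOW"), "test is slow")(test_case)


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    @slow
    def test_slow(self):
        # Stands in for an expensive check, such as downloading a large model.
        self.assertTrue(True)
```

With `RUN_SLOW` unset, `test_slow` is reported as skipped rather than failed, so the run still succeeds.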


### Style guide

For documentation strings, `transformers` follows the [google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
for more information.
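
As an illustration of the general Google layout (`Args`, `Returns`, `Raises`), here is a docstring on a hypothetical helper. `truncate_sequence` is not part of `transformers`, and the project's documentation writing guide takes precedence wherever its conventions differ.

```python
def truncate_sequence(token_ids, max_length):
    """Truncate a list of token ids to a maximum length.

    Args:
        token_ids (List[int]): The token ids to truncate.
        max_length (int): Maximum number of ids to keep. Must be non-negative.

    Returns:
        List[int]: A new list containing at most ``max_length`` ids.

    Raises:
        ValueError: If ``max_length`` is negative.
    """
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    # Slicing copies the kept ids, so the caller's list is left untouched.
    return list(token_ids[:max_length])
```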

#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)


### Develop on Windows

One way to run the `make` command on Windows is to go through MSYS2:

1. [Download MSYS2](https://www.msys2.org/). We assume it is installed in `C:\msys64`.
2. Open the command line `C:\msys64\msys2.exe` (it should be available from the start menu).
3. In the shell, run `pacman -Syu`, then install `make` with `pacman -S make`.

### Syncing forked master with upstream (HuggingFace) master

To avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary
notifications to the developers involved in these PRs, please follow these steps when syncing the master branch of a
forked repository:

1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked master.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:

   ```bash
   $ git checkout -b your-branch-for-syncing
   $ git pull --squash --no-commit upstream master
   $ git commit -m '<your message without GitHub references>'
   $ git push --set-upstream origin your-branch-for-syncing
   ```