<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# How to contribute to transformers?

Everyone is welcome to contribute, and we value everybody's contribution. Code
is thus not the only way to help the community. Answering questions, helping
others, reaching out, and improving the documentation are all immensely valuable to
the community.

It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".

Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md).

## You can contribute in so many ways!

There are 4 ways you can contribute to transformers:
* Fixing outstanding issues with the existing code;
* Implementing new models;
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.

In particular, there is a special [Good First
Issue](https://github.com/huggingface/transformers/contribute) listing. It gives you a list of
open issues that anyone can work on. Just comment on the issue to say you'd like to work
on it. In that same listing you will also find issues with the `Good Second Issue` label, which are
typically slightly more involved than those labeled `Good First Issue`. But if you
feel you know what you're doing, go for it.

*All are equally valuable to the community.*

## Submitting a new issue or feature request

Do your best to follow these guidelines when submitting an issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.

### Did you find a bug?

The 🤗 Transformers library is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.

First, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues).

Did not find it? :( So we can act quickly on it, please follow these steps:

* Include your **OS type and version**, and the versions of **Python**, **PyTorch**, and
  **TensorFlow** when applicable;
* Include a short, self-contained code snippet that allows us to reproduce the bug in
  less than 30s;
* Provide the *full* traceback if an exception is raised (see the example below).
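
If your reproduction lives in a small script, one way to capture the full output, including the traceback, is to redirect it to a file and paste the contents into the issue; the script name here is purely illustrative:

```bash
# Capture stdout and stderr so the complete traceback ends up in the log file
$ python my_repro_script.py > repro_output.txt 2>&1
```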

To get the OS and software versions automatically, you can run the following command:

```bash
transformers-cli env
```

or run the following command from the root of the repository:

```bash
python src/transformers/commands/transformers_cli.py env
```


### Do you want to implement a new model?

Awesome! Please provide the following information:

* Short description of the model and link to the paper;
* Link to the implementation if it is open-source;
* Link to the model weights if they are available.

If you are willing to contribute the model yourself, let us know so we can best
guide you.

We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/main/templates) folder.

### Do you want a new feature (that is not a model)?

A world-class feature request addresses the following points:

1. Motivation first:
  * Is it related to a problem/frustration with the library? If so, please explain
    why. Providing a code snippet that demonstrates the problem is best.
  * Is it related to something you would need for a project? We'd love to hear
    about it!
  * Is it something you worked on and think could benefit the community?
    Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

If your issue is well written, we're already 80% of the way there by the time you
post it.

We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/main/templates)
folder.

## Start contributing! (Pull Requests)

Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to be able to contribute to
🤗 Transformers. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.

Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/transformers/blob/main/setup.py#L426)):

1. Fork the [repository](https://github.com/huggingface/transformers) by
   clicking on the 'Fork' button on the repository's page. This creates a copy of the code
   under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

   ```bash
   $ git clone git@github.com:<your Github handle>/transformers.git
   $ cd transformers
   $ git remote add upstream https://github.com/huggingface/transformers.git
   ```

3. Create a new branch to hold your development changes:

   ```bash
   $ git checkout -b a-descriptive-name-for-my-changes
   ```

   **Do not** work on the `main` branch.
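
   A quick way to double-check that you are not on `main` before you start editing (this assumes a reasonably recent `git`, since `--show-current` needs git 2.22 or later):

   ```bash
   # Prints the branch you are currently on; it should be the one created above
   $ git branch --show-current
   ```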

4. Set up a development environment by running the following command in a virtual environment:

   ```bash
   $ pip install -e ".[dev]"
   ```

   (If transformers was already installed in the virtual environment, remove
   it with `pip uninstall transformers` before reinstalling it in editable
   mode with the `-e` flag.)
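
   If you have not created and activated a virtual environment yet, a minimal sketch using Python's built-in `venv` module looks like this (the `.env` directory name is just an example):

   ```bash
   # Create and activate a virtual environment before running the editable install above
   $ python -m venv .env
   $ source .env/bin/activate
   ```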

   To run the full test suite, you might need the additional `datasets` dependency, which requires a separate source
   install:

   ```bash
   $ git clone https://github.com/huggingface/datasets
   $ cd datasets
   $ pip install -e .
   ```

   If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
   library.
   
   Depending on your OS, you might need to install some external libraries as well if the `pip` installation fails.
   
   For macOS, you will likely need [MeCab](https://taku910.github.io/mecab/), which can be installed from Homebrew:
   
   ```bash
   brew install mecab
   ```

5. Develop the features on your branch.

   As you work on the features, you should make sure that the test suite
   passes. You should run the tests impacted by your changes like this:

   ```bash
   $ pytest tests/<TEST_TO_RUN>.py
   ```
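
   You can narrow this down further with `pytest`'s standard `-k` keyword filter; the file and expression below are just an illustration:

   ```bash
   # Run only the tests in this file whose names match the keyword expression
   $ pytest tests/test_modeling_bert.py -k "attention"
   ```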

   You can also run the full suite, but it takes
   a beefy machine to produce a result in a decent amount of time now that
   Transformers has grown a lot. Here is the command for it:

   ```bash
   $ make test
   ```

   For more information about tests, check out the
   [dedicated documentation](https://huggingface.co/docs/transformers/testing).

   🤗 Transformers relies on `black` and `isort` to format its source code
   consistently. After you make changes, apply automatic style corrections and code verifications
   that can't be automated in one go with:

   ```bash
   $ make fixup
   ```

   This target is also optimized to only work with files modified by the PR you're working on.

   If you prefer to run the checks one after the other, the following command applies the
   style corrections:

   ```bash
   $ make style
   ```

   🤗 Transformers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
   control runs in CI; however, you can also run the same checks locally with:

   ```bash
   $ make quality
   ```

   Finally, we have a lot of scripts that check we didn't forget to update
   some files when adding a new model. You can run them with:

   ```bash
   $ make repo-consistency
   ```

   To learn more about those checks and how to fix any issue with them, check out the
   [documentation](https://huggingface.co/docs/transformers/pr_checks).

   If you're modifying documents under `docs/source`, make sure to validate that
   they can still be built. This check also runs in CI. To run a local check
   make sure you have installed the documentation builder requirements. First, install our `doc-builder` tool
   directly from its repository:
   
   ```bash
   $ pip install git+https://github.com/huggingface/doc-builder
   ```

   Then, make sure you have all the dependencies to be able to build the doc with:
   
   ```bash
   $ pip install ".[docs]"
   ```

   Finally, run the following command from the root of the repository:

   ```bash
   $ doc-builder build transformers docs/source/ --build_dir ~/tmp/test-build
   ```

   This will build the documentation in the `~/tmp/test-build` folder where you can inspect the generated
   Markdown files with your favorite editor. You won't be able to see the final rendering on the website
   before your PR is merged; we are actively working on adding a tool for this.

   Once you're happy with your changes, add changed files using `git add` and
   make a commit with `git commit` to record your changes locally:

   ```bash
   $ git add modified_file.py
   $ git commit
   ```

   Please write [good commit
   messages](https://chris.beams.io/posts/git-commit/).
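
   As a rough illustration of the shape such a message can take (a short, imperative subject line plus an optional explanatory body), note that each `-m` flag becomes a separate paragraph of the commit message:

   ```bash
   $ git commit -m "Fix off-by-one error in tokenizer truncation" \
                -m "The last token was dropped when truncation was enabled; add a regression test."
   ```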

   It is a good idea to sync your copy of the code with the original
   repository regularly. This way you can quickly account for changes:

   ```bash
   $ git fetch upstream
   $ git rebase upstream/main
   ```

   Push the changes to your account using:

   ```bash
   $ git push -u origin a-descriptive-name-for-my-changes
   ```

6. Once you are satisfied (**and the checklist below is happy too**), go to the
   webpage of your fork on GitHub. Click on 'Pull request' to send your changes
   to the project maintainers for review.

7. It's OK if maintainers ask you for changes. It happens to core contributors
   too! So that everyone can see the changes in the pull request, work in your local
   branch and push the changes to your fork. They will automatically appear in
   the pull request.


### Checklist

1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in
   the pull request description to make sure they are linked (and people
   consulting the issue know you are working on it);
3. To indicate a work in progress, please prefix the title with `[WIP]`. These
   are useful to avoid duplicated work, and to differentiate it from PRs ready
   to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
   - If you are adding a new model, make sure that you use
     `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
   - If you are adding new `@slow` tests, make sure they pass using
     `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
   - If you are adding a new tokenizer, write tests, and make sure
     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
   CircleCI does not run the slow tests, but GitHub Actions does every night!
6. All public methods must have informative docstrings that work nicely with Sphinx. See `modeling_bert.py` for an
   example.
7. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to place such files in a `dataset` hosted on hf.co, like
   the ones on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing), and reference
   them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
   If you are an external contributor, feel free to add the images to your PR and ask a Hugging Face member to migrate them
   to this dataset.

See more about the checks run on a pull request in our [PR guide](pr_checks).

### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/main/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/main/examples).

We like `pytest` and `pytest-xdist` because they're faster. From the root of the
repository, here's how to run tests with `pytest` for the library:

```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/
```

and for the examples:

```bash
$ pip install -r examples/xxx/requirements.txt  # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!

You can specify a smaller set of tests in order to test only the feature
you're working on.
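
For example, `pytest` lets you select a single test file or a single test by node ID; the paths and names below are purely illustrative:

```bash
# Run one test file
$ python -m pytest tests/test_modeling_bert.py

# Run a single test by its node ID (file::Class::test_name)
$ python -m pytest tests/test_modeling_bert.py::BertModelTest::test_config
```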

By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to
`yes` to run them. This will download many gigabytes of models — make sure you
have enough disk space and a good Internet connection, or a lot of patience!

```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/
```

Likewise, set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run
tests for custom tokenizers, which don't run by default either.
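
For example (the tokenizer test file shown is just an illustration):

```bash
$ RUN_CUSTOM_TOKENIZERS=yes python -m pytest tests/test_tokenization_bert_japanese.py
```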

🤗 Transformers uses `pytest` as a test runner only. It doesn't use any
`pytest`-specific features in the test suite itself.

This means `unittest` is fully supported. Here's how to run tests with
`unittest`:

```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```


### Style guide

For documentation strings, 🤗 Transformers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/main/docs#writing-documentation---specification)
for more information.

**This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).**

### Develop on Windows

On Windows, you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:

`git config core.autocrlf input`

One way to run the `make` command on Windows is via MSYS2:

1. [Download MSYS2](https://www.msys2.org/); we assume it is installed in `C:\msys64`.
2. Open the command line `C:\msys64\msys2.exe` (it should be available from the Start menu).
3. Run `pacman -Syu` in the shell and install `make` with `pacman -S make`.
4. Add `C:\msys64\usr\bin` to your PATH environment variable.

You can now use `make` from any terminal (PowerShell, cmd.exe, etc.) 🎉

### Syncing forked main with upstream (HuggingFace) main

To avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs,
please follow these steps when syncing the main branch of a forked repository:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```bash
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing
```