<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Contribute to 🤗 Transformers

Everyone is welcome to contribute, and we value everybody's contribution. Code
contributions are not the only way to help the community. Answering questions, helping
others, and improving the documentation are also immensely valuable.

It also helps us if you spread the word! Reference the library in blog posts
about the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply ⭐️ the repository to say thank you.

However you choose to contribute, please be mindful and respect our
[code of conduct](https://github.com/huggingface/transformers/blob/main/CODE_OF_CONDUCT.md).

**This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).**

## Ways to contribute

There are several ways you can contribute to 🤗 Transformers:

* Fix outstanding issues with the existing code.
* Submit issues related to bugs or desired new features.
* Implement new models.
* Contribute to the examples or to the documentation.

If you don't know where to start, there is a special [Good First
Issue](https://github.com/huggingface/transformers/contribute) listing. It will give you a list of
open issues that are beginner-friendly and help you start contributing to open-source. Just comment in the issue that you'd like to work
on it.

For something slightly more challenging, you can also take a look at the [Good Second Issue](https://github.com/huggingface/transformers/labels/Good%20Second%20Issue) list. In general though, if you feel like you know what you're doing, go for it and we'll help you get there! 🚀

> All contributions are equally valuable to the community. 🥰

## Fixing outstanding issues

If you notice an issue with the existing code and have a fix in mind, feel free to [start contributing](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md/#create-a-pull-request) and open a Pull Request!

## Submitting a bug-related issue or feature request

Do your best to follow these guidelines when submitting a bug-related issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.

### Did you find a bug?

The 🤗 Transformers library is robust and reliable thanks to users who report the problems they encounter.

Before you report an issue, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code. If you're unsure whether the bug is in your code or the library, please ask on the [forum](https://discuss.huggingface.co/) first. This helps us respond more quickly to bugs in the library instead of general questions.

Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:

* Your **OS type and version** and **Python**, **PyTorch** and
  **TensorFlow** versions when applicable.
* A short, self-contained code snippet that allows us to reproduce the bug in
  less than 30s.
* The *full* traceback if an exception is raised.
* Any additional information, like screenshots, that you think may help.
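
To make the traceback easy to copy, one option (assuming your reproduction lives in a hypothetical `reproduce_bug.py` script) is to pipe the output into a file you can paste from:

```bash
$ python reproduce_bug.py 2>&1 | tee traceback.log
```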

To get the OS and software versions automatically, run the following command:

```bash
transformers-cli env
```

You can also run the same command from the root of the repository:

```bash
python src/transformers/commands/transformers_cli.py env
```

### Do you want a new feature?

If there is a new feature you'd like to see in 🤗 Transformers, please open an issue and describe:

1. What is the *motivation* behind this feature? Is it related to a problem or frustration with the library? Is it a feature related to something you need for a project? Is it something you worked on and think it could benefit the community?

   Whatever it is, we'd love to hear about it!

2. Describe your requested feature in as much detail as possible. The more you can tell us about it, the better we'll be able to help you.
3. Provide a *code snippet* that demonstrates the feature's usage.
4. If the feature is related to a paper, please include a link.

If your issue is well written, we're already 80% of the way there by the time you create it.

We have added [templates](https://github.com/huggingface/transformers/tree/main/templates) to help you get started with your issue.

## Do you want to implement a new model?

New models are constantly released, and if you want to implement a new model, please provide the following information:

* A short description of the model and link to the paper.
* Link to the implementation if it is open-sourced.
* Link to the model weights if they are available.

If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!

We have added a [detailed guide and templates](https://github.com/huggingface/transformers/tree/main/templates) to help you get started with adding a new model, and we also have a more technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/add_new_model).

## Do you want to add documentation?

We're always looking for improvements that make the documentation clearer and more accurate. Please let us know how the documentation can be improved, such as typos and any content that is missing, unclear, or inaccurate. We'll be happy to make the changes or help you make a contribution if you're interested!

For more details about how to generate, build, and write the documentation, take a look at the documentation [README](https://github.com/huggingface/transformers/tree/main/docs).

## Create a Pull Request

Before writing any code, we strongly advise you to search through the existing PRs or
issues to make sure nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to contribute to
🤗 Transformers. While `git` is not the easiest tool to use, it has the greatest
manual. Type `git --help` in a shell and enjoy! If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.

You'll need **[Python 3.7](https://github.com/huggingface/transformers/blob/main/setup.py#L426)** or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:

1. Fork the [repository](https://github.com/huggingface/transformers) by
   clicking on the **[Fork](https://github.com/huggingface/transformers/fork)** button on the repository's page. This creates a copy of the code
   under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

   ```bash
   $ git clone git@github.com:<your GitHub handle>/transformers.git
   $ cd transformers
   $ git remote add upstream https://github.com/huggingface/transformers.git
   ```

3. Create a new branch to hold your development changes:

   ```bash
   $ git checkout -b a-descriptive-name-for-my-changes
   ```

   🚨 **Do not** work on the `main` branch!

4. Set up a development environment by running the following command in a virtual environment:

   ```bash
   $ pip install -e ".[dev]"
   ```

   If 🤗 Transformers was already installed in the virtual environment, remove
   it with `pip uninstall transformers` before reinstalling it in editable
   mode with the `-e` flag.

   Depending on your OS, you may need to install some external libraries as well if the `pip` installation fails.

   For macOS, you will likely need [MeCab](https://taku910.github.io/mecab/), which can be installed from Homebrew:

   ```bash
   brew install mecab
   ```

5. Develop the features on your branch.

   As you work on your code, you should make sure the test suite
   passes. Run the tests impacted by your changes like this:

   ```bash
   $ pytest tests/<TEST_TO_RUN>.py
   ```

   For more information about tests, check out the
   [Testing](https://huggingface.co/docs/transformers/testing) guide.
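
   For example, you can narrow the run down to a single test class or test method by passing a pytest node ID. This is only a sketch — the class and test names below are placeholders, not real objects in the repository:

   ```bash
   $ pytest tests/models/my_new_model/test_my_new_model.py::MyNewModelTest::test_forward -v
   ```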

   🤗 Transformers relies on `black` and `isort` to format its source code
   consistently. After you make changes, apply automatic style corrections and code verifications
   that can't be automated in one go with:

   ```bash
   $ make fixup
   ```

   This target is also optimized to only work with files modified by the PR you're working on.

   If you prefer to run the checks one after the other, the following command applies the
   style corrections:

   ```bash
   $ make style
   ```

   🤗 Transformers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
   controls are run by the CI, but you can run the same checks with:

   ```bash
   $ make quality
   ```

   Finally, we have a lot of scripts to make sure we didn't forget to update
   some files when adding a new model. You can run these scripts with:

   ```bash
   $ make repo-consistency
   ```

   To learn more about those checks and how to fix any issues with them, check out the
   [Checks on a Pull Request](https://huggingface.co/docs/transformers/pr_checks) guide.

   If you're modifying documents under the `docs/source` directory, make sure the documentation can still be built. This check will also run in the CI when you open a pull request. To run a local check,
   make sure you install the documentation builder:

   ```bash
   $ pip install ".[docs]"
   ```

   Run the following command from the root of the repository:

   ```bash
   $ doc-builder build transformers docs/source/en --build_dir ~/tmp/test-build
   ```

   This will build the documentation in the `~/tmp/test-build` folder where you can inspect the generated
   Markdown files with your favorite editor. You can also preview the docs on GitHub when you open a pull request.

   Once you're happy with your changes, add changed files with `git add` and
   record your changes locally with `git commit`:

   ```bash
   $ git add modified_file.py
   $ git commit
   ```

   Please remember to write [good commit
   messages](https://chris.beams.io/posts/git-commit/) to clearly communicate the changes you made!

   To keep your copy of the code up to date with the original
   repository, rebase your branch on `upstream/branch` *before* you open a pull request or if requested by a maintainer:

   ```bash
   $ git fetch upstream
   $ git rebase upstream/main
   ```

   Push your changes to your branch:

   ```bash
   $ git push -u origin a-descriptive-name-for-my-changes
   ```

   If you've already opened a pull request, you'll need to force push with the `--force` flag. Otherwise, if the pull request hasn't been opened yet, you can just push your changes normally.
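
   For instance, if you need to force push after rebasing, a minimal sketch (reusing the branch name from the steps above) looks like this:

   ```bash
   $ git push --force origin a-descriptive-name-for-my-changes
   ```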

6. Now you can go to your fork of the repository on GitHub and click on **Pull request** to open a pull request. Make sure you tick off all the boxes in our [checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md/#pull-request-checklist) below. When you're ready, you can send your changes to the project maintainers for review.

7. It's okay if maintainers request changes; it happens to our core contributors
   too! So that everyone can see the changes in the pull request, work in your local
   branch and push the changes to your fork. They will automatically appear in
   the pull request.

### Pull request checklist

☐ The pull request title should summarize your contribution.<br>
☐ If your pull request addresses an issue, please mention the issue number in the pull
request description to make sure they are linked (and people viewing the issue know you
are working on it).<br>
☐ To indicate a work in progress, please prefix the title with `[WIP]`. These are
useful to avoid duplicated work, and to differentiate it from PRs ready to be merged.<br>
☐ Make sure existing tests pass.<br>
☐ If adding a new feature, also add tests for it.<br>
   - If you are adding a new model, make sure you use
     `ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` to trigger the common tests.
   - If you are adding new `@slow` tests, make sure they pass using
     `RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py`.
   - If you are adding a new tokenizer, write tests and make sure
     `RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` passes.
   CircleCI does not run the slow tests, but GitHub Actions does every night!<br>

☐ All public methods must have informative docstrings (see
[`modeling_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py)
for an example).<br>
☐ Due to the rapidly growing repository, don't add any images, videos and other
non-text files that'll significantly weigh down the repository. Instead, use a Hub
repository such as [`hf-internal-testing`](https://huggingface.co/hf-internal-testing)
to host these files and reference them by URL. We recommend placing documentation-related
images in the following repository:
[huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
You can open a PR on this dataset repository and ask a Hugging Face member to merge it.

For more information about the checks run on a pull request, take a look at our [Checks on a Pull Request](https://huggingface.co/docs/transformers/pr_checks) guide.

### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests](https://github.com/huggingface/transformers/tree/main/tests) folder and examples tests in the
[examples](https://github.com/huggingface/transformers/tree/main/examples) folder.

We like `pytest` and `pytest-xdist` because running the tests in parallel is faster. From the root of the
repository, specify a *path to a subfolder or a test file* to run the test.

```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model
```

Similarly, for the `examples` directory, specify a *path to a subfolder or test file* to run the test. For example, the following command tests the text classification subfolder in the PyTorch `examples` directory:

```bash
$ pip install -r examples/xxx/requirements.txt  # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification
```

In fact, this is how our `make test` and `make test-examples` commands are implemented (not including the `pip install`)!
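
If you prefer, you can also run those Makefile targets directly from the root of the repository instead of typing out the `pytest` commands above:

```bash
$ make test
$ make test-examples
```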

You can also specify a smaller set of tests in order to test only the feature
you're working on.
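
For example, `pytest`'s `-k` flag filters tests by name. A minimal sketch (the path and keyword below are placeholders) that only runs tests whose names contain `attention`:

```bash
$ python -m pytest -s -v ./tests/models/my_new_model -k "attention"
```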

By default, slow tests are skipped, but you can set the `RUN_SLOW` environment variable to
`yes` to run them. This will download many gigabytes of models, so make sure you
have enough disk space, a good internet connection, or a lot of patience!

<Tip warning={true}>

Remember to specify a *path to a subfolder or a test file* to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!

</Tip>

```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/text-classification
```

Like the slow tests, custom tokenizer tests are skipped, but you can set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run them.
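
For example, a hedged sketch using the tokenizer test path pattern mentioned in the checklist above (replace the placeholder with your model's name):

```bash
$ RUN_CUSTOM_TOKENIZERS=yes python -m pytest -s -v ./tests/models/{your_model_name}/test_tokenization_{your_model_name}.py
```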

🤗 Transformers uses `pytest` as a test runner only. It doesn't use any
`pytest`-specific features in the test suite itself.

This means `unittest` is fully supported. Here's how to run tests with
`unittest`:

```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```

### Style guide

For documentation strings, 🤗 Transformers follows the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/main/docs#writing-documentation---specification)
for more information.

### Develop on Windows

On Windows (unless you're working in [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/) or WSL), you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:

```bash
git config core.autocrlf input
```

One way to run the `make` command on Windows is with MSYS2:

1. [Download MSYS2](https://www.msys2.org/), and we assume it's installed in `C:\msys64`.
2. Open the command line `C:\msys64\msys2.exe` (it should be available from the **Start** menu).
3. Run in the shell: `pacman -Syu` and install `make` with `pacman -S make`.
4. Add `C:\msys64\usr\bin` to your PATH environment variable.

You can now use `make` from any terminal (PowerShell, cmd.exe, etc.)! 🎉
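
For example, you can check that the setup works by running one of the Makefile targets used earlier from the root of your clone:

```bash
make style
```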

### Sync a forked repository with upstream main (the Hugging Face repository)

When updating the main branch of a forked repository, please follow these steps to avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs.

1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:

```bash
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing
```