# Contributing to Torchaudio
We want to make contributing to this project as easy and transparent as possible.

## TL;DR

Please let us know if you encounter a bug by filing an [issue](https://github.com/pytorch/audio/issues).

We appreciate all contributions. If you are planning to contribute back
bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions or extensions to the
core, please first open an issue and discuss the feature with us. Sending a PR
without discussion might end up resulting in a rejected PR, because we might be
taking the core in a different direction than you might be aware of.

Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the
safe disclosure of security bugs. In those cases, please go through the
process outlined on that page and do not file a public issue.

Fixing bugs and implementing new features are not the only ways you can
contribute. It also helps the project when you report problems you're facing,
and when you give a :+1: on issues that others reported and that are relevant
to you.

You can also help by improving the documentation. This is no less important
than improving the library itself! If you find a typo in the documentation,
do not hesitate to submit a pull request.

If you're not sure what you want to work on, you can pick an issue from the
[list of open issues labelled as "help
wanted"](https://github.com/pytorch/audio/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22).
Comment on the issue that you want to work on it and send a PR with your fix
(see below).

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

## Development installation

We recommend using a `conda` environment to contribute efficiently to
torchaudio.

### Install PyTorch Nightly

```bash
conda install pytorch -c pytorch-nightly
```

### Install build dependencies

```bash
# Install build-time dependencies
pip install cmake ninja
# [optional for sox]
conda install pkg-config
# [optional for ffmpeg]
conda install ffmpeg
```

### Install Torchaudio

```bash
# Build torchaudio
git clone https://github.com/pytorch/audio.git
cd audio
python setup.py develop
# or, for OSX
# CC=clang CXX=clang++ python setup.py develop
```

Some environment variables change the build behavior:

- `BUILD_SOX`: Determines whether to build and bind libsox in non-Windows environments. It has no effect on Windows, where libsox integration is not available. The default value is 1 (build and bind); use 0 to disable it.
- `USE_CUDA`: Determines whether to build the custom CUDA kernel. Defaults to the availability of CUDA-compatible GPUs.
- `BUILD_KALDI`: Determines whether to build the Kaldi extension, which is required for the `kaldi_pitch` function. The default value is 1 on Linux/macOS and 0 on Windows.
- `BUILD_RNNT`: Determines whether to build the RNN-T loss function. The default value is 1.
- `BUILD_CTC_DECODER`: Determines whether to build decoder features based on KenLM and the Flashlight CTC decoder. The default value is 1.

Please check [./tools/setup_helpers/extension.py](./tools/setup_helpers/extension.py) for up-to-date details.
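
As a rough illustration of how 0/1 switches like these are typically consumed, the sketch below reads a flag from an environment mapping and falls back to a default. The helper name `read_build_flag` and the set of accepted values are assumptions made for this example; they are not taken from `tools/setup_helpers/extension.py`.

```python
import os
from typing import Mapping


def read_build_flag(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: interpret an environment variable as an on/off switch."""
    value = env.get(name)
    if value is None:
        # Variable not set: fall back to the documented default.
        return default
    # Assumption for this sketch: "1" enables the feature and "0" disables it.
    if value not in ("0", "1"):
        raise ValueError(f"{name} must be 0 or 1, got {value!r}")
    return value == "1"


# Example: libsox is disabled explicitly, RNN-T keeps its default.
example_env = {"BUILD_SOX": "0"}
build_sox = read_build_flag("BUILD_SOX", default=True, env=example_env)
build_rnnt = read_build_flag("BUILD_RNNT", default=True, env=example_env)
print(f"BUILD_SOX={build_sox} BUILD_RNNT={build_rnnt}")
```

Passing the mapping explicitly keeps the sketch easy to try without touching the real environment.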

### Running Tests

If you built sox, set the `PATH` variable so that the tests properly use the newly built `sox` binary:

```bash
export PATH="<path_to_torchaudio>/third_party/install/bin:${PATH}"
```

The following dependencies are also needed for testing:

```bash
pip install typing pytest scipy numpy parameterized
```

Optional packages to install if you want to run related tests:

- `librosa`
- `requests`
- `soundfile`
- `kaldi_io`
- `transformers`
- `fairseq` (it has to be newer than `0.10.2`, so you will need to install from
  source. Commit `e6eddd80` is known to work.)
- `unidecode` (dependency for testing text preprocessing functions for examples/pipeline_tacotron2)
- `inflect` (dependency for testing text preprocessing functions for examples/pipeline_tacotron2)
- `Pillow` (dependency for testing ffmpeg image processing)
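
To see which of these optional packages are present in your environment before running the tests, a small check like the following may help. It is only a sketch: the helper name `find_optional_modules` is made up for this example, and the module names are the import names assumed for the packages listed above (Pillow is imported as `PIL`).

```python
import importlib.util
from typing import Dict, Iterable

# Import names assumed for the optional test dependencies listed above.
OPTIONAL_TEST_MODULES = [
    "librosa",
    "requests",
    "soundfile",
    "kaldi_io",
    "transformers",
    "fairseq",
    "unidecode",
    "inflect",
    "PIL",  # provided by the Pillow distribution
]


def find_optional_modules(names: Iterable[str] = OPTIONAL_TEST_MODULES) -> Dict[str, bool]:
    """Return a mapping from module name to whether it can be located."""
    # find_spec only locates a top-level module; it does not import it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in find_optional_modules().items():
        print(f"{name:<14}{'found' if available else 'missing'}")
```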

## Development Process

If you plan to modify the code or documentation, please follow the steps below:

1. Fork the repository and create your branch from `main`: `$ git checkout main && git checkout -b my_cool_feature`
2. If you have modified the code (new feature or bug-fix), [please add tests](test/torchaudio_unittest/).
3. If you have changed APIs, [update the documentation](#Documentation).

For more details about pull requests,
please read [GitHub's guides](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request).

If you would like to contribute a new model, please see [here](#New-model).

If you would like to contribute a new dataset, please see [here](#New-dataset).

## Testing

Please refer to our [testing guidelines](test/torchaudio_unittest/) for more
details.

## Documentation

Torchaudio uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Lines inside docstring blocks must be limited to 120 characters.

To build the docs, first install the requirements:

```bash
cd docs
pip install -r requirements.txt
```

Then:

```bash
cd docs
make html
```

The built docs should now be available in `docs/build/html`.

If docstrings are malformed, warnings will be shown.
In the CI doc build job, the `SPHINXOPTS=-W` option is enabled, so warnings are treated as errors.
Please fix all warnings before submitting a PR.
(You can use `SPHINXOPTS=-W` in your local environment, but by default
tutorials are not built, and that will itself be treated as an error.
To use the option, set `BUILD_GALLERY` as well,
e.g. `BUILD_GALLERY=1 make 'SPHINXOPTS=-W' html`.)

By default, the documentation build only generates the API reference.
If you are adding a new example/tutorial with Sphinx-Gallery,
install the additional packages and set the `BUILD_GALLERY` environment variable.

```bash
pip install -r requirements-tutorials.txt
BUILD_GALLERY=1 make html
```

This builds all the tutorials whose file names end with `_tutorial.py`,
which can be time-consuming. You can restrict which tutorials are built with the
`GALLERY_PATTERN` environment variable.

```bash
BUILD_GALLERY=1 GALLERY_PATTERN=forced_alignment_tutorial.py make html
```

Omitting `BUILD_GALLERY` while providing `GALLERY_PATTERN` assumes `BUILD_GALLERY=1`.

```bash
GALLERY_PATTERN=forced_alignment_tutorial.py make html
```
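
The selection described above can be pictured as a two-stage filter. The sketch below is an illustration only: `select_tutorials` is a hypothetical helper, the file names other than `forced_alignment_tutorial.py` are placeholders, and treating the pattern as a regular expression searched within the file name is an assumption rather than the project's actual Sphinx configuration.

```python
import re
from typing import Iterable, List, Optional


def select_tutorials(filenames: Iterable[str], pattern: Optional[str] = None) -> List[str]:
    """Hypothetical helper: pick the files that a gallery build would process."""
    # Stage 1: only files ending in `_tutorial.py` are considered tutorials.
    tutorials = [name for name in filenames if name.endswith("_tutorial.py")]
    if pattern is None:
        return tutorials
    # Stage 2 (assumption): the pattern is a regular expression searched in the name.
    regex = re.compile(pattern)
    return [name for name in tutorials if regex.search(name)]


files = ["example_tutorial.py", "forced_alignment_tutorial.py", "helper.py"]
print(select_tutorials(files))
print(select_tutorials(files, pattern="forced_alignment_tutorial.py"))
```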

## Adding a new tutorial

We use Sphinx-Gallery to generate tutorials. Please refer to the [documentation](https://sphinx-gallery.github.io/stable/syntax.html) for how to format the tutorial.

You can draft a tutorial in Google Colab, export it as an IPython notebook, and use [this script](https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe) to convert it to a Python file. This process is known to cause some rendering issues, so please make sure that the resulting tutorial renders correctly.

Some tips:

- Use the suffix `_tutorial.py` in the file name so that the tutorial is recognized by the doc build process.
- When displaying audio with `IPython.display.Audio`, put one audio object per cell and put it at the end so that the resulting audio is embedded. (https://github.com/pytorch/audio/pull/1985)
- Similarly, when adding plots, add one plot per code cell (use `subplots` to plot multiple), so that the resulting image is properly picked up.
- Avoid using `=` for section header, use `-` or `~`. Otherwise the resulting doc will have an issue like https://github.com/pytorch/audio/pull/1989.
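
The tips above can be combined into a file skeleton like the following, which might be saved under a name such as `sine_wave_tutorial.py` (a hypothetical name). It is a sketch of the layout only and uses the standard library so that it runs anywhere. The `=` underline on the title follows common Sphinx-Gallery usage and is an assumption here; the section headers in the body use `-`.

```python
"""
Generating a Sine Wave
======================

A placeholder tutorial that shows the file layout only.
"""

# %%
# Preparation
# -----------
#
# Section headers inside the tutorial body are underlined with ``-``.

import math

sample_rate = 16000  # samples per second
duration = 2  # seconds
num_samples = sample_rate * duration

# %%
# Synthesis
# ---------
#
# Each code cell does one thing. A cell that displays audio or a plot
# would end with that single object so that it gets embedded.

frequency = 440.0  # Hz
waveform = [math.sin(2 * math.pi * frequency * n / sample_rate) for n in range(num_samples)]
print(f"Generated {len(waveform)} samples")
```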

## Conventions

As a good software development practice, we try to stick to existing variable
names and shapes (for tensors), and maintain consistent docstring standards.
The following are some of the conventions that we follow.

- Tensor
  - We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
    tensor, e.g. optional batching and channel dimensions. If batching, the
    "batch" dimension should be the first dimension.
  - Tensors are assumed to have the "channel" dimension coming before the "time"
    dimension. The bins in the frequency domain (freq and mel) are assumed to come
    before the "time" dimension but after the "channel" dimension. This
    ordering makes the tensors consistent with PyTorch's dimensions.
  - For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
    `n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
    dimension (channel, time)").
- Docstring
  - Tensor dimensions are enclosed with single backticks.
    ``waveform (Tensor): Tensor of audio of dimension `(..., time)` ``
  - Parameter type for variable of type `T` with a default value: `(T, optional)`
  - Parameter type for variable of type `Optional[T]`: `(T or None)`
  - Return type for a tuple or list of known elements:
    `(element1, element2)` or `[element1, element2]`
  - Return type for a tuple or list with an arbitrary number of elements
    of type T: `Tuple[T]` or `List[T]`
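
The docstring conventions above might look as follows when applied together. This is a sketch: `scale_waveform` is a hypothetical function written for this example, not part of the library, and the `Tensor` type is referenced only in annotations so that the snippet runs without PyTorch installed.

```python
from typing import TYPE_CHECKING, Optional, Tuple

if TYPE_CHECKING:
    # Only needed by type checkers; the sketch itself runs without torch.
    from torch import Tensor


def scale_waveform(
    waveform: "Tensor",
    offset: Optional[float],
    gain: float = 1.0,
) -> "Tuple[Tensor, float]":
    """Scale a waveform by a constant gain and optionally shift it.

    Args:
        waveform (Tensor): Tensor of audio of dimension `(..., time)`
        offset (float or None): Constant added to every sample after scaling.
            If ``None``, no offset is added.
        gain (float, optional): Multiplier applied to every sample. (Default: ``1.0``)

    Returns:
        (Tensor, float): The scaled waveform of dimension `(..., time)`
        and the gain that was applied.
    """
    scaled = waveform * gain
    if offset is not None:
        scaled = scaled + offset
    return scaled, gain
```

Here `offset` shows the `(T or None)` form, `gain` shows the `(T, optional)` form, and the return type shows a tuple of known elements.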

Here are some examples of commonly used variables with their names,
meanings, and shapes (or units):

* `waveform`: a tensor of audio samples with dimensions `(..., channel, time)`
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a tensor of spectrogram with dimensions `(..., channel, freq, time)`
* `mel_specgram`: a mel spectrogram with dimensions `(..., channel, mel, time)`
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `f_min`: the lowest frequency of the lowest band in a spectrogram
* `f_max`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates a window, e.g. `torch.hann_window`
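
To make the relationship between these names and the dimension ordering concrete, the sketch below computes the shape a spectrogram would be expected to have for a given waveform shape. The helper `expected_specgram_shape` and its default values are assumptions for illustration. It also assumes a one-sided spectrum, centered frames, and an even `n_fft`, in which case the frame count is `1 + time // hop_length`.

```python
from typing import Tuple


def expected_specgram_shape(
    waveform_shape: Tuple[int, ...],
    n_fft: int = 400,
    hop_length: int = 200,
) -> Tuple[int, ...]:
    """Hypothetical helper: map `(..., channel, time)` to `(..., channel, freq, time)`."""
    *leading, time = waveform_shape
    # Assumption: a one-sided spectrum keeps n_fft // 2 + 1 frequency bins (n_freq).
    n_freq = n_fft // 2 + 1
    # Assumption: frames are centered, so the frame count is 1 + time // hop_length.
    n_frames = 1 + time // hop_length
    return (*leading, n_freq, n_frames)


# One second of stereo audio at 16 kHz, without and with a batch dimension.
print(expected_specgram_shape((2, 16000)))     # (2, 201, 81)
print(expected_specgram_shape((8, 2, 16000)))  # (8, 2, 201, 81)
```

The leading dimensions pass through unchanged, which is what the ellipsis in `(..., channel, freq, time)` expresses.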

## License

By contributing to Torchaudio, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.