Commit 108a32d9 authored by yangarbiter, committed by GitHub

Update and move convention section to CONTRIBUTING.md (#1635)

parent e14a2e0c
@@ -131,6 +131,39 @@ make html
The built docs should now be available in `docs/build/html`
## Conventions
As a good software development practice, we try to stick to existing variable
names and shapes (for tensors).
The following are some of the conventions that we follow.
- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
  tensor, e.g. optional batching and channel dimensions. If batching, the
  "batch" dimension should come first.
- Tensors are assumed to have the "channel" dimension coming before the "time"
  dimension. The bins in the frequency domain (freq and mel) are assumed to come
  before the "time" dimension but after the "channel" dimension. This
  ordering makes the tensors consistent with PyTorch's dimensions.
- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
  `n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
  dimension (channel, time)").
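As a rough illustration of the ellipsis convention, the sketch below splits a shape into its leading "..." dimensions and its named trailing dimensions. The helper `split_shape` is hypothetical and not part of torchaudio; it works on plain tuples rather than tensors so that it runs without any dependencies.

```python
from typing import Dict, Tuple


def split_shape(
    shape: Tuple[int, ...], names: Tuple[str, ...]
) -> Tuple[Tuple[int, ...], Dict[str, int]]:
    """Split `shape` into leading "..." dims and named trailing dims.

    Hypothetical helper for illustration only (not a torchaudio API).
    `names` lists the trailing dimension names, e.g. ("channel", "time").
    """
    if len(shape) < len(names):
        raise ValueError(
            f"shape {shape} has fewer dimensions than the names {names}"
        )
    n_leading = len(shape) - len(names)
    leading = tuple(shape[:n_leading])             # the "..." part (e.g. batch)
    named = dict(zip(names, shape[n_leading:]))    # e.g. channel, time
    return leading, named


# A batch of 8 stereo waveforms with 16000 samples each: (..., channel, time)
leading, named = split_shape((8, 2, 16000), ("channel", "time"))
print(leading, named)
```

With an unbatched input such as `(2, 16000)`, the leading part is simply the empty tuple, which is why functions written against `(..., channel, time)` can accept both cases.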
Here are some examples of commonly used variables with their names,
meanings, and shapes (or units):
* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a spectrogram tensor with dimensions (..., channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `f_min`: the lowest frequency of the lowest band in a spectrogram
* `f_max`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates a window, e.g. `torch.hann_window`
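Several of the size names above are related by simple arithmetic. The following sketch computes the expected spectrogram shape from a waveform shape, assuming a one-sided STFT (`n_freq = n_fft // 2 + 1`) and centered framing (`1 + time // hop_length` frames). Both are assumptions about how the transform is configured; the helper name `spectrogram_shape` and its default values are illustrative only.

```python
from typing import Tuple


def spectrogram_shape(
    waveform_shape: Tuple[int, ...], n_fft: int = 400, hop_length: int = 200
) -> Tuple[int, ...]:
    """Map (..., channel, time) to (..., channel, freq, time).

    Hypothetical helper. Assumes a one-sided STFT and centered frames;
    other settings would give different sizes.
    """
    if len(waveform_shape) < 1:
        raise ValueError("waveform_shape needs at least a time dimension")
    *leading, time = waveform_shape
    n_freq = n_fft // 2 + 1            # number of bins in a linear spectrogram
    n_frames = 1 + time // hop_length  # frames along the new "time" dimension
    return (*leading, n_freq, n_frames)


# Stereo waveform, 16000 samples: (channel, time) -> (channel, freq, time)
print(spectrogram_shape((2, 16000)))
```

Note that the leading dimensions (batch, channel) pass through unchanged, which matches the ordering convention: freq is inserted after "channel" and before "time".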
## License
By contributing to Torchaudio, you agree that your contributions will be licensed
......
@@ -138,45 +138,6 @@ API Reference
API Reference is located here: http://pytorch.org/audio/
Conventions
-----------
With torchaudio being a machine learning library and built on top of PyTorch,
torchaudio is standardized around the following naming conventions. Tensors are
assumed to have "channel" as the first dimension and time as the last
dimension (when applicable). Both of these dimensions make the tensors consistent with PyTorch's dimensions.
For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
whereas dimension names do not have this prefix (e.g. "a tensor of
dimension (channel, time)")
* `waveform`: a tensor of audio samples with dimensions (channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a spectrogram tensor with dimensions (channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `min_freq`: the lowest frequency of the lowest band in a spectrogram
* `max_freq`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates a window, e.g. `torch.hann_window`
Transforms expect and return the following dimensions.
* `Spectrogram`: (channel, time) -> (channel, freq, time)
* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
* `MelScale`: (channel, freq, time) -> (channel, mel, time)
* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
* `MFCC`: (channel, time) -> (channel, mfcc, time)
* `MuLawEncode`: (channel, time) -> (channel, time)
* `MuLawDecode`: (channel, time) -> (channel, time)
* `Resample`: (channel, time) -> (channel, time)
* `Fade`: (channel, time) -> (channel, time)
* `Vol`: (channel, time) -> (channel, time)
Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
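The transform signatures listed above can be written down as data and checked for consistency. The sketch below is a minimal illustration, not torchaudio code: the `SIGNATURES` table simply transcribes part of the list, and the hypothetical `chain` helper verifies that the output dimensions of one transform match the input dimensions of the next.

```python
from typing import Dict, Tuple

Dims = Tuple[str, ...]

# Transcribed from the list above: name -> (input dims, output dims).
SIGNATURES: Dict[str, Tuple[Dims, Dims]] = {
    "Spectrogram": (("channel", "time"), ("channel", "freq", "time")),
    "AmplitudeToDB": (("channel", "freq", "time"), ("channel", "freq", "time")),
    "MelScale": (("channel", "freq", "time"), ("channel", "mel", "time")),
    "MelSpectrogram": (("channel", "time"), ("channel", "mel", "time")),
    "MFCC": (("channel", "time"), ("channel", "mfcc", "time")),
    "Resample": (("channel", "time"), ("channel", "time")),
}


def chain(*names: str) -> Tuple[Dims, Dims]:
    """Return the (input, output) dims of applying the transforms in order.

    Hypothetical helper; raises ValueError when two neighbours do not fit.
    """
    if not names:
        raise ValueError("at least one transform name is required")
    first_in, current = SIGNATURES[names[0]]
    for name in names[1:]:
        expected_in, out = SIGNATURES[name]
        if expected_in != current:
            raise ValueError(f"{name} expects {expected_in}, got {current}")
        current = out
    return first_in, current


# Spectrogram followed by MelScale has the same signature as MelSpectrogram.
print(chain("Spectrogram", "MelScale") == SIGNATURES["MelSpectrogram"])
```

Only dimension names are compared here; actual sizes such as `n_freq` or `n_mel` depend on the parameters passed to each transform.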
Contributing Guidelines
-----------------------
......