Update and move convention section to CONTRIBUTING.md (#1635)

108a32d9 · yangarbiter · GitHub · e14a2e0c · 108a32d9 · 108a32d9
Unverified Commit 108a32d9 authored Jul 28, 2021 by yangarbiter Committed by GitHub Jul 28, 2021

Showing with 33 additions and 39 deletions

CONTRIBUTING.md CONTRIBUTING.md +33 -0

README.md README.md +0 -39

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -131,6 +131,39 @@ make html
 The built docs should now be available in `docs/build/html`
+## Conventions
+As a good software development practice, we try to stick to existing variable
+names and shapes (for tensors).
+The following are some of the conventions that we follow.
+- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
+  tensor, e.g. optional batching and channel dimensions. If batching, the
+  "batch" dimension should come first.
+- Tensors are assumed to have "channel" dimension coming before the "time"
+  dimension. The bins in frequency domain (freq and mel) are assumed to come
+  before the "time" dimension but after the "channel" dimension. This
+  ordering makes the tensors consistent with PyTorch's dimensions.
+- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
+  `n_mels`)") whereas dimension names do not have this prefix (e.g. "a tensor of
+  dimension (channel, time)")
+Here are some examples of commonly used variables with their names,
+meanings, and shapes (or units):
+* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
+* `sample_rate`: the sampling rate of the audio (samples per second)
+* `specgram`: a spectrogram tensor with dimensions (..., channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
+* `hop_length`: the number of samples between the starts of consecutive frames
+* `n_fft`: the size of the FFT (the number of Fourier bins)
+* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
+* `n_freq`: the number of bins in a linear spectrogram
+* `f_min`: the lowest frequency of the lowest band in a spectrogram
+* `f_max`: the highest frequency of the highest band in a spectrogram
+* `win_length`: the length of the STFT window
+* `window_fn`: a function that creates windows, e.g. `torch.hann_window`
 ## License
 By contributing to Torchaudio, you agree that your contributions will be licensed
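
As an illustration of the conventions added in this hunk, the following is a minimal sketch of how a `waveform` shape of (..., channel, time) maps to a `specgram` shape of (..., channel, freq, time). It is not part of the commit. The helper `spectrogram_shape` is hypothetical, and it assumes a one-sided spectrum (`n_freq = n_fft // 2 + 1`) and centered framing (`time // hop_length + 1` frames); other settings would give different sizes.

```python
from typing import Tuple


def spectrogram_shape(
    waveform_shape: Tuple[int, ...], n_fft: int, hop_length: int
) -> Tuple[int, ...]:
    """Map a (..., channel, time) shape to (..., channel, freq, time).

    Hypothetical helper for illustration only; it does not call torchaudio.
    """
    if len(waveform_shape) < 2:
        raise ValueError("expected at least (channel, time) dimensions")
    # "..." stands for any leading dimensions, e.g. an optional batch.
    *leading, channel, time = waveform_shape
    n_freq = n_fft // 2 + 1            # assumption: one-sided spectrum
    n_frames = time // hop_length + 1  # assumption: centered (padded) framing
    return (*leading, channel, n_freq, n_frames)


# Unbatched: (channel, time) -> (channel, freq, time)
print(spectrogram_shape((2, 16000), n_fft=400, hop_length=200))
# Batched: the batch dimension comes first and is carried through unchanged
print(spectrogram_shape((8, 2, 16000), n_fft=400, hop_length=200))
```

Note how the size names (`n_freq`, `n_frames`) carry the `n_` prefix while the dimension names in the docstring (channel, freq, time) do not, matching the convention described above.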

--- a/README.md
+++ b/README.md
@@ -138,45 +138,6 @@ API Reference
 API Reference is located here: http://pytorch.org/audio/
-Conventions
-----------
-With torchaudio being a machine learning library and built on top of PyTorch,
-torchaudio is standardized around the following naming conventions. Tensors are
-assumed to have "channel" as the first dimension and time as the last
-dimension (when applicable). Both of these dimensions make the tensors consistent with PyTorch's dimensions.
-For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
-whereas dimension names do not have this prefix (e.g. "a tensor of
-dimension (channel, time)")
-* `waveform`: a tensor of audio samples with dimensions (channel, time)
-* `sample_rate`: the rate of audio dimensions (samples per second)
-* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
-* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
-* `hop_length`: the number of samples between the starts of consecutive frames
-* `n_fft`: the number of Fourier bins
-* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
-* `n_freq`: the number of bins in a linear spectrogram
-* `min_freq`: the lowest frequency of the lowest band in a spectrogram
-* `max_freq`: the highest frequency of the highest band in a spectrogram
-* `win_length`: the length of the STFT window
-* `window_fn`: for functions that creates windows e.g. `torch.hann_window`
-Transforms expect and return the following dimensions.
-* `Spectrogram`: (channel, time) -> (channel, freq, time)
-* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
-* `MelScale`: (channel, freq, time) -> (channel, mel, time)
-* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
-* `MFCC`: (channel, time) -> (channel, mfcc, time)
-* `MuLawEncode`: (channel, time) -> (channel, time)
-* `MuLawDecode`: (channel, time) -> (channel, time)
-* `Resample`: (channel, time) -> (channel, time)
-* `Fade`: (channel, time) -> (channel, time)
-* `Vol`: (channel, time) -> (channel, time)
-Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
 Contributing Guidelines
 -----------------------