OpenDAS / Torchaudio: commit 108a32d9 (unverified)

Update and move convention section to CONTRIBUTING.md (#1635)

Authored Jul 28, 2021 by yangarbiter; committed by GitHub on Jul 28, 2021.
Parent: e14a2e0c

2 changed files with 33 additions and 39 deletions:

- CONTRIBUTING.md: +33 -0
- README.md: +0 -39

CONTRIBUTING.md

@@ -131,6 +131,39 @@ make html
The built docs should now be available in `docs/build/html`

## Conventions

As a good software development practice, we try to stick to existing variable
names and tensor shapes. The following are some of the conventions that we
follow.

- We use an ellipsis "..." as a placeholder for the rest of the dimensions of a
  tensor, e.g. optional batching and channel dimensions. If batching is used,
  the "batch" dimension should come first.
- Tensors are assumed to have the "channel" dimension coming before the "time"
  dimension. The bins in the frequency domain (freq and mel) are assumed to come
  before the "time" dimension but after the "channel" dimension. This ordering
  makes the tensors consistent with PyTorch's dimensions.
- For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
  `n_mels`)"), whereas dimension names do not have this prefix (e.g. "a tensor
  of dimension (channel, time)").
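
To make the `(..., channel, time)` layout concrete, here is a minimal sketch that splits a shape into its leading "..." part and its named trailing dimensions. It works on plain tuples so it runs without PyTorch; `name_trailing_dims` is a hypothetical helper for illustration, not part of torchaudio. Passing `tensor.shape` should also work, since `torch.Size` is a subclass of `tuple`.

```python
from typing import Dict, Sequence, Tuple


def name_trailing_dims(
    shape: Sequence[int], names: Sequence[str]
) -> Tuple[Tuple[int, ...], Dict[str, int]]:
    """Split ``shape`` into leading ("...") sizes and named trailing sizes.

    ``names`` lists the trailing dimensions in order, e.g. ("channel", "time")
    for a waveform or ("channel", "freq", "time") for a spectrogram.
    Hypothetical helper, not a torchaudio API.
    """
    split = len(shape) - len(names)
    if split < 0:
        raise ValueError(f"shape {tuple(shape)} has fewer than {len(names)} dimensions")
    leading = tuple(shape[:split])           # the "..." part, e.g. a batch dimension
    named = dict(zip(names, shape[split:]))  # channel comes before time
    return leading, named


# A batch of 8 stereo clips, 16000 samples each: (batch, channel, time)
print(name_trailing_dims((8, 2, 16000), ("channel", "time")))
# ((8,), {'channel': 2, 'time': 16000})

# An unbatched stereo spectrogram: (channel, freq, time)
print(name_trailing_dims((2, 201, 81), ("channel", "freq", "time")))
# ((), {'channel': 2, 'freq': 201, 'time': 81})
```

Because only the trailing dimensions are named, the same helper covers both the batched and unbatched cases that the ellipsis convention allows.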

Here are some examples of commonly used variables with their names, meanings,
and shapes (or units):

* `waveform`: a tensor of audio samples with dimensions (..., channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a spectrogram tensor with dimensions (..., channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (..., channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mels`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `f_min`: the lowest frequency of the lowest band in a spectrogram
* `f_max`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates windows, e.g. `torch.hann_window`
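
As a rough illustration of how `n_fft`, `hop_length`, and `n_mels` determine the `specgram` and `mel_specgram` shapes, the sketch below computes output shapes from a waveform shape. It assumes `torch.stft`-style framing: a one-sided output with `n_fft // 2 + 1` frequency bins and, when `center=True`, `n_fft // 2` samples of padding on each side of the signal; it also assumes `win_length <= n_fft`, so the window does not change the frame count. `spectrogram_shape` and `mel_spectrogram_shape` are hypothetical helpers, not torchaudio functions.

```python
from typing import Tuple


def spectrogram_shape(
    waveform_shape: Tuple[int, ...],
    n_fft: int,
    hop_length: int,
    center: bool = True,
) -> Tuple[int, ...]:
    """Spectrogram shape for a (..., channel, time) waveform (hypothetical helper).

    Assumes torch.stft-style framing: with ``center=True`` the signal is padded
    by ``n_fft // 2`` samples on each side before framing.
    """
    *leading, time = waveform_shape          # leading = (..., channel)
    n_freq = n_fft // 2 + 1                  # bins of a one-sided (real-input) FFT
    padded = time + 2 * (n_fft // 2) if center else time
    if padded < n_fft:
        raise ValueError("signal is shorter than n_fft")
    n_frames = 1 + (padded - n_fft) // hop_length
    return (*leading, n_freq, n_frames)


def mel_spectrogram_shape(
    waveform_shape: Tuple[int, ...],
    n_fft: int,
    hop_length: int,
    n_mels: int,
    center: bool = True,
) -> Tuple[int, ...]:
    """Same framing, but the ``n_freq`` axis is replaced by ``n_mels`` bands."""
    *leading, _n_freq, n_frames = spectrogram_shape(
        waveform_shape, n_fft, hop_length, center
    )
    return (*leading, n_mels, n_frames)


# One second of 16 kHz mono audio
print(spectrogram_shape((1, 16000), n_fft=400, hop_length=200))
# (1, 201, 81)

# A batch of 4 stereo clips
print(mel_spectrogram_shape((4, 2, 16000), n_fft=400, hop_length=200, n_mels=128))
# (4, 2, 128, 81)
```

Under these assumptions `hop_length` only affects the "time" axis, `n_fft` fixes `n_freq`, and `n_mels` replaces the frequency axis in the mel case, while the leading "..." and "channel" dimensions are carried through unchanged.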

## License

By contributing to Torchaudio, you agree that your contributions will be licensed
...

README.md

@@ -138,45 +138,6 @@ API Reference
API Reference is located here: http://pytorch.org/audio/

Conventions
-----------
Because torchaudio is a machine learning library built on top of PyTorch, it is
standardized around the following naming conventions. Tensors are assumed to
have "channel" as the first dimension and "time" as the last dimension (when
applicable). This ordering makes the tensors consistent with PyTorch's
dimensions.

For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`,
`n_mel`)"), whereas dimension names do not have this prefix (e.g. "a tensor of
dimension (channel, time)").

* `waveform`: a tensor of audio samples with dimensions (channel, time)
* `sample_rate`: the sampling rate of the audio (samples per second)
* `specgram`: a spectrogram tensor with dimensions (channel, freq, time)
* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
* `hop_length`: the number of samples between the starts of consecutive frames
* `n_fft`: the number of Fourier bins
* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
* `n_freq`: the number of bins in a linear spectrogram
* `min_freq`: the lowest frequency of the lowest band in a spectrogram
* `max_freq`: the highest frequency of the highest band in a spectrogram
* `win_length`: the length of the STFT window
* `window_fn`: a function that creates windows, e.g. `torch.hann_window`

Transforms expect and return the following dimensions:

* `Spectrogram`: (channel, time) -> (channel, freq, time)
* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
* `MelScale`: (channel, freq, time) -> (channel, mel, time)
* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
* `MFCC`: (channel, time) -> (channel, mfcc, time)
* `MuLawEncode`: (channel, time) -> (channel, time)
* `MuLawDecode`: (channel, time) -> (channel, time)
* `Resample`: (channel, time) -> (channel, time)
* `Fade`: (channel, time) -> (channel, time)
* `Vol`: (channel, time) -> (channel, time)

Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
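
Read together with the ellipsis note, the transform table composes: assuming each transform only inspects its trailing dimensions, any leading "..." dimensions (such as a batch dimension) pass through unchanged. The sketch below checks this symbolically with dimension names only; `TRANSFORM_DIMS` and `output_dims` are hypothetical, mirror just a subset of the list above, and do not call torchaudio.

```python
from typing import Dict, Sequence, Tuple

Dims = Tuple[str, ...]

# Hypothetical table mirroring part of the list above:
# transform name -> (trailing dims it expects, trailing dims it returns).
TRANSFORM_DIMS: Dict[str, Tuple[Dims, Dims]] = {
    "Spectrogram": (("channel", "time"), ("channel", "freq", "time")),
    "AmplitudeToDB": (("channel", "freq", "time"), ("channel", "freq", "time")),
    "MelScale": (("channel", "freq", "time"), ("channel", "mel", "time")),
    "MelSpectrogram": (("channel", "time"), ("channel", "mel", "time")),
    "MFCC": (("channel", "time"), ("channel", "mfcc", "time")),
    "Resample": (("channel", "time"), ("channel", "time")),
    "Fade": (("channel", "time"), ("channel", "time")),
    "Vol": (("channel", "time"), ("channel", "time")),
}


def output_dims(transform: str, input_dims: Sequence[str]) -> Dims:
    """Return the dimension names produced by ``transform``.

    Only the trailing dimensions are checked; leading ("...") dimensions
    are assumed to pass through unchanged.
    """
    expected, produced = TRANSFORM_DIMS[transform]
    split = len(input_dims) - len(expected)
    if split < 0 or tuple(input_dims[split:]) != expected:
        raise ValueError(
            f"{transform} expects (..., {', '.join(expected)}), got {tuple(input_dims)}"
        )
    return tuple(input_dims[:split]) + produced


# Chain three transforms on a batched waveform
dims: Dims = ("batch", "channel", "time")
for name in ("Spectrogram", "AmplitudeToDB", "MelScale"):
    dims = output_dims(name, dims)
print(dims)  # ('batch', 'channel', 'mel', 'time')
```

Checking names rather than sizes keeps the sketch independent of parameters such as `n_fft`; a mismatched chain (for example `MelScale` applied directly to a waveform) raises `ValueError` instead of silently producing the wrong layout.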

Contributing Guidelines
-----------------------
...