README updates (#180)

Commit ae3070cc authored Jul 30, 2019 by jamarshon Committed by cpuhrsch Jul 30, 2019

Showing with 53 additions and 25 deletions

README.md +53 -25

--- a/README.md
+++ b/README.md
@@ -3,6 +3,15 @@ torchaudio: an audio library for PyTorch

 [![Build Status](https://travis-ci.org/pytorch/audio.svg?branch=master)](https://travis-ci.org/pytorch/audio)

+The aim of torchaudio is to apply [PyTorch](https://github.com/pytorch/pytorch) to
+the audio domain. By building on PyTorch, torchaudio follows the same philosophy
+of providing strong GPU acceleration, focusing on trainable features through
+the autograd system, and keeping a consistent style (tensor names and dimension names).
+Therefore, it is primarily a machine learning library and not a general signal
+processing library. The benefits of PyTorch carry over to torchaudio because
+all computations are performed with PyTorch operations, which makes torchaudio
+easy to use and lets it feel like a natural extension of PyTorch.
+
 - [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/)
  - Load the following formats into a torch Tensor
    - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms,
@@ -63,28 +72,47 @@ API Reference is located here: http://pytorch.org/audio/
 Conventions
 -----------

-Torchaudio is standardized around the following naming conventions.
-
-* waveform: a tensor of audio samples with dimensions (channel, time)
-* sample_rate: the rate of audio dimensions (samples per second)
-* specgram: a tensor of spectrogram with dimensions (channel, freq, time)
-* mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
-* hop_length: the number of samples between the starts of consecutive frames
-* n_fft: the number of Fourier bins
-* n_mel, n_mfcc: the number of mel and MFCC bins
-* n_freq: the number of bins in a linear spectrogram
-* min_freq: the lowest frequency of the lowest band in a spectrogram
-* max_freq: the highest frequency of the highest band in a spectrogram
-* win_length: the length of the STFT window
-* window_fn: for functions that creates windows e.g. torch.hann_window
-
-Transforms expect the following dimensions. In particular, the input of all transforms and functions assumes channel first.
-
-* Spectrogram: (channel, time) -> (channel, freq, time)
-* AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
-* MelScale: (channel, time) -> (channel, mel, time)
-* MelSpectrogram: (channel, time) -> (channel, mel, time)
-* MFCC: (channel, time) -> (channel, mfcc, time)
-* MuLawEncode: (channel, time) -> (channel, time)
-* MuLawDecode: (channel, time) -> (channel, time)
-* Resample: (channel, time) -> (channel, time)
+Because torchaudio is a machine learning library built on top of PyTorch,
+it is standardized around the following naming conventions. In particular,
+tensors are assumed to have channel as the first dimension and time as the last
+dimension (when applicable). This keeps the layout consistent with PyTorch.
+For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)"),
+whereas dimension names do not have this prefix (e.g. "a tensor of
+dimension (channel, time)").
+
+* `waveform`: a tensor of audio samples with dimensions (channel, time)
+* `sample_rate`: the sampling rate of the audio (samples per second)
+* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
+* `hop_length`: the number of samples between the starts of consecutive frames
+* `n_fft`: the size of the FFT, which determines the number of Fourier bins (`n_fft // 2 + 1` for a one-sided spectrogram)
+* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
+* `n_freq`: the number of bins in a linear spectrogram
+* `min_freq`: the lowest frequency of the lowest band in a spectrogram
+* `max_freq`: the highest frequency of the highest band in a spectrogram
+* `win_length`: the length of the STFT window
+* `window_fn`: a function that creates windows, e.g. `torch.hann_window`
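The size parameters above are related to one another. As a rough illustration, the shape of a spectrogram can be predicted from a `(channel, time)` waveform. This is a minimal sketch that assumes centered framing (as `torch.stft` does with `center=True`) and an even `n_fft`; `spectrogram_shape` is a hypothetical helper written for illustration, not part of torchaudio.

```python
def spectrogram_shape(waveform_shape, n_fft, hop_length):
    """Predict the (channel, freq, time) shape of a one-sided spectrogram.

    Sketch only: assumes centered framing (n_fft // 2 samples of padding on
    each side, as torch.stft does with center=True) and an even n_fft.
    """
    if n_fft % 2 != 0:
        raise ValueError("this sketch assumes an even n_fft")
    channel, time = waveform_shape            # waveform is (channel, time)
    n_freq = n_fft // 2 + 1                   # bins from DC up to Nyquist
    n_frames = time // hop_length + 1         # a new frame starts every hop_length samples
    return (channel, n_freq, n_frames)


sample_rate = 16000                           # samples per second
waveform_shape = (2, sample_rate)             # one second of stereo audio
print(spectrogram_shape(waveform_shape, n_fft=400, hop_length=200))  # (2, 201, 81)
```

Note how channel stays first and time stays last: only the middle (frequency) dimension is introduced by the transform.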
+
+Transforms expect the following dimensions.
+
+* `Spectrogram`: (channel, time) -> (channel, freq, time)
+* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
+* `MelScale`: (channel, freq, time) -> (channel, mel, time)
+* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
+* `MFCC`: (channel, time) -> (channel, mfcc, time)
+* `MuLawEncode`: (channel, time) -> (channel, time)
+* `MuLawDecode`: (channel, time) -> (channel, time)
+* `Resample`: (channel, time) -> (channel, time)
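Because every transform keeps channel first and time last, these signatures can be checked and chained mechanically. A minimal sketch in plain Python; `SIGNATURES`, `follows_convention`, `check_all`, and `can_chain` are hypothetical names for illustration, and the table only restates a subset of the dimension names listed above.

```python
# Hypothetical table restating some of the transform signatures listed above:
# name -> (input dimension names, output dimension names).
SIGNATURES = {
    "Spectrogram": (("channel", "time"), ("channel", "freq", "time")),
    "AmplitudeToDB": (("channel", "freq", "time"), ("channel", "freq", "time")),
    "MelSpectrogram": (("channel", "time"), ("channel", "mel", "time")),
    "MFCC": (("channel", "time"), ("channel", "mfcc", "time")),
    "MuLawEncode": (("channel", "time"), ("channel", "time")),
    "MuLawDecode": (("channel", "time"), ("channel", "time")),
    "Resample": (("channel", "time"), ("channel", "time")),
}


def follows_convention(dims):
    """True if channel is the first dimension and time is the last."""
    return len(dims) >= 2 and dims[0] == "channel" and dims[-1] == "time"


def check_all(signatures):
    """Return the names of transforms whose input or output breaks the convention."""
    return [
        name
        for name, (inp, out) in signatures.items()
        if not (follows_convention(inp) and follows_convention(out))
    ]


def can_chain(first, second, signatures=SIGNATURES):
    """True if the output dimensions of `first` match the input of `second`."""
    return signatures[first][1] == signatures[second][0]


print(check_all(SIGNATURES))                       # []
print(can_chain("Spectrogram", "AmplitudeToDB"))   # True
print(can_chain("MFCC", "AmplitudeToDB"))          # False
```

Consistent dimension names are what make a check like `can_chain` possible: a spectrogram can be fed to `AmplitudeToDB`, while an MFCC tensor has an `mfcc` dimension where `freq` is expected.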
+
+Contributing Guidelines
+-----------------------
+
+Please let us know if you encounter a bug by filing an [issue](https://github.com/pytorch/audio/issues).
+
+We appreciate all contributions. If you are planning to contribute back
+bug fixes, please do so without any further discussion.
+
+If you plan to contribute new features, utility functions or extensions to the
+core, please first open an issue and discuss the feature with us. Sending a PR
+without discussion may result in it being rejected, because we might be
+taking the core in a different direction than you are aware of.