torchaudio: an audio library for PyTorch
========================================

[![Build Status](https://travis-ci.org/pytorch/audio.svg?branch=master)](https://travis-ci.org/pytorch/audio)

- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/)
  - Load the following formats into a torch Tensor
    - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms, aiff, au, amr, mp2, mp4, ac3, avi, wmv, mpeg, ircam and any other format supported by libsox.
    - [Kaldi (ark/scp)](http://pytorch.org/audio/kaldi_io.html)
- [Dataloaders for common audio datasets (VCTK, YesNo)](http://pytorch.org/audio/datasets.html)
- Common audio transforms
  - [Scale, PadTrim, DownmixMono, LC2CL, BLC2CBL, MuLawEncoding, MuLawExpanding](http://pytorch.org/audio/transforms.html)
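
The mu-law transforms implement mu-law companding, which maps amplitudes in [-1, 1] onto a small set of integer levels while keeping more resolution for quiet samples. The sketch below is a dependency-free illustration of the standard formula, assuming 256 quantization channels; `mu_law_encode` and `mu_law_decode` are hypothetical helpers, not torchaudio's tensor implementation.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Compand a sample in [-1, 1] and quantize it to an integer in [0, mu]."""
    mu = quantization_channels - 1
    # Logarithmic companding: small amplitudes are spread over more levels.
    companded = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift [-1, 1] onto [0, mu] and round to the nearest integer level.
    return int((companded + 1) / 2 * mu + 0.5)


def mu_law_decode(q: int, quantization_channels: int = 256) -> float:
    """Invert mu_law_encode (up to quantization error)."""
    mu = quantization_channels - 1
    # Map the integer level back onto [-1, 1] in the companded domain.
    companded = q / mu * 2 - 1
    # Undo the logarithmic companding.
    return math.copysign((math.exp(abs(companded) * math.log1p(mu)) - 1) / mu, companded)


print(mu_law_encode(0.0))  # 128
print(mu_law_decode(mu_law_encode(0.5)))
```

Decoding does not recover the input exactly: the round trip is only as accurate as the 256-level quantization allows.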

Dependencies
------------
* pytorch (nightly version needed for development)
* libsox v14.3.2 or above
* [optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above

Quick install on OSX (Homebrew):
```bash
brew install sox
```
Linux (Ubuntu):
```bash
sudo apt-get install sox libsox-dev libsox-fmt-all
```
Anaconda:
```bash
conda install -c conda-forge sox
```

Installation
------------

```bash
# Linux
python setup.py install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```

Quick Usage
-----------

```python
import torchaudio
sound, sample_rate = torchaudio.load('foo.mp3')
torchaudio.save('foo_save.mp3', sound, sample_rate) # saves tensor to file
```

API Reference
-------------

The API reference is located at http://pytorch.org/audio/

Conventions
-----------

Torchaudio is standardized around the following naming conventions.

* waveform: a tensor of audio samples with dimensions (channel, time)
* sample_rate: the sampling rate of the audio (samples per second)
* specgram: a spectrogram tensor with dimensions (channel, freq, time)
* mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
* hop_length: the number of samples between the starts of consecutive frames
* n_fft: the size of the FFT; a onesided spectrogram has n_fft // 2 + 1 frequency bins
* n_mel, n_mfcc: the number of mel and MFCC bins
* n_freq: the number of bins in a linear spectrogram
* min_freq: the lowest frequency of the lowest band in a spectrogram
* max_freq: the highest frequency of the highest band in a spectrogram
* win_length: the length of the STFT window
* window_fn: a function that creates windows, e.g. torch.hann_window
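
As a sketch of how n_fft and hop_length interact, the helper below predicts the (channel, freq, time) shape of a spectrogram computed from a (channel, time) waveform. It assumes a centered, onesided STFT (as with `torch.stft(..., center=True, onesided=True)`) and, when hop_length is omitted, a hop of n_fft // 2 with win_length equal to n_fft; check the transforms documentation for the actual defaults. `spectrogram_shape` is a hypothetical helper, not part of torchaudio.

```python
from typing import Optional, Tuple


def spectrogram_shape(num_channels: int, num_samples: int, n_fft: int = 400,
                      hop_length: Optional[int] = None) -> Tuple[int, int, int]:
    """Predict the (channel, freq, time) shape of a centered, onesided STFT."""
    if hop_length is None:
        # Assumed default: half of the window, with win_length == n_fft.
        hop_length = n_fft // 2
    # A onesided FFT of size n_fft keeps the bins from DC up to Nyquist.
    n_freq = n_fft // 2 + 1
    # Centering pads n_fft // 2 samples on each side of the signal,
    # then a frame starts every hop_length samples.
    padded = num_samples + 2 * (n_fft // 2)
    n_frames = 1 + (padded - n_fft) // hop_length
    return num_channels, n_freq, n_frames


# One second of stereo audio at 16 kHz.
print(spectrogram_shape(2, 16000))  # (2, 201, 81)
```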

Transforms expect and return the following dimensions. In particular, all transforms and functions assume channel-first input.

* Spectrogram: (channel, time) -> (channel, freq, time)
* AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
* MelScale: (channel, freq, time) -> (channel, mel, time)
* MelSpectrogram: (channel, time) -> (channel, mel, time)
* MFCC: (channel, time) -> (channel, mfcc, time)
* MuLawEncode: (channel, time) -> (channel, time)
* MuLawDecode: (channel, time) -> (channel, time)
* Resample: (channel, time) -> (channel, time)
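
MelScale and MelSpectrogram map the freq axis onto n_mel bands between min_freq and max_freq. The sketch below computes the band edges of a triangular mel filterbank, assuming the HTK mel formula `mel = 2595 * log10(1 + f / 700)`; torchaudio's own filterbank may differ in details such as the formula variant or normalization, and the helper names are hypothetical.

```python
import math
from typing import List


def hz_to_mel(freq: float) -> float:
    # HTK-style mel scale: roughly linear below ~1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz(mel: float) -> float:
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mel: int, min_freq: float, max_freq: float) -> List[float]:
    """Return n_mel + 2 frequencies (Hz) spaced evenly on the mel scale.

    Triangular filter k spans edges[k]..edges[k + 2] and peaks at edges[k + 1].
    """
    lo, hi = hz_to_mel(min_freq), hz_to_mel(max_freq)
    step = (hi - lo) / (n_mel + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mel + 2)]


edges = mel_band_edges(n_mel=4, min_freq=0.0, max_freq=8000.0)
print([round(f) for f in edges])
```

Because the edges are evenly spaced in mel rather than in Hz, the low-frequency bands are narrow and the high-frequency bands are wide.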