torchaudio: an audio library for PyTorch
========================================

[![Build Status](https://travis-ci.org/pytorch/audio.svg?branch=master)](https://travis-ci.org/pytorch/audio)

- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/)
  - Load the following formats into a torch Tensor
    - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms, aiff, au, amr, mp2, mp4, ac3, avi, wmv, mpeg, ircam and any other format supported by libsox.
    - [Kaldi (ark/scp)](http://pytorch.org/audio/kaldi_io.html)
- [Dataloaders for common audio datasets (VCTK, YesNo)](http://pytorch.org/audio/datasets.html)
- Common audio transforms
    - [Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample](http://pytorch.org/audio/transforms.html)
- Compliance interfaces: run code using PyTorch that aligns with other libraries
    - [Kaldi: fbank, spectrogram, resample_waveform](https://pytorch.org/audio/compliance.kaldi.html)
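
The MuLawEncoding and MuLawDecoding transforms listed above perform mu-law companding. As a rough illustration only, here is a pure-Python sketch of the standard mu-law formula applied to nested lists shaped (channel, time). It assumes input samples in [-1, 1] and 256 quantization channels; torchaudio's transforms operate on tensors, and the function names `mu_law_encode` and `mu_law_decode` are hypothetical.

```python
import math


def mu_law_encode(waveform, quantization_channels=256):
    """Map samples in [-1, 1] to integers in [0, quantization_channels - 1].

    waveform is a nested list with dimensions (channel, time).
    """
    mu = quantization_channels - 1
    encoded = []
    for channel in waveform:
        row = []
        for x in channel:
            # Compress the amplitude logarithmically while keeping the sign.
            y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
            # Shift [-1, 1] to [0, mu] and round to the nearest integer.
            row.append(int((y + 1) / 2 * mu + 0.5))
        encoded.append(row)
    return encoded


def mu_law_decode(encoded, quantization_channels=256):
    """Invert mu_law_encode, returning floats in [-1, 1] shaped (channel, time)."""
    mu = quantization_channels - 1
    decoded = []
    for channel in encoded:
        row = []
        for q in channel:
            # Map [0, mu] back to [-1, 1], then expand the amplitude.
            y = q / mu * 2 - 1
            row.append(math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1) / mu, y))
        decoded.append(row)
    return decoded
```

Both directions preserve the (channel, time) shape; the round trip is lossy because of the 8-bit quantization, with the largest errors near full scale.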

Dependencies
------------
* pytorch (nightly version needed for development)
* libsox v14.3.2 or above
* [optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above
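
To check the libsox version requirement, here is a minimal sketch that reads the version of the `sox` command-line tool. It assumes that the tool was installed alongside libsox, as the packages in the quick-install commands below typically provide both, and that `sox --version` prints a token such as `v14.4.2`. The helper names and `MIN_SOX_VERSION` are hypothetical.

```python
import re
import shutil
import subprocess

# Minimum libsox version from the dependency list above.
MIN_SOX_VERSION = (14, 3, 2)


def parse_sox_version(text):
    """Extract a (major, minor, patch) tuple from text such as 'SoX v14.4.2'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_sox_version():
    """Return the version of the sox CLI found on PATH, or None if unavailable."""
    sox = shutil.which("sox")
    if sox is None:
        return None
    result = subprocess.run([sox, "--version"], capture_output=True, text=True)
    # Look at both streams, since the version line's destination may vary.
    return parse_sox_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_sox_version()
    if version is None:
        print("sox not found on PATH")
    elif version < MIN_SOX_VERSION:
        print("sox", version, "is older than", MIN_SOX_VERSION)
    else:
        print("sox", version, "looks new enough")
```

Python compares tuples element by element, so `(14, 4, 2) >= (14, 3, 2)` behaves like a version comparison.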

Quick install of libsox:

OSX (Homebrew):
```bash
brew install sox
```
Linux (Ubuntu):
```bash
sudo apt-get install sox libsox-dev libsox-fmt-all
```
Anaconda:
```bash
conda install -c conda-forge sox
```

Installation
------------

```bash
# Linux
python setup.py install

# OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```

Quick Usage
-----------

```python
import torchaudio
sound, sample_rate = torchaudio.load('foo.mp3')
torchaudio.save('foo_save.mp3', sound, sample_rate) # saves tensor to file
```
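
To try the snippet above without an mp3 at hand, here is a minimal standard-library sketch that writes a one-second 440 Hz test tone as a 16-bit mono WAV file. WAV is one of the supported formats listed above. The helper `write_tone` and the file name `foo.wav` are hypothetical.

```python
import math
import struct
import wave


def write_tone(path, frequency=440.0, sample_rate=16000, duration=1.0, amplitude=0.5):
    """Write a mono, 16-bit PCM sine tone to a WAV file."""
    n_samples = int(sample_rate * duration)
    frames = bytearray()
    for n in range(n_samples):
        sample = amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
        # Scale the float in [-1, 1] to a signed 16-bit little-endian integer.
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


if __name__ == "__main__":
    write_tone("foo.wav")
```

Following the conventions below, loading the result with `torchaudio.load('foo.wav')` should return a waveform with dimensions (channel, time) = (1, 16000) and a sample_rate of 16000.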

API Reference
-------------

API Reference is located here: http://pytorch.org/audio/

Conventions
-----------

Torchaudio is standardized around the following naming conventions.

* waveform: a tensor of audio samples with dimensions (channel, time)
* sample_rate: the rate of audio samples (samples per second)
* specgram: a spectrogram tensor with dimensions (channel, freq, time)
* mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
* hop_length: the number of samples between the starts of consecutive frames
* n_fft: the number of Fourier bins (the FFT size); a one-sided spectrogram keeps n_fft // 2 + 1 of them
* n_mels, n_mfcc: the number of mel and MFCC bins
* n_freq: the number of bins in a linear spectrogram
* f_min: the lowest frequency of the lowest band in a spectrogram
* f_max: the highest frequency of the highest band in a spectrogram
* win_length: the length of the STFT window
* window_fn: a function that creates a window, e.g. torch.hann_window
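
To show how n_fft and hop_length determine the specgram dimensions, here is a minimal sketch. It assumes a centered STFT that pads the waveform by n_fft // 2 samples on each side before framing (as `torch.stft` does with `center=True`) and a one-sided transform by default. The helper `spectrogram_shape` is hypothetical, and its defaults are illustrative.

```python
def spectrogram_shape(n_channels, n_samples, n_fft=400, hop_length=None, onesided=True):
    """Predict the (channel, freq, time) shape of a centered STFT spectrogram."""
    if hop_length is None:
        # Illustrative default: consecutive frames overlap by half.
        hop_length = n_fft // 2
    # A one-sided transform keeps only the non-negative frequencies.
    n_freq = n_fft // 2 + 1 if onesided else n_fft
    # Centering pads n_fft // 2 samples on both ends of the waveform.
    padded = n_samples + 2 * (n_fft // 2)
    # Each frame spans n_fft samples; frames start every hop_length samples.
    n_frames = 1 + (padded - n_fft) // hop_length
    return (n_channels, n_freq, n_frames)
```

For example, one second of mono audio at 16 kHz with n_fft=400 and hop_length=200 gives `(1, 201, 81)`. The win_length only shapes the window inside each frame, so it does not change these dimensions.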

Transforms expect the following input and output dimensions. In particular, all transforms and functions assume channel-first input.

* Spectrogram: (channel, time) -> (channel, freq, time)
* AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
* MelScale: (channel, freq, time) -> (channel, mel, time)
* MelSpectrogram: (channel, time) -> (channel, mel, time)
* MFCC: (channel, time) -> (channel, mfcc, time)
* MuLawEncoding: (channel, time) -> (channel, time)
* MuLawDecoding: (channel, time) -> (channel, time)
* Resample: (channel, time) -> (channel, time)
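
MelScale maps the freq dimension onto a smaller number of mel bins, each covering a band between the lowest and highest frequencies from the conventions above. This sketch shows where such triangular bands could sit, assuming the HTK mel formula `mel = 2595 * log10(1 + f / 700)`. torchaudio builds its own filterbank internally. The helpers, and the parameter names `n_mels`, `f_min`, and `f_max`, are this sketch's own and hypothetical.

```python
import math


def hz_to_mel(freq):
    """Convert a frequency in Hz to mels using the HTK formula."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    """Convert mels back to Hz (inverse of hz_to_mel)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 frequencies in Hz, evenly spaced on the mel scale.

    Triangular filter i rises from edges[i], peaks at edges[i + 1],
    and falls back to zero at edges[i + 2].
    """
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_mels + 1)
    return [mel_to_hz(low + i * step) for i in range(n_mels + 2)]
```

Because the points are evenly spaced in mels rather than in Hz, the low-frequency bands are narrow and the high-frequency bands are wide. Each output mel bin is a weighted sum over the freq bins that fall inside its triangle.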