.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction operations. The following diagram shows the relationship between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented using :class:`torch.nn.Module`. Common ways to build a processing pipeline are to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, then move the pipeline to a target device and data type.

.. code::

   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to complex spectrogram
   # 3. Time-stretch, then convert to power spectrogram
   # 4. Apply frequency and time masking
   # 5. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # TimeStretch expects a complex spectrogram, so keep the phase (power=None).
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # n_freq must match the number of frequency bins produced by Spectrogram.
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Stretch in time, then convert to power spectrogram
           spec = self.stretch(spec).abs().pow(2)

           # Apply SpecAugment masking
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel

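To make the tensor shapes produced by ``MyPipeline`` concrete, the following is a minimal sketch of the size arithmetic in plain Python. It assumes the default ``Spectrogram`` settings (``hop_length = n_fft // 2``, ``center=True``), that resampling yields ``ceil(num_samples * new_freq / orig_freq)`` samples, and that time stretching by ``rate`` yields ``ceil(num_frames / rate)`` frames. The helper name ``expected_mel_shape`` is hypothetical and is not part of torchaudio.

```python
import math


def expected_mel_shape(
    num_samples,
    input_freq=16000,
    resample_freq=8000,
    n_fft=1024,
    n_mel=256,
    stretch_factor=0.8,
):
    """Return the assumed (n_mels, num_frames) of the pipeline output.

    Hypothetical helper for illustration; not a torchaudio function.
    """
    # Resampling scales the number of samples by new_freq / orig_freq.
    resampled = math.ceil(num_samples * resample_freq / input_freq)

    # Assumed Spectrogram defaults: hop_length = n_fft // 2 and center=True,
    # which gives one frame per full hop plus one.
    hop_length = n_fft // 2
    num_frames = 1 + resampled // hop_length

    # A stretch rate below 1 slows the signal down and adds frames.
    stretched_frames = math.ceil(num_frames / stretch_factor)

    # Masking keeps the shape; MelScale maps n_fft // 2 + 1 bins to n_mel bins.
    return n_mel, stretched_frames


# One second of 16 kHz audio, with no time stretching for easy checking.
print(expected_mel_shape(16000, stretch_factor=1.0))  # (256, 16)
```

Leading batch and channel dimensions are left out of the result because the transforms preserve them.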

.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the computation graph to CUDA
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Perform the transform
   features = pipeline(waveform)

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

85
86
87
88
89
90
91
    AmplitudeToDB
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness
    AddNoise
    Convolve
    FFTConvolve
    Speed
    SpeedPerturbation
    Deemphasis
    Preemphasis

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelScale
    InverseMelScale
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR