torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The :mod:`torchaudio.transforms` module contains common audio processing and feature extraction operations. The following diagram shows the relationship between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented using :class:`torch.nn.Module`. Common ways to build a processing pipeline are to define a custom ``Module`` class or to chain Modules together using :class:`torch.nn.Sequential`, then move the pipeline to a target device and data type.

.. code::

   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to power spectrogram
   # 3. Apply augmentations
   # 4. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           self.spec = Spectrogram(n_fft=n_fft, power=2)

           self.spec_aug = torch.nn.Sequential(
               TimeStretch(stretch_factor, fixed_rate=True),
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to power spectrogram
           spec = self.spec(resampled)

           # Apply SpecAugment
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel


.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the pipeline to the target device and dtype
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Random data standing in for a real waveform (1 channel, 1 second at 16 kHz)
   waveform = torch.rand(1, 16000, device="cuda", dtype=torch.float32)

   # Perform the transform
   features = pipeline(waveform)

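The second approach mentioned above, chaining existing transforms with :class:`torch.nn.Sequential`, can look like the following minimal sketch. The choice of transforms and parameter values here is illustrative only.

.. code::

   import torch
   from torchaudio.transforms import AmplitudeToDB, MelSpectrogram

   # Chain existing transforms; each Module's output feeds the next one.
   pipeline = torch.nn.Sequential(
       MelSpectrogram(sample_rate=16000, n_fft=1024, n_mels=64),
       AmplitudeToDB(),
   )

   # Random data standing in for a real waveform (1 channel, 1 second at 16 kHz)
   waveform = torch.rand(1, 16000)
   features = pipeline(waveform)

The chained pipeline is itself a :class:`torch.nn.Module`, so it can be moved to a device and data type with ``.to()`` in the same way as the custom class above.
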
Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MelScale
    InverseMelScale
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch

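For illustration, here is a minimal sketch of applying the masking transforms to a power spectrogram; the tensor shapes and parameter values below are assumptions for the example.

.. code::

   import torch
   from torchaudio.transforms import FrequencyMasking, Spectrogram, TimeMasking

   # Random data standing in for a real waveform (1 channel, 1 second at 16 kHz)
   waveform = torch.rand(1, 16000)

   # Power spectrogram of shape (channel, freq, time)
   spec = Spectrogram(n_fft=400, power=2)(waveform)

   # Mask up to 30 consecutive frequency bins and up to 50 consecutive time frames
   masking = torch.nn.Sequential(
       FrequencyMasking(freq_mask_param=30),
       TimeMasking(time_mask_param=50),
   )
   augmented = masking(spec)
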
Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR