.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction transforms. The following diagram shows the relationship between some of the available transforms.

.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png
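As a small illustration of one relationship in the diagram, the sketch below checks that :class:`MelSpectrogram` produces the same result as :class:`Spectrogram` (power spectrogram) followed by :class:`MelScale`. The random ``waveform`` stands in for real audio, and the parameter values (``n_fft=400``, ``n_mels=64``) are arbitrary choices for this example; all other parameters are left at their defaults.

.. code:: python

   import torch
   from torchaudio.transforms import MelScale, MelSpectrogram, Spectrogram

   sample_rate = 16000
   n_fft = 400
   n_mels = 64

   # One second of random noise stands in for real audio: (channel, time)
   torch.manual_seed(0)
   waveform = torch.randn(1, sample_rate)

   # One step: waveform -> mel spectrogram
   mel_direct = MelSpectrogram(sample_rate=sample_rate, n_fft=n_fft, n_mels=n_mels)(waveform)

   # Two steps: waveform -> power spectrogram -> mel filterbank
   power_spec = Spectrogram(n_fft=n_fft, power=2.0)(waveform)
   mel_two_step = MelScale(n_mels=n_mels, sample_rate=sample_rate, n_stft=n_fft // 2 + 1)(power_spec)

   # Both paths give a (channel, n_mels, frames) tensor with matching values
   print(mel_direct.shape, torch.allclose(mel_direct, mel_two_step))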

Transforms are implemented as subclasses of :class:`torch.nn.Module`. A common way to build a processing pipeline is to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, and then move the pipeline to the target device and data type.

.. code::

   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to complex spectrogram
   # 3. Apply augmentations (time stretch, then masking on the power spectrogram)
   # 4. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # power=None keeps the complex STFT, which TimeStretch expects as input
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # n_freq must equal the number of STFT bins (n_fft // 2 + 1), and the
           # stretch rate goes to fixed_rate (the first positional argument is hop_length)
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Apply SpecAugment: stretch the complex spectrogram,
           # convert it to a power spectrogram, then mask
           spec = self.stretch(spec)
           spec = spec.abs().pow(2)
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel

.. code::

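   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the module's parameters and buffers to CUDA
   device = torch.device("cuda")
   pipeline.to(device=device, dtype=torch.float32)

   # Perform the transform. ``waveform`` is a float tensor of shape (..., time)
   # sampled at 16 kHz; it must be on the same device as the pipeline.
   features = pipeline(waveform.to(device=device, dtype=torch.float32))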
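The other approach mentioned above, chaining transforms with :class:`torch.nn.Sequential`, suits pipelines that need no custom logic in ``forward``. A minimal sketch, assuming 16 kHz input; the parameter values (``n_fft=512``, ``n_mels=64``) are illustrative only, and the random ``waveform`` stands in for real audio.

.. code:: python

   import torch
   from torchaudio.transforms import AmplitudeToDB, MelSpectrogram, Resample

   # Resample 16 kHz -> 8 kHz, compute a mel spectrogram, then convert to decibels
   pipeline = torch.nn.Sequential(
       Resample(orig_freq=16000, new_freq=8000),
       MelSpectrogram(sample_rate=8000, n_fft=512, n_mels=64),
       AmplitudeToDB(stype="power"),
   )

   # Use CUDA when available so the sketch also runs on CPU-only machines
   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
   pipeline.to(device=device, dtype=torch.float32)

   # One second of random 16 kHz audio stands in for a real waveform: (channel, time)
   waveform = torch.randn(1, 16000, device=device)
   features = pipeline(waveform)
   print(features.shape)  # (channel, n_mels, frames)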

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MelScale
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness
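Some of the utility transforms can be checked by hand. The sketch below, using arbitrary example values, converts power values to decibels with :class:`AmplitudeToDB` (reference power 1.0, so each factor of 10 in power adds 10 dB) and round-trips a signal through mu-law companding with :class:`MuLawEncoding` and :class:`MuLawDecoding`. The round trip is lossy because of the 256-level quantization, so only approximate recovery is expected.

.. code:: python

   import torch
   from torchaudio.transforms import AmplitudeToDB, MuLawDecoding, MuLawEncoding

   # Power values 1, 10, 100 map to 0, 10, 20 dB
   to_db = AmplitudeToDB(stype="power")
   db = to_db(torch.tensor([1.0, 10.0, 100.0]))

   # Mu-law companding: float audio in [-1, 1] -> integer codes in [0, 255] -> float
   encode = MuLawEncoding(quantization_channels=256)
   decode = MuLawDecoding(quantization_channels=256)
   waveform = torch.linspace(-1.0, 1.0, steps=1000)
   codes = encode(waveform)
   restored = decode(codes)

   print(db)
   print((restored - waveform).abs().max())  # quantization error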

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad
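For example, :class:`Spectrogram` with ``power=None`` returns the complex STFT, which :class:`InverseSpectrogram` can turn back into a waveform when both use the same ``n_fft`` and default window and hop length. A minimal sketch with random input standing in for audio; the :class:`MFCC` settings (``n_mfcc=13``, ``n_mels=40``) are illustrative choices, not recommended defaults.

.. code:: python

   import torch
   from torchaudio.transforms import MFCC, InverseSpectrogram, Spectrogram

   torch.manual_seed(0)
   waveform = torch.randn(1, 16000)  # (channel, time), stands in for 1 s of 16 kHz audio

   # Complex STFT (power=None) and its inverse with matching parameters
   stft = Spectrogram(n_fft=400, power=None)
   istft = InverseSpectrogram(n_fft=400)
   spec = stft(waveform)  # complex, shape (channel, n_fft // 2 + 1, frames)
   reconstructed = istft(spec, length=waveform.shape[-1])

   # 13 MFCCs per frame, computed from a 40-band mel spectrogram
   mfcc = MFCC(sample_rate=16000, n_mfcc=13, melkwargs={"n_fft": 400, "n_mels": 40})(waveform)

   print(spec.shape, mfcc.shape)
   print((reconstructed - waveform).abs().max())  # reconstruction error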

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch
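A minimal SpecAugment-style sketch, assuming illustrative mask sizes and stretch rate: :class:`TimeStretch` operates on a complex spectrogram (its ``n_freq`` must match the number of STFT bins), while the masking transforms are applied here to a power spectrogram. Because the masks are drawn at random, the sketch only relies on structural properties: frequency masking replaces whole frequency rows and time masking replaces whole frames.

.. code:: python

   import torch
   from torchaudio.transforms import FrequencyMasking, Spectrogram, TimeMasking, TimeStretch

   torch.manual_seed(0)
   waveform = torch.randn(1, 16000)  # (channel, time), stands in for real audio

   # Time stretching needs the complex STFT; a rate > 1 speeds up (fewer frames)
   complex_spec = Spectrogram(n_fft=400, power=None)(waveform)
   stretch = TimeStretch(n_freq=201, fixed_rate=1.25)
   stretched = stretch(complex_spec)

   # Masking on a power spectrogram; masked cells are filled with 0.0 by default
   power_spec = stretched.abs().pow(2)
   freq_masked = FrequencyMasking(freq_mask_param=30)(power_spec)
   time_masked = TimeMasking(time_mask_param=20)(power_spec)

   print(complex_spec.shape, stretched.shape)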

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss
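:class:`RNNTLoss` computes the RNN Transducer loss from joiner logits. A minimal sketch with random tensors and illustrative sizes, assuming the logits layout ``(batch, max_frames, max_target_len + 1, num_classes)`` with ``int32`` targets and lengths; ``blank=0`` is a choice made for this example (the default ``blank=-1`` uses the last class), so the random targets avoid index 0.

.. code:: python

   import torch
   from torchaudio.transforms import RNNTLoss

   batch, max_frames, max_target_len, num_classes = 2, 10, 4, 6

   torch.manual_seed(0)
   # Random joiner output stands in for a real model: (batch, frames, target_len + 1, classes)
   logits = torch.randn(batch, max_frames, max_target_len + 1, num_classes, requires_grad=True)
   # Label indices 1..5; index 0 is reserved for blank in this example
   targets = torch.randint(1, num_classes, (batch, max_target_len), dtype=torch.int32)
   # The longest sequences must match the padded logits dimensions
   logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
   target_lengths = torch.tensor([4, 3], dtype=torch.int32)

   loss_fn = RNNTLoss(blank=0)
   loss = loss_fn(logits, targets, logit_lengths, target_lengths)
   loss.backward()  # gradients flow back to the logits
   print(loss.item())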

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR
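The multi-channel transforms operate on complex multi-channel spectrograms. A minimal sketch of mask-based beamforming, assuming random data in place of real recordings and random time-frequency masks in place of the masks a neural network would normally estimate (``mask_speech`` and ``mask_noise`` are hypothetical inputs). :class:`PSD` estimates speech and noise covariance matrices, and :class:`SoudenMVDR` uses them to produce a single-channel enhanced spectrogram for reference channel 0.

.. code:: python

   import torch
   from torchaudio.transforms import PSD, SoudenMVDR

   n_channels, n_freq, n_frames = 4, 33, 50

   torch.manual_seed(0)
   # Complex multi-channel spectrogram: (channel, freq, time)
   specgram = torch.randn(n_channels, n_freq, n_frames, dtype=torch.cfloat)
   # Hypothetical speech / noise masks in [0, 1]: (freq, time)
   mask_speech = torch.rand(n_freq, n_frames)
   mask_noise = 1.0 - mask_speech

   # Power spectral density (spatial covariance) matrices: (freq, channel, channel)
   psd_transform = PSD()
   psd_speech = psd_transform(specgram, mask_speech)
   psd_noise = psd_transform(specgram, mask_noise)

   # MVDR beamforming toward reference channel 0: output is (freq, time)
   mvdr = SoudenMVDR()
   enhanced = mvdr(specgram, psd_speech, psd_noise, reference_channel=0)
   print(psd_speech.shape, enhanced.shape)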