.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction transforms. The following diagram shows the relationships between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented as :class:`torch.nn.Module` subclasses. Common ways to build a processing pipeline are to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, and then move the pipeline to a target device and data type.

.. code::

   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to complex spectrogram
   # 3. Apply time stretch, then convert to power spectrogram
   # 4. Apply masking augmentations
   # 5. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # TimeStretch operates on complex spectrograms, so keep the phase (power=None)
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # n_freq must match the number of frequency bins produced by Spectrogram
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Stretch in time, then convert to power spectrogram
           spec = self.stretch(spec).abs().pow(2)

           # Apply SpecAugment masking
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel


.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the computation graph and the input waveform to CUDA
   device = torch.device("cuda")
   pipeline.to(device=device, dtype=torch.float32)
   waveform = waveform.to(device=device, dtype=torch.float32)

   # Perform the transform
   features = pipeline(waveform)
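
The text above also mentions chaining Modules with :class:`torch.nn.Sequential`. A minimal sketch of that approach follows. The choice of transforms, the parameter values, and the synthetic input are assumptions made for illustration, not a recommended configuration.

.. code::

   import torch
   from torchaudio.transforms import AmplitudeToDB, MelSpectrogram, Resample

   # Chain existing transforms without defining a custom Module class.
   # All parameter values here are illustrative assumptions.
   pipeline = torch.nn.Sequential(
       Resample(orig_freq=16000, new_freq=8000),
       MelSpectrogram(sample_rate=8000, n_fft=1024, n_mels=64),
       AmplitudeToDB(stype="power"),
   )

   # One second of synthetic single-channel audio at 16 kHz stands in for real data.
   waveform = torch.randn(1, 16000)

   # Each Module receives the output of the previous one.
   features = pipeline(waveform)  # shape: (channel, n_mels, time)

Chaining with ``Sequential`` suits pipelines in which every step takes only the previous step's output. A custom Module class is the better fit when a step needs extra arguments or intermediate processing.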

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MelScale
    InverseMelScale
    BarkScale
    InverseBarkScale
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelSpectrogram
    BarkSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch
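
As a rough illustration of the masking transforms listed above, the sketch below applies ``FrequencyMasking`` and ``TimeMasking`` to a synthetic spectrogram. The tensor sizes and mask parameters are assumptions chosen for illustration only.

.. code::

   import torch
   from torchaudio.transforms import FrequencyMasking, TimeMasking

   # Synthetic power spectrogram with shape (channel, freq, time).
   # Values are offset to be strictly positive so that masked bins stand out as zeros.
   spec = torch.rand(1, 201, 100) + 1.0

   masking = torch.nn.Sequential(
       FrequencyMasking(freq_mask_param=30),  # mask width is drawn at random, below 30 bins
       TimeMasking(time_mask_param=20),       # mask width is drawn at random, below 20 frames
   )
   masked = masking(spec)

   # The shape is unchanged; masked regions are filled with zeros by default.
   assert masked.shape == spec.shape

Because the mask positions and widths are sampled on every call, repeated calls on the same input generally produce different outputs.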

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR