.. py:module:: torchaudio.transforms

torchaudio.transforms
=====================

.. currentmodule:: torchaudio.transforms

The ``torchaudio.transforms`` module contains common audio processing and feature extraction transforms. The following diagram shows the relationship between some of the available transforms.


.. image:: https://download.pytorch.org/torchaudio/tutorial-assets/torchaudio_feature_extractions.png

Transforms are implemented using :class:`torch.nn.Module`. Common ways to build a processing pipeline are to define a custom Module class or to chain Modules together using :class:`torch.nn.Sequential`, and then move the pipeline to a target device and data type.

.. code::

   import torch
   from torchaudio.transforms import (
       FrequencyMasking,
       MelScale,
       Resample,
       Spectrogram,
       TimeMasking,
       TimeStretch,
   )

   # Define custom feature extraction pipeline.
   #
   # 1. Resample audio
   # 2. Convert to (complex) spectrogram
   # 3. Apply augmentations (time stretch, then masking on the power spectrogram)
   # 4. Convert to mel-scale
   #
   class MyPipeline(torch.nn.Module):
       def __init__(
           self,
           input_freq=16000,
           resample_freq=8000,
           n_fft=1024,
           n_mel=256,
           stretch_factor=0.8,
       ):
           super().__init__()
           self.resample = Resample(orig_freq=input_freq, new_freq=resample_freq)

           # TimeStretch operates on complex spectrograms, so keep the complex
           # output here (power=None) and take the power after stretching.
           self.spec = Spectrogram(n_fft=n_fft, power=None)

           # n_freq must match the number of frequency bins of the spectrogram.
           self.stretch = TimeStretch(n_freq=n_fft // 2 + 1, fixed_rate=stretch_factor)

           self.spec_aug = torch.nn.Sequential(
               FrequencyMasking(freq_mask_param=80),
               TimeMasking(time_mask_param=80),
           )

           self.mel_scale = MelScale(
               n_mels=n_mel, sample_rate=resample_freq, n_stft=n_fft // 2 + 1)

       def forward(self, waveform: torch.Tensor) -> torch.Tensor:
           # Resample the input
           resampled = self.resample(waveform)

           # Convert to complex spectrogram
           spec = self.spec(resampled)

           # Stretch in time, then convert to power spectrogram
           spec = self.stretch(spec).abs().pow(2)

           # Apply SpecAugment masking
           spec = self.spec_aug(spec)

           # Convert to mel-scale
           mel = self.mel_scale(spec)

           return mel


.. code::

   # Instantiate a pipeline
   pipeline = MyPipeline()

   # Move the computation graph to CUDA
   pipeline.to(device=torch.device("cuda"), dtype=torch.float32)

   # Perform the transform
   # (``waveform`` is a float32 tensor that already lives on the same device)
   features = pipeline(waveform)

Please check out the tutorials that cover in-depth usage of transforms.

.. minigallery:: torchaudio.transforms

Utility
-------

.. autosummary::
    :toctree: generated
    :nosignatures:

    AmplitudeToDB
    MuLawEncoding
    MuLawDecoding
    Resample
    Fade
    Vol
    Loudness

Feature Extractions
-------------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    Spectrogram
    InverseSpectrogram
    MelScale
    InverseMelScale
    MelSpectrogram
    GriffinLim
    MFCC
    LFCC
    ComputeDeltas
    PitchShift
    SlidingWindowCmn
    SpectralCentroid
    Vad

Augmentations
-------------

The following transforms implement popular augmentation techniques known as *SpecAugment* :cite:`specaugment`.

.. autosummary::
    :toctree: generated
    :nosignatures:

    FrequencyMasking
    TimeMasking
    TimeStretch

Loss
----

.. autosummary::
    :toctree: generated
    :nosignatures:

    RNNTLoss

Multi-channel
-------------

.. autosummary::
    :toctree: generated
    :nosignatures:

    PSD
    MVDR
    RTFMVDR
    SoudenMVDR