.. py:module:: torchaudio.pipelines

torchaudio.pipelines
====================

.. currentmodule:: torchaudio.pipelines

The ``torchaudio.pipelines`` module packages pre-trained models with support functions and meta-data into simple APIs tailored to perform specific tasks.

When using pre-trained models to perform a task, in addition to instantiating the model with pre-trained weights, the client code also needs to build pipelines for feature extraction and post-processing in the same way they were done during the training. This requires carrying over the information used during training, such as the type of transforms and their parameters (for example, the sampling rate and the number of FFT bins).

To tie this information to a pre-trained model and make it easily accessible, the ``torchaudio.pipelines`` module uses the concept of a ``Bundle`` class, which defines a set of APIs to instantiate pipelines and the interface of the pipelines.

The following figure illustrates this.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-intro.png

A pre-trained model and associated pipelines are expressed as an instance of ``Bundle``. Different instances of the same ``Bundle`` share the interface, but their implementations are not constrained to be of the same type. For example, :class:`SourceSeparationBundle` defines the interface for performing source separation, but its instance :data:`CONVTASNET_BASE_LIBRI2MIX` instantiates a model of :class:`~torchaudio.models.ConvTasNet` while :data:`HDEMUCS_HIGH_MUSDB` instantiates a model of :class:`~torchaudio.models.HDemucs`. Still, because they share the same interface, the usage is the same.

.. note::

   Under the hood, the implementations of ``Bundle`` use components from other ``torchaudio`` modules, such as :mod:`torchaudio.models` and :mod:`torchaudio.transforms`, or even third-party libraries like `SentencePiece <https://github.com/google/sentencepiece>`__ and `DeepPhonemizer <https://github.com/as-ideas/DeepPhonemizer>`__. But this implementation detail is abstracted away from library users.
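
For example, a minimal sketch of the common usage pattern, using :data:`CONVTASNET_BASE_LIBRI2MIX`, might look as follows (the audio path is a placeholder and the recording is assumed to already match the bundle's expected format). Because the bundles share the interface, swapping in another instance such as :data:`HDEMUCS_HIGH_MUSDB` changes only the bundle name and the input specifics (sample rate, number of channels).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.CONVTASNET_BASE_LIBRI2MIX

   # Metadata travels with the bundle, e.g. the sample rate used during training.
   print(bundle.sample_rate)

   # Instantiate the model with pre-trained weights (downloaded on first use).
   model = bundle.get_model()

   # "mixture.wav" is a placeholder path for a mono mixture at bundle.sample_rate.
   waveform, sample_rate = torchaudio.load("mixture.wav")
   with torch.inference_mode():
       sources = model(waveform.unsqueeze(0))  # (batch, num_sources, time)
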

.. _RNNT:

RNN-T Streaming/Non-Streaming ASR
---------------------------------

Interface
~~~~~~~~~

``RNNTBundle`` defines ASR pipelines and consists of three steps: feature extraction, inference, and de-tokenization.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-rnntbundle.png

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   RNNTBundle
   RNNTBundle.FeatureExtractor
   RNNTBundle.TokenProcessor
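
For example, a non-streaming transcription with :data:`EMFORMER_RNNT_BASE_LIBRISPEECH` might look like the following sketch (the audio path is a placeholder and the recording is assumed to be at ``bundle.sample_rate``).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH

   feature_extractor = bundle.get_feature_extractor()
   decoder = bundle.get_decoder()
   token_processor = bundle.get_token_processor()

   # "speech.wav" is a placeholder path.
   waveform, sample_rate = torchaudio.load("speech.wav")

   with torch.inference_mode():
       features, length = feature_extractor(waveform.squeeze())  # feature extraction
       hypotheses = decoder(features, length, 10)                # inference (beam width 10)
   transcript = token_processor(hypotheses[0][0])                # de-tokenization
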

.. rubric:: Tutorials using ``RNNTBundle``

.. minigallery:: torchaudio.pipelines.RNNTBundle

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   EMFORMER_RNNT_BASE_LIBRISPEECH


wav2vec 2.0 / HuBERT / WavLM - SSL
----------------------------------

Interface
~~~~~~~~~

``Wav2Vec2Bundle`` instantiates models that generate acoustic features that can be used for downstream inference and fine-tuning.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2bundle.png

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   Wav2Vec2Bundle
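
For example, extracting acoustic features with :data:`HUBERT_BASE` might look like the following sketch (the audio path is a placeholder).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.HUBERT_BASE
   model = bundle.get_model()

   # "speech.wav" is a placeholder path.
   waveform, sample_rate = torchaudio.load("speech.wav")
   waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

   with torch.inference_mode():
       # Returns one feature tensor per transformer layer.
       features, _ = model.extract_features(waveform)
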

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   WAV2VEC2_BASE
   WAV2VEC2_LARGE
   WAV2VEC2_LARGE_LV60K
   WAV2VEC2_XLSR53
   WAV2VEC2_XLSR_300M
   WAV2VEC2_XLSR_1B
   WAV2VEC2_XLSR_2B
   HUBERT_BASE
   HUBERT_LARGE
   HUBERT_XLARGE
   WAVLM_BASE
   WAVLM_BASE_PLUS
   WAVLM_LARGE

wav2vec 2.0 / HuBERT - Fine-tuned ASR
-------------------------------------

Interface
~~~~~~~~~

``Wav2Vec2ASRBundle`` instantiates models that generate a probability distribution over pre-defined labels, which can be used for ASR.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2asrbundle.png

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   Wav2Vec2ASRBundle
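
For example, transcribing speech with :data:`WAV2VEC2_ASR_BASE_960H` and simple greedy (argmax) decoding might look like the following sketch (the audio path is a placeholder; real applications typically use a proper CTC decoder).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
   model = bundle.get_model()
   labels = bundle.get_labels()

   # "speech.wav" is a placeholder path.
   waveform, sample_rate = torchaudio.load("speech.wav")
   waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

   with torch.inference_mode():
       emission, _ = model(waveform)  # (batch, time, num_labels)

   # Greedy decoding: collapse repeats, drop the blank token ("-"), map "|" to space.
   indices = torch.unique_consecutive(emission[0].argmax(dim=-1)).tolist()
   transcript = "".join(labels[i] for i in indices if labels[i] != "-").replace("|", " ")
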

.. rubric:: Tutorials using ``Wav2Vec2ASRBundle``

.. minigallery:: torchaudio.pipelines.Wav2Vec2ASRBundle

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   WAV2VEC2_ASR_BASE_10M
   WAV2VEC2_ASR_BASE_100H
   WAV2VEC2_ASR_BASE_960H
   WAV2VEC2_ASR_LARGE_10M
   WAV2VEC2_ASR_LARGE_100H
   WAV2VEC2_ASR_LARGE_960H
   WAV2VEC2_ASR_LARGE_LV60K_10M
   WAV2VEC2_ASR_LARGE_LV60K_100H
   WAV2VEC2_ASR_LARGE_LV60K_960H
   VOXPOPULI_ASR_BASE_10K_DE
   VOXPOPULI_ASR_BASE_10K_EN
   VOXPOPULI_ASR_BASE_10K_ES
   VOXPOPULI_ASR_BASE_10K_FR
   VOXPOPULI_ASR_BASE_10K_IT
   HUBERT_ASR_LARGE
   HUBERT_ASR_XLARGE

wav2vec 2.0 / HuBERT - Forced Alignment
---------------------------------------

Interface
~~~~~~~~~

``Wav2Vec2FABundle`` bundles a pre-trained model and its associated dictionary. Additionally, it supports appending a ``star`` token dimension.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-wav2vec2fabundle.png

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   Wav2Vec2FABundle
   Wav2Vec2FABundle.Tokenizer
   Wav2Vec2FABundle.Aligner
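
For example, aligning a transcript with :data:`MMS_FA` might look like the following sketch (the audio path and the transcript are placeholders).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.MMS_FA
   model = bundle.get_model(with_star=False)  # plain alignment without the star dimension
   tokenizer = bundle.get_tokenizer()
   aligner = bundle.get_aligner()

   # Placeholder audio path and transcript (lower-case words without punctuation).
   waveform, sample_rate = torchaudio.load("speech.wav")
   waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
   transcript = "the quick brown fox".split()

   with torch.inference_mode():
       emission, _ = model(waveform)
       token_spans = aligner(emission[0], tokenizer(transcript))  # one span list per word
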

.. rubric:: Tutorials using ``Wav2Vec2FABundle``

.. minigallery:: torchaudio.pipelines.Wav2Vec2FABundle

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   MMS_FA

.. _Tacotron2:
   
Tacotron2 Text-To-Speech
------------------------

``Tacotron2TTSBundle`` defines text-to-speech pipelines and consists of three steps: tokenization, spectrogram generation, and vocoding. The spectrogram generation is based on the :class:`~torchaudio.models.Tacotron2` model.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-tacotron2bundle.png

``TextProcessor`` can be rule-based tokenization in the case of characters, or it can be a neural-network-based G2P model that generates a sequence of phonemes from the input text.

Similarly, ``Vocoder`` can be an algorithm without learned parameters, like `Griffin-Lim`, or a neural-network-based model like `Waveglow`.

Interface
~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   Tacotron2TTSBundle
   Tacotron2TTSBundle.TextProcessor
   Tacotron2TTSBundle.Vocoder
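
For example, synthesizing speech with :data:`TACOTRON2_WAVERNN_PHONE_LJSPEECH` might chain the three steps as in the following sketch (the input text and output path are placeholders).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

   processor = bundle.get_text_processor()  # tokenization (G2P for phone-based bundles)
   tacotron2 = bundle.get_tacotron2()       # spectrogram generation
   vocoder = bundle.get_vocoder()           # waveform generation

   text = "Hello world!"  # placeholder input text

   with torch.inference_mode():
       tokens, lengths = processor(text)
       spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
       waveforms, waveform_lengths = vocoder(spec, spec_lengths)

   # "output.wav" is a placeholder output path.
   torchaudio.save("output.wav", waveforms[0:1], vocoder.sample_rate)
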

.. rubric:: Tutorials using ``Tacotron2TTSBundle``

.. minigallery:: torchaudio.pipelines.Tacotron2TTSBundle

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   TACOTRON2_WAVERNN_PHONE_LJSPEECH
   TACOTRON2_WAVERNN_CHAR_LJSPEECH
   TACOTRON2_GRIFFINLIM_PHONE_LJSPEECH
   TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH

Source Separation
-----------------

Interface
~~~~~~~~~

``SourceSeparationBundle`` instantiates source separation models that take single-channel audio and generate multi-channel audio.

.. image:: https://download.pytorch.org/torchaudio/doc-assets/pipelines-sourceseparationbundle.png

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   SourceSeparationBundle
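
For example, separating a music track into stems with :data:`HDEMUCS_HIGH_MUSDB_PLUS` might look like the following sketch (the audio path is a placeholder; this bundle expects stereo input at ``bundle.sample_rate``).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS
   model = bundle.get_model()

   # "song.wav" is a placeholder path for a stereo track at bundle.sample_rate.
   waveform, sample_rate = torchaudio.load("song.wav")

   with torch.inference_mode():
       # Output shape: (batch, num_sources, channels, time);
       # for this bundle the sources are drums, bass, other and vocals.
       sources = model(waveform.unsqueeze(0))
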

.. rubric:: Tutorials using ``SourceSeparationBundle``

.. minigallery:: torchaudio.pipelines.SourceSeparationBundle

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   CONVTASNET_BASE_LIBRI2MIX
   HDEMUCS_HIGH_MUSDB_PLUS
   HDEMUCS_HIGH_MUSDB

Squim Objective
---------------

Interface
~~~~~~~~~

:py:class:`SquimObjectiveBundle` defines a speech quality and intelligibility measurement (SQUIM) pipeline that can predict **objective** metric scores given the input waveform.

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   SquimObjectiveBundle
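
For example, estimating objective metrics with :data:`SQUIM_OBJECTIVE` might look like the following sketch (the audio path is a placeholder).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.SQUIM_OBJECTIVE
   model = bundle.get_model()

   # "speech.wav" is a placeholder path.
   waveform, sample_rate = torchaudio.load("speech.wav")
   waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

   with torch.inference_mode():
       stoi, pesq, si_sdr = model(waveform)  # reference-free estimates
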

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   SQUIM_OBJECTIVE

Squim Subjective
----------------

Interface
~~~~~~~~~

:py:class:`SquimSubjectiveBundle` defines a speech quality and intelligibility measurement (SQUIM) pipeline that can predict **subjective** metric scores given the input waveform.

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_class.rst

   SquimSubjectiveBundle
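
For example, estimating a MOS score with :data:`SQUIM_SUBJECTIVE`, which additionally requires a clean non-matching reference recording, might look like the following sketch (both audio paths are placeholders).

.. code-block:: python

   import torch
   import torchaudio

   bundle = torchaudio.pipelines.SQUIM_SUBJECTIVE
   model = bundle.get_model()

   # Placeholder paths: the waveform under test and a clean, non-matching reference.
   waveform, sr = torchaudio.load("test_speech.wav")
   reference, ref_sr = torchaudio.load("reference_speech.wav")
   waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
   reference = torchaudio.functional.resample(reference, ref_sr, bundle.sample_rate)

   with torch.inference_mode():
       mos = model(waveform, reference)
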

Pretrained Models
~~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/bundle_data.rst

   SQUIM_SUBJECTIVE