Unverified commit fad855cd authored by moto, committed by GitHub

Move wav2vec2 pretrained models to pipelines module (#1876)

- Move the wav2vec2 pretrained weights to the `torchaudio.pipelines` namespace to align with #1872.
- Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pretraining models) and `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
- Update the base URL from which the weights are downloaded.
parent c22962d1
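For downstream code the change amounts to an import-path move; here is a minimal migration sketch (the dummy waveform is illustrative only):

import torch
from torchaudio.pipelines import WAV2VEC2_ASR_BASE_960H  # previously: from torchaudio.models import ...

# Bundle objects keep the same names; only the namespace changed.
model = WAV2VEC2_ASR_BASE_960H.get_model()    # downloads and caches the weights
labels = WAV2VEC2_ASR_BASE_960H.get_labels()  # output token set for CTC decoding
waveform = torch.randn(1, 16000)              # illustrative 1-second dummy input
emissions, _ = model(waveform)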
@@ -35,6 +35,7 @@ The :mod:`torchaudio` package consists of I/O, popular datasets and common audio
transforms
datasets
models
+pipelines
sox_effects
compliance.kaldi
kaldi_io
@@ -111,160 +111,6 @@ hubert_xlarge
.. autofunction:: hubert_xlarge
-Pre-trained Models
-------------------
-.. autoclass:: Wav2Vec2PretrainedModelBundle
-  .. automethod:: get_model
-  .. automethod:: get_labels
-WAV2VEC2_BASE
-^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_BASE
-    :no-value:
-WAV2VEC2_ASR_BASE_10M
-^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: torchaudio.models.WAV2VEC2_ASR_BASE_10M
-    :no-value:
-WAV2VEC2_ASR_BASE_100H
-^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_BASE_100H
-    :no-value:
-WAV2VEC2_ASR_BASE_960H
-^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_BASE_960H
-    :no-value:
-WAV2VEC2_LARGE
-^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_LARGE
-    :no-value:
-WAV2VEC2_ASR_LARGE_10M
-^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_10M
-    :no-value:
-WAV2VEC2_ASR_LARGE_100H
-^^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_100H
-    :no-value:
-WAV2VEC2_ASR_LARGE_960H
-^^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_960H
-    :no-value:
-WAV2VEC2_LARGE_LV60K
-^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_LARGE_LV60K
-    :no-value:
-WAV2VEC2_ASR_LARGE_LV60K_10M
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_LV60K_10M
-    :no-value:
-WAV2VEC2_ASR_LARGE_LV60K_100H
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_LV60K_100H
-    :no-value:
-WAV2VEC2_ASR_LARGE_LV60K_960H
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_ASR_LARGE_LV60K_960H
-    :no-value:
-WAV2VEC2_XLSR53
-^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: WAV2VEC2_XLSR53
-    :no-value:
-HUBERT_BASE
-^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: HUBERT_BASE
-    :no-value:
-HUBERT_LARGE
-^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: HUBERT_LARGE
-    :no-value:
-HUBERT_XLARGE
-^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: HUBERT_XLARGE
-    :no-value:
-HUBERT_ASR_LARGE
-^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: HUBERT_ASR_LARGE
-    :no-value:
-HUBERT_ASR_XLARGE
-^^^^^^^^^^^^^^^^^
-.. container:: py attribute
-  .. autodata:: HUBERT_ASR_XLARGE
-    :no-value:
Utility Functions
-----------------
torchaudio.pipelines
====================
.. currentmodule:: torchaudio.pipelines
The pipelines subpackage contains APIs for accessing models with pretrained weights, along with the information and helper functions associated with those weights.
wav2vec 2.0 / HuBERT - Representation Learning
----------------------------------------------
.. autoclass:: Wav2Vec2Bundle
.. automethod:: get_model
WAV2VEC2_BASE
-------------
.. container:: py attribute
.. autodata:: WAV2VEC2_BASE
:no-value:
WAV2VEC2_LARGE
--------------
.. container:: py attribute
.. autodata:: WAV2VEC2_LARGE
:no-value:
WAV2VEC2_LARGE_LV60K
--------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_LARGE_LV60K
:no-value:
WAV2VEC2_XLSR53
---------------
.. container:: py attribute
.. autodata:: WAV2VEC2_XLSR53
:no-value:
HUBERT_BASE
-----------
.. container:: py attribute
.. autodata:: HUBERT_BASE
:no-value:
HUBERT_LARGE
------------
.. container:: py attribute
.. autodata:: HUBERT_LARGE
:no-value:
HUBERT_XLARGE
-------------
.. container:: py attribute
.. autodata:: HUBERT_XLARGE
:no-value:
wav2vec 2.0 / HuBERT - ASR fine-tuning
--------------------------------------
.. autoclass:: Wav2Vec2ASRBundle
.. automethod:: get_model
.. automethod:: get_labels
WAV2VEC2_ASR_BASE_10M
---------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_BASE_10M
:no-value:
WAV2VEC2_ASR_BASE_100H
----------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_BASE_100H
:no-value:
WAV2VEC2_ASR_BASE_960H
----------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_BASE_960H
:no-value:
WAV2VEC2_ASR_LARGE_10M
----------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_10M
:no-value:
WAV2VEC2_ASR_LARGE_100H
-----------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_100H
:no-value:
WAV2VEC2_ASR_LARGE_960H
-----------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_960H
:no-value:
WAV2VEC2_ASR_LARGE_LV60K_10M
----------------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_LV60K_10M
:no-value:
WAV2VEC2_ASR_LARGE_LV60K_100H
-----------------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_LV60K_100H
:no-value:
WAV2VEC2_ASR_LARGE_LV60K_960H
-----------------------------
.. container:: py attribute
.. autodata:: WAV2VEC2_ASR_LARGE_LV60K_960H
:no-value:
HUBERT_ASR_LARGE
----------------
.. container:: py attribute
.. autodata:: HUBERT_ASR_LARGE
:no-value:
HUBERT_ASR_XLARGE
-----------------
.. container:: py attribute
.. autodata:: HUBERT_ASR_XLARGE
:no-value:
References
----------
.. footbibliography::
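The two bundle classes documented above correspond to the two usage patterns; a minimal end-to-end sketch combining them ('my_speech.wav' is a placeholder path):

import torchaudio
from torchaudio.pipelines import WAV2VEC2_BASE, WAV2VEC2_ASR_BASE_960H

waveform, sample_rate = torchaudio.load('my_speech.wav')  # placeholder file

# Wav2Vec2Bundle: representation learning; exposes get_model() only.
features, _ = WAV2VEC2_BASE.get_model().extract_features(waveform)

# Wav2Vec2ASRBundle: additionally exposes get_labels() for decoding.
asr_model = WAV2VEC2_ASR_BASE_960H.get_model()
emissions, _ = asr_model(waveform)
labels = WAV2VEC2_ASR_BASE_960H.get_labels()
# Pair emissions with labels in a CTC decoder of your choice.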
import torchaudio
-from torchaudio.models import (
+from torchaudio.pipelines import (
    WAV2VEC2_BASE,
    WAV2VEC2_LARGE,
    WAV2VEC2_LARGE_LV60K,
@@ -4,6 +4,7 @@ from torchaudio import (
    datasets,
    functional,
    models,
+    pipelines,
    kaldi_io,
    utils,
    sox_effects,
@@ -26,6 +27,7 @@ __all__ = [
    'datasets',
    'functional',
    'models',
+    'pipelines',
    'kaldi_io',
    'utils',
    'sox_effects',
@@ -13,27 +13,6 @@ from .wav2vec2 import (
    hubert_large,
    hubert_xlarge,
)
-from .wav2vec2.pretrained import (
-    Wav2Vec2PretrainedModelBundle,
-    WAV2VEC2_BASE,
-    WAV2VEC2_LARGE,
-    WAV2VEC2_LARGE_LV60K,
-    WAV2VEC2_ASR_BASE_10M,
-    WAV2VEC2_ASR_BASE_100H,
-    WAV2VEC2_ASR_BASE_960H,
-    WAV2VEC2_ASR_LARGE_10M,
-    WAV2VEC2_ASR_LARGE_100H,
-    WAV2VEC2_ASR_LARGE_960H,
-    WAV2VEC2_ASR_LARGE_LV60K_10M,
-    WAV2VEC2_ASR_LARGE_LV60K_100H,
-    WAV2VEC2_ASR_LARGE_LV60K_960H,
-    WAV2VEC2_XLSR53,
-    HUBERT_BASE,
-    HUBERT_LARGE,
-    HUBERT_XLARGE,
-    HUBERT_ASR_LARGE,
-    HUBERT_ASR_XLARGE,
-)
__all__ = [
    'Wav2Letter',
@@ -49,25 +28,6 @@ __all__ = [
    'hubert_base',
    'hubert_large',
    'hubert_xlarge',
-    'Wav2Vec2PretrainedModelBundle',
-    'WAV2VEC2_BASE',
-    'WAV2VEC2_LARGE',
-    'WAV2VEC2_LARGE_LV60K',
-    'WAV2VEC2_ASR_BASE_10M',
-    'WAV2VEC2_ASR_BASE_100H',
-    'WAV2VEC2_ASR_BASE_960H',
-    'WAV2VEC2_ASR_LARGE_10M',
-    'WAV2VEC2_ASR_LARGE_100H',
-    'WAV2VEC2_ASR_LARGE_960H',
-    'WAV2VEC2_ASR_LARGE_LV60K_10M',
-    'WAV2VEC2_ASR_LARGE_LV60K_100H',
-    'WAV2VEC2_ASR_LARGE_LV60K_960H',
-    'WAV2VEC2_XLSR53',
-    'HUBERT_BASE',
-    'HUBERT_LARGE',
-    'HUBERT_XLARGE',
-    'HUBERT_ASR_LARGE',
-    'HUBERT_ASR_XLARGE',
    'Tacotron2',
    'tacotron2',
]
from ._wav2vec2 import (
    Wav2Vec2Bundle,
    Wav2Vec2ASRBundle,
    WAV2VEC2_BASE,
    WAV2VEC2_LARGE,
    WAV2VEC2_LARGE_LV60K,
    WAV2VEC2_ASR_BASE_10M,
    WAV2VEC2_ASR_BASE_100H,
    WAV2VEC2_ASR_BASE_960H,
    WAV2VEC2_ASR_LARGE_10M,
    WAV2VEC2_ASR_LARGE_100H,
    WAV2VEC2_ASR_LARGE_960H,
    WAV2VEC2_ASR_LARGE_LV60K_10M,
    WAV2VEC2_ASR_LARGE_LV60K_100H,
    WAV2VEC2_ASR_LARGE_LV60K_960H,
    WAV2VEC2_XLSR53,
    HUBERT_BASE,
    HUBERT_LARGE,
    HUBERT_XLARGE,
    HUBERT_ASR_LARGE,
    HUBERT_ASR_XLARGE,
)
__all__ = [
    'Wav2Vec2Bundle',
    'Wav2Vec2ASRBundle',
    'WAV2VEC2_BASE',
    'WAV2VEC2_LARGE',
    'WAV2VEC2_LARGE_LV60K',
    'WAV2VEC2_ASR_BASE_10M',
    'WAV2VEC2_ASR_BASE_100H',
    'WAV2VEC2_ASR_BASE_960H',
    'WAV2VEC2_ASR_LARGE_10M',
    'WAV2VEC2_ASR_LARGE_100H',
    'WAV2VEC2_ASR_LARGE_960H',
    'WAV2VEC2_ASR_LARGE_LV60K_10M',
    'WAV2VEC2_ASR_LARGE_LV60K_100H',
    'WAV2VEC2_ASR_LARGE_LV60K_960H',
    'WAV2VEC2_XLSR53',
    'HUBERT_BASE',
    'HUBERT_LARGE',
    'HUBERT_XLARGE',
    'HUBERT_ASR_LARGE',
    'HUBERT_ASR_XLARGE',
]
from dataclasses import dataclass
-from typing import Dict, Tuple, Any, Optional
+from typing import Dict, Tuple, Any
from torch.hub import load_state_dict_from_url
-from .model import wav2vec2_model, Wav2Vec2Model
+from torchaudio.models import wav2vec2_model, Wav2Vec2Model
__all__ = []
@dataclass
-class Wav2Vec2PretrainedModelBundle:
-    """torchaudio.models.Wav2Vec2PretrainedModelBundle()
+class Wav2Vec2Bundle:
+    """torchaudio.pipelines.Wav2Vec2Bundle()
    Data class that bundles associated information to use pretrained Wav2Vec2Model.
@@ -24,7 +24,7 @@ class Wav2Vec2PretrainedModelBundle:
    Please see below for the usage and the available values.
-    Example - Pretraining model
+    Example - Feature Extraction
        >>> import torchaudio
        >>>
        >>> # Build the model and load pretrained weight.
@@ -34,32 +34,14 @@ class Wav2Vec2PretrainedModelBundle:
        >>> # Extract acoustic features
        >>> waveform, sample_rate = torchaudio.load('my_speech.mp3')
        >>> features, _ = model.extract_features(waveform)
-    Example - Model fine-tuned for ASR
-        >>> import torchaudio
-        >>>
-        >>> # Build the model and load pretrained weight.
-        >>> model = torchaudio.models.HUBERT_ASR_LARGE.get_model()
-        Downloading:
-        100%|███████████████████████████████| 1.18G/1.18G [00:17<00:00, 73.8MB/s]
-        >>> # Check the corresponding labels of the output.
-        >>> labels = torchaudio.models.HUBERT_ASR_LARGE.get_labels()
-        >>> print(labels)
-        ('<s>', '<pad>', '</s>', '<unk>', '|', 'E', 'T', 'A', 'O', 'N', 'I', 'H', 'S', 'R', 'D', 'L', 'U', 'M', 'W', 'C', 'F', 'G', 'Y', 'P', 'B', 'V', 'K', "'", 'X', 'J', 'Q', 'Z')
-        >>> # Infer the label probability distribution
-        >>> waveform, sample_rate = torchaudio.load('my_speech.mp3')
-        >>> emissions, _ = model(waveform)
-        >>> # Pass emission to decoder
-        >>> # `ctc_decode` is for illustration purpose only
-        >>> transcripts = ctc_decode(emissions, labels)
    """ # noqa: E501
    _path: str
    _params: Dict[str, Any]
-    _labels: Optional[Tuple[str]]
    def get_model(self, *, dl_kwargs=None) -> Wav2Vec2Model:
-        """Construct the model and load the pretrained weight.
+        """get_model(self, *, dl_kwargs=None) -> torchaudio.models.Wav2Vec2Model
+        Construct the model and load the pretrained weight.
        The weight file is downloaded from the internet and cached with
        :func:`torch.hub.load_state_dict_from_url`
@@ -68,13 +50,50 @@ class Wav2Vec2PretrainedModelBundle:
            dl_kwargs (dictionary of keyword arguments): Passed to :func:`torch.hub.load_state_dict_from_url`.
        """
        model = wav2vec2_model(**self._params)
-        url = f'https://download.pytorch.org/models/audio/{self._path}'
+        url = f'https://download.pytorch.org/torchaudio/models/{self._path}'
        dl_kwargs = {} if dl_kwargs is None else dl_kwargs
        state_dict = load_state_dict_from_url(url, **dl_kwargs)
        model.load_state_dict(state_dict)
        model.eval()
        return model
+@dataclass
+class Wav2Vec2ASRBundle(Wav2Vec2Bundle):
+    """torchaudio.pipelines.Wav2Vec2ASRBundle()
+    Data class that bundles associated information to use pretrained Wav2Vec2Model.
+    This class provides interfaces for instantiating the pretrained model along with
+    the information necessary to retrieve pretrained weights and additional data
+    to be used with the model.
+    Torchaudio library instantiates objects of this class, each of which represents
+    a different pretrained model. Client code should access pretrained models via these
+    instances.
+    Please see below for the usage and the available values.
+    Example - ASR
+        >>> import torchaudio
+        >>>
+        >>> # Build the model and load pretrained weight.
+        >>> model = torchaudio.models.HUBERT_ASR_LARGE.get_model()
+        Downloading:
+        100%|███████████████████████████████| 1.18G/1.18G [00:17<00:00, 73.8MB/s]
+        >>> # Check the corresponding labels of the output.
+        >>> labels = torchaudio.models.HUBERT_ASR_LARGE.get_labels()
+        >>> print(labels)
+        ('<s>', '<pad>', '</s>', '<unk>', '|', 'E', 'T', 'A', 'O', 'N', 'I', 'H', 'S', 'R', 'D', 'L', 'U', 'M', 'W', 'C', 'F', 'G', 'Y', 'P', 'B', 'V', 'K', "'", 'X', 'J', 'Q', 'Z')
+        >>> # Infer the label probability distribution
+        >>> waveform, sample_rate = torchaudio.load('my_speech.mp3')
+        >>> emissions, _ = model(waveform)
+        >>> # Pass emission to decoder
+        >>> # `ctc_decode` is for illustration purpose only
+        >>> transcripts = ctc_decode(emissions, labels)
+    """ # noqa: E501
+    _labels: Tuple[str]
    def get_labels(
        self,
        *,
@@ -143,7 +162,7 @@ def _get_labels():
    )
-WAV2VEC2_BASE = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_BASE = Wav2Vec2Bundle(
    _path='wav2vec2_fairseq_base_ls960.pth',
    _params={
        'extractor_mode': 'group_norm',
@@ -171,7 +190,6 @@ WAV2VEC2_BASE = Wav2Vec2PretrainedModelBundle(
        'encoder_layer_drop': 0.05,
        "aux_num_out": None,
    },
-    _labels=None,
)
WAV2VEC2_BASE.__doc__ = """wav2vec 2.0 model with "Base" configuration.
@@ -183,9 +201,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_BASE_10M = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_BASE_10M = Wav2Vec2ASRBundle(
    _path='wav2vec2_fairseq_base_ls960_asr_ll10m.pth',
    _params={
        'extractor_mode': 'group_norm',
@@ -226,9 +246,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_BASE_100H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_BASE_100H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_base_ls960_asr_ls100.pth',
    {
        'extractor_mode': 'group_norm',
@@ -269,9 +291,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_BASE_960H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_BASE_960H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_base_ls960_asr_ls960.pth',
    {
        "extractor_mode": "group_norm",
@@ -311,9 +335,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_LARGE = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_LARGE = Wav2Vec2Bundle(
    'wav2vec2_fairseq_large_ls960.pth',
    {
        "extractor_mode": "group_norm",
@@ -341,7 +367,6 @@ WAV2VEC2_LARGE = Wav2Vec2PretrainedModelBundle(
        "encoder_layer_drop": 0.2,
        "aux_num_out": None,
    },
-    _labels=None,
)
WAV2VEC2_LARGE.__doc__ = """Build "large" wav2vec2 model.
@@ -353,9 +378,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_10M = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_10M = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_ls960_asr_ll10m.pth',
    {
        "extractor_mode": "group_norm",
@@ -396,9 +423,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_100H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_100H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_ls960_asr_ls100.pth',
    {
        "extractor_mode": "group_norm",
@@ -439,9 +468,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_960H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_960H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_ls960_asr_ls960.pth',
    {
        "extractor_mode": "group_norm",
@@ -481,9 +512,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_LARGE_LV60K = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_LARGE_LV60K = Wav2Vec2Bundle(
    'wav2vec2_fairseq_large_lv60k.pth',
    {
        "extractor_mode": "layer_norm",
@@ -511,7 +544,6 @@ WAV2VEC2_LARGE_LV60K = Wav2Vec2PretrainedModelBundle(
        "encoder_layer_drop": 0.0,
        "aux_num_out": None,
    },
-    _labels=None,
)
WAV2VEC2_LARGE_LV60K.__doc__ = """Build "large-lv60k" wav2vec2 model.
@@ -523,9 +555,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_LV60K_10M = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_LV60K_10M = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_lv60k_asr_ll10m.pth',
    {
        "extractor_mode": "layer_norm",
@@ -566,9 +600,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_LV60K_100H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_LV60K_100H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_lv60k_asr_ls100.pth',
    {
        "extractor_mode": "layer_norm",
@@ -609,9 +645,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_ASR_LARGE_LV60K_960H = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_ASR_LARGE_LV60K_960H = Wav2Vec2ASRBundle(
    'wav2vec2_fairseq_large_lv60k_asr_ls960.pth',
    {
        "extractor_mode": "layer_norm",
@@ -653,9 +691,11 @@ Originally published by the authors of *wav2vec 2.0* [:footcite:`baevski2020wav2
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-WAV2VEC2_XLSR53 = Wav2Vec2PretrainedModelBundle(
+WAV2VEC2_XLSR53 = Wav2Vec2Bundle(
    'wav2vec2_fairseq_large_xlsr53.pth',
    {
        "extractor_mode": "layer_norm",
@@ -683,7 +723,6 @@ WAV2VEC2_XLSR53 = Wav2Vec2PretrainedModelBundle(
        "encoder_layer_drop": 0.0,
        "aux_num_out": None,
    },
-    _labels=None,
)
WAV2VEC2_XLSR53.__doc__ = """wav2vec 2.0 model with "Base" configuration.
@@ -698,9 +737,11 @@ Originally published by the authors of
[:footcite:`conneau2020unsupervised`] under MIT License and redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/wav2vec#pre-trained-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-HUBERT_BASE = Wav2Vec2PretrainedModelBundle(
+HUBERT_BASE = Wav2Vec2Bundle(
    'hubert_fairseq_base_ls960.pth',
    {
        'extractor_mode': 'group_norm',
@@ -728,7 +769,6 @@ HUBERT_BASE = Wav2Vec2PretrainedModelBundle(
        'encoder_layer_drop': 0.05,
        'aux_num_out': None,
    },
-    _labels=None,
)
HUBERT_BASE.__doc__ = """HuBERT model with "Base" configuration.
@@ -740,9 +780,11 @@ Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`] unde
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-HUBERT_LARGE = Wav2Vec2PretrainedModelBundle(
+HUBERT_LARGE = Wav2Vec2Bundle(
    'hubert_fairseq_large_ll60k.pth',
    {
        'extractor_mode': 'layer_norm',
@@ -770,7 +812,6 @@ HUBERT_LARGE = Wav2Vec2PretrainedModelBundle(
        'encoder_layer_drop': 0.0,
        'aux_num_out': None,
    },
-    _labels=None,
)
HUBERT_LARGE.__doc__ = """HuBERT model with "Large" configuration.
@@ -782,9 +823,11 @@ Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`] unde
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-HUBERT_XLARGE = Wav2Vec2PretrainedModelBundle(
+HUBERT_XLARGE = Wav2Vec2Bundle(
    'hubert_fairseq_xlarge_ll60k.pth',
    {
        'extractor_mode': 'layer_norm',
@@ -812,7 +855,6 @@ HUBERT_XLARGE = Wav2Vec2PretrainedModelBundle(
        'encoder_layer_drop': 0.0,
        'aux_num_out': None,
    },
-    _labels=None,
)
HUBERT_XLARGE.__doc__ = """HuBERT model with "Extra Large" configuration.
@@ -824,9 +866,11 @@ Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`] unde
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2Bundle` for the usage.
""" # noqa: E501
-HUBERT_ASR_LARGE = Wav2Vec2PretrainedModelBundle(
+HUBERT_ASR_LARGE = Wav2Vec2ASRBundle(
    'hubert_fairseq_large_ll60k_asr_ls960.pth',
    {
        'extractor_mode': 'layer_norm',
@@ -868,9 +912,11 @@ Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`] unde
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501
-HUBERT_ASR_XLARGE = Wav2Vec2PretrainedModelBundle(
+HUBERT_ASR_XLARGE = Wav2Vec2ASRBundle(
    'hubert_fairseq_xlarge_ll60k_asr_ls960.pth',
    {
        'extractor_mode': 'layer_norm',
@@ -912,4 +958,6 @@ Originally published by the authors of *HuBERT* [:footcite:`hsu2021hubert`] unde
redistributed with the same license.
[`License <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/LICENSE>`__,
`Source <https://github.com/pytorch/fairseq/blob/ce6c9eeae163ac04b79539c78e74f292f29eaa18/examples/hubert#pre-trained-and-fine-tuned-asr-models>`__]
+Please refer to :func:`torchaudio.pipelines.Wav2Vec2ASRBundle` for the usage.
""" # noqa: E501