Commit 49b23e15 authored by moto, committed by Facebook GitHub Bot

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df
parent 709b4439
..
   autogenerated from source/_templates/autosummary/dataset_class.rst

{{ name | underline }}

.. autoclass:: {{ fullname }}

{%- if "get_metadata" in methods %}
{%- set meth=["__getitem__", "get_metadata"] %}
{%- else %}
{%- set meth=["__getitem__"] %}
{%- endif %}
{%- if name == "CMUDict" %}
{%- set properties=["symbols"] %}
{%- elif name == "TEDLIUM" %}
{%- set properties=["phoneme_dict"] %}
{%- else %}
{%- set properties=[] %}
{%- endif %}

{%- if properties %}

Properties
==========
{% for item in properties %}
{{ item | underline("-") }}

.. container:: py attribute

   .. autoproperty:: {{ [fullname, item] | join('.') }}
{%- endfor %}
{%- endif %}

{%- if properties %}

Methods
=======
{%- endif %}
{% for item in meth %}
{{ item | underline("-") }}

.. container:: py attribute

   .. automethod:: {{ [fullname, item] | join('.') }}
{%- endfor %}
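The template pipes section titles through an ``underline`` filter, which is not a built-in Jinja filter and so must be registered in the docs' Sphinx configuration (not shown in this diff). A minimal sketch of what such a filter might look like:

```python
# Hypothetical sketch of the custom `underline` Jinja filter the template
# assumes: renders a reST section title by repeating the underline character
# to match the title's length.
def underline(title, char="="):
    return f"{title}\n{char * len(title)}"

print(underline("CMUDict"))       # title underlined with "="
print(underline("symbols", "-"))  # title underlined with "-"
```

The per-item calls like ``{{ item | underline("-") }}`` then produce valid reST subsection headers of exactly the right width.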
torchaudio.datasets
===================

.. py:module:: torchaudio.datasets

All datasets are subclasses of :class:`torch.utils.data.Dataset`
and have ``__getitem__`` and ``__len__`` methods implemented.
Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using ``torch.multiprocessing`` workers.
For example::

   yesno_data = torchaudio.datasets.YESNO('.', download=True)
   data_loader = torch.utils.data.DataLoader(yesno_data,
                                             batch_size=1,
                                             shuffle=True,
                                             num_workers=args.nThreads)
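The map-style protocol this paragraph describes (``__getitem__`` plus ``__len__``) can be sketched without torch at all. ``ToyDataset`` and ``simple_loader`` below are hypothetical stand-ins for a dataset and for ``torch.utils.data.DataLoader``'s batching, not torchaudio code:

```python
# Minimal, torch-free sketch of the map-style dataset protocol:
# any object exposing __getitem__ and __len__ can be batched.
class ToyDataset:
    def __init__(self, items):
        self._items = items

    def __getitem__(self, n):
        return self._items[n]

    def __len__(self):
        return len(self._items)


def simple_loader(dataset, batch_size):
    # Hypothetical stand-in for DataLoader's sequential batching loop.
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


ds = ToyDataset(["a", "b", "c", "d", "e"])
print(list(simple_loader(ds, batch_size=2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

A real ``DataLoader`` adds collation, shuffling, and multiprocessing on top of exactly this indexing contract.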
.. currentmodule:: torchaudio.datasets

CMUARCTIC
~~~~~~~~~

.. autoclass:: CMUARCTIC
   :members:
   :special-members: __getitem__

CMUDict
~~~~~~~

.. autoclass:: CMUDict
   :members:
   :special-members: __getitem__

COMMONVOICE
~~~~~~~~~~~

.. autoclass:: COMMONVOICE
   :members:
   :special-members: __getitem__

GTZAN
~~~~~

.. autoclass:: GTZAN
   :members:
   :special-members: __getitem__

LibriMix
~~~~~~~~

.. autoclass:: LibriMix
   :members:
   :special-members: __getitem__

LIBRISPEECH
~~~~~~~~~~~

.. autoclass:: LIBRISPEECH
   :members:
   :special-members: __getitem__

LibriLightLimited
~~~~~~~~~~~~~~~~~

.. autoclass:: LibriLightLimited
   :members:
   :special-members: __getitem__

LIBRITTS
~~~~~~~~

.. autoclass:: LIBRITTS
   :members:
   :special-members: __getitem__

LJSPEECH
~~~~~~~~

.. autoclass:: LJSPEECH
   :members:
   :special-members: __getitem__

SPEECHCOMMANDS
~~~~~~~~~~~~~~

.. autoclass:: SPEECHCOMMANDS
   :members:
   :special-members: __getitem__

TEDLIUM
~~~~~~~

.. autoclass:: TEDLIUM
   :members:
   :special-members: __getitem__

VCTK_092
~~~~~~~~

.. autoclass:: VCTK_092
   :members:
   :special-members: __getitem__

VoxCeleb1Identification
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: VoxCeleb1Identification
   :members:
   :special-members: __getitem__

VoxCeleb1Verification
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: VoxCeleb1Verification
   :members:
   :special-members: __getitem__

DR_VCTK
~~~~~~~

.. autoclass:: DR_VCTK
   :members:
   :special-members: __getitem__

YESNO
~~~~~

.. autoclass:: YESNO
   :members:
   :special-members: __getitem__

QUESST14
~~~~~~~~

.. autoclass:: QUESST14
   :members:
   :special-members: __getitem__
FluentSpeechCommands
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: FluentSpeechCommands
   :members:
   :special-members: __getitem__

Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using :mod:`torch.multiprocessing` workers.
For example:

.. code::

   yesno_data = torchaudio.datasets.YESNO('.', download=True)
   data_loader = torch.utils.data.DataLoader(
       yesno_data,
       batch_size=1,
       shuffle=True,
       num_workers=args.nThreads)
MUSDB_HQ
~~~~~~~~

.. autoclass:: MUSDB_HQ
   :members:
   :special-members: __getitem__

.. currentmodule:: torchaudio.datasets

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/dataset_class.rst

   CMUARCTIC
   CMUDict
   COMMONVOICE
   DR_VCTK
   FluentSpeechCommands
   GTZAN
   LibriMix
   LIBRISPEECH
   LibriLightLimited
   LIBRITTS
   LJSPEECH
   MUSDB_HQ
   QUESST14
   SPEECHCOMMANDS
   TEDLIUM
   VCTK_092
   VoxCeleb1Identification
   VoxCeleb1Verification
   YESNO
......@@ -31,7 +31,6 @@ print(torchaudio.__version__)
# -------------------------------------------------------------------------------
# Preparation of data and helper functions.
# -------------------------------------------------------------------------------
import multiprocessing
import os
import matplotlib.pyplot as plt
......@@ -46,7 +45,7 @@ os.makedirs(YESNO_DATASET_PATH, exist_ok=True)
def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
waveform = waveform.numpy()
num_channels, num_frames = waveform.shape
num_channels, _ = waveform.shape
figure, axes = plt.subplots(num_channels, 1)
if num_channels == 1:
......@@ -64,7 +63,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
def play_audio(waveform, sample_rate):
waveform = waveform.numpy()
num_channels, num_frames = waveform.shape
num_channels, _ = waveform.shape
if num_channels == 1:
display(Audio(waveform[0], rate=sample_rate))
elif num_channels == 2:
......@@ -75,7 +74,7 @@ def play_audio(waveform, sample_rate):
######################################################################
# Here, we show how to use the
# :py:func:`torchaudio.datasets.YESNO` dataset.
# :py:class:`torchaudio.datasets.YESNO` dataset.
#
......
......@@ -49,7 +49,7 @@ def load_cmuarctic_item(line: str, path: str, folder_audio: str, ext_audio: str)
class CMUARCTIC(Dataset):
"""Create a Dataset for *CMU ARCTIC* :cite:`Kominek03cmuarctic`.
"""*CMU ARCTIC* :cite:`Kominek03cmuarctic` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -139,7 +139,16 @@ class CMUARCTIC(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str): ``(waveform, sample_rate, transcript, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Utterance ID
"""
line = self._walker[n]
return load_cmuarctic_item(line, self._path, self._folder_audio, self._ext_audio)
......
......@@ -104,7 +104,7 @@ def _parse_dictionary(lines: Iterable[str], exclude_punctuations: bool) -> List[
class CMUDict(Dataset):
"""Create a Dataset for *CMU Pronouncing Dictionary* :cite:`cmudict` (CMUDict).
"""*CMU Pronouncing Dictionary* :cite:`cmudict` (CMUDict) dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -169,8 +169,12 @@ class CMUDict(Dataset):
n (int): The index of the sample to be loaded.
Returns:
(str, List[str]): The corresponding word and phonemes ``(word, [phonemes])``.
Tuple of a word and its phonemes
str:
Word
List[str]:
Phonemes
"""
return self._dictionary[n]
......@@ -179,5 +183,5 @@ class CMUDict(Dataset):
@property
def symbols(self) -> List[str]:
"""list[str]: A list of phonemes symbols, such as `AA`, `AE`, `AH`."""
"""list[str]: A list of phonemes symbols, such as ``"AA"``, ``"AE"``, ``"AH"``."""
return self._symbols.copy()
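CMUDict entries map a word to its phoneme sequence, and ``symbols`` is the vocabulary of those phonemes. A hedged sketch of parsing one CMUDict-style entry line ("WORD  PH1 PH2 ..."); the dataset's real parser (``_parse_dictionary`` above) additionally handles comments and punctuation filtering:

```python
# Hypothetical sketch of one CMUDict-style entry: a word followed by its
# whitespace-separated ARPAbet phonemes.
def parse_entry(line):
    word, *phonemes = line.split()
    return word, phonemes

print(parse_entry("HELLO  HH AH0 L OW1"))  # ('HELLO', ['HH', 'AH0', 'L', 'OW1'])
```

This matches the ``(word, [phonemes])`` tuple shape that ``CMUDict.__getitem__`` documents.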
......@@ -28,7 +28,7 @@ def load_commonvoice_item(
class COMMONVOICE(Dataset):
"""Create a Dataset for *CommonVoice* :cite:`ardila2020common`.
"""*CommonVoice* :cite:`ardila2020common` dataset.
Args:
root (str or Path): Path to the directory where the dataset is located.
......@@ -61,9 +61,23 @@ class COMMONVOICE(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, Dict[str, str]): ``(waveform, sample_rate, dictionary)``, where dictionary
is built from the TSV file with the following keys: ``client_id``, ``path``, ``sentence``,
``up_votes``, ``down_votes``, ``age``, ``gender`` and ``accent``.
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
Dict[str, str]:
Dictionary containing the following items from the corresponding TSV file:
* ``"client_id"``
* ``"path"``
* ``"sentence"``
* ``"up_votes"``
* ``"down_votes"``
* ``"age"``
* ``"gender"``
* ``"accent"``
"""
line = self._walker[n]
return load_commonvoice_item(line, self._header, self._path, self._folder_audio, self._ext_audio)
......
......@@ -14,7 +14,7 @@ _SUPPORTED_SUBSETS = {"train", "test"}
class DR_VCTK(Dataset):
"""Create a dataset for *Device Recorded VCTK (Small subset version)* :cite:`Sarfjoo2018DeviceRV`.
"""*Device Recorded VCTK (Small subset version)* :cite:`Sarfjoo2018DeviceRV` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found.
......@@ -95,9 +95,24 @@ class DR_VCTK(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, Tensor, int, str, str, str, int):
``(waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id,\
utterance_id, source, channel_id)``
Tuple of the following items:
Tensor:
Clean waveform
int:
Sample rate of the clean waveform
Tensor:
Noisy waveform
int:
Sample rate of the noisy waveform
str:
Speaker ID
str:
Utterance ID
str:
Source
int:
Channel ID
"""
filename = self._filename_list[n]
return self._load_dr_vctk_item(filename)
......
......@@ -11,11 +11,12 @@ SAMPLE_RATE = 16000
class FluentSpeechCommands(Dataset):
"""Create *Fluent Speech Commands* :cite:`fluent` Dataset
"""*Fluent Speech Commands* :cite:`fluent` dataset
Args:
root (str or Path): Path to the directory where the dataset is found.
subset (str, optional): subset of the dataset to use. Options: [`"train"`, `"valid"`, `"test"`].
subset (str, optional): subset of the dataset to use.
Options: [``"train"``, ``"valid"``, ``"test"``].
(Default: ``"train"``)
"""
......@@ -45,8 +46,24 @@ class FluentSpeechCommands(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, int, str, str, str, str):
``(filepath, sample_rate, file_name, speaker_id, transcription, action, object, location)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
File name
int:
Speaker ID
str:
Transcription
str:
Action
str:
Object
str:
Location
"""
sample = self.data[n]
......@@ -67,8 +84,24 @@ class FluentSpeechCommands(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, str, str, str, str):
``(waveform, sample_rate, file_name, speaker_id, transcription, action, object, location)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
File name
int:
Speaker ID
str:
Transcription
str:
Action
str:
Object
str:
Location
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
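Several of these datasets follow the same two-step pattern shown above: ``get_metadata`` returns the cheap fields (path, sample rate, labels) without touching audio, and ``__getitem__`` loads the waveform from that metadata. A torch-free sketch of the split; all names here are illustrative, not torchaudio's actual helpers:

```python
# Hedged sketch of the get_metadata / __getitem__ split:
# metadata access stays cheap, audio I/O happens only in get_item.
def get_metadata(records, n):
    # records: list of (relative_path, sample_rate, transcript) tuples
    return records[n]

def get_item(records, n, load_waveform):
    relpath, sample_rate, transcript = get_metadata(records, n)
    waveform = load_waveform(relpath)  # only now is audio read from disk
    return waveform, sample_rate, transcript

records = [("speaker1/0001.wav", 16000, "turn on the lights")]
item = get_item(records, 0, load_waveform=lambda p: [0.0, 0.0])
print(item)  # ([0.0, 0.0], 16000, 'turn on the lights')
```

The split lets callers filter or index a dataset by metadata alone without paying for waveform decoding.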
......
......@@ -996,7 +996,7 @@ def load_gtzan_item(fileid: str, path: str, ext_audio: str) -> Tuple[Tensor, str
class GTZAN(Dataset):
"""Create a Dataset for *GTZAN* :cite:`tzanetakis_essl_cook_2001`.
"""*GTZAN* :cite:`tzanetakis_essl_cook_2001` dataset.
Note:
Please see http://marsyas.info/downloads/datasets.html if you are planning to use
......@@ -1096,7 +1096,14 @@ class GTZAN(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str): ``(waveform, sample_rate, label)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Label
"""
fileid = self._walker[n]
item = load_gtzan_item(fileid, self._path, self._ext_audio)
......
......@@ -34,13 +34,13 @@ def _get_fileids_paths(path, folders, _ext_audio) -> List[Tuple[str, str]]:
class LibriLightLimited(Dataset):
"""Create a Dataset for LibriLightLimited, which is the supervised subset of
LibriLight dataset.
"""Subset of Libri-light :cite:`librilight` dataset,
which was used in HuBERT :cite:`hsu2021hubert` for supervised fine-tuning.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
subset (str, optional): The subset to use. Options: [``10min``, ``1h``, ``10h``]
(Default: ``10min``).
subset (str, optional): The subset to use. Options: [``"10min"``, ``"1h"``, ``"10h"``]
(Default: ``"10min"``).
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
"""
......@@ -75,8 +75,20 @@ class LibriLightLimited(Dataset):
Args:
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, int, int):
``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
file_path, fileid = self._fileids_paths[n]
metadata = _get_librispeech_metadata(fileid, self._path, file_path, self._ext_audio, self._ext_txt)
......
......@@ -9,13 +9,13 @@ SampleType = Tuple[int, torch.Tensor, List[torch.Tensor]]
class LibriMix(Dataset):
r"""Create the *LibriMix* :cite:`cosentino2020librimix` dataset.
r"""*LibriMix* :cite:`cosentino2020librimix` dataset.
Args:
root (str or Path): The path to the directory where the directory ``Libri2Mix`` or
``Libri3Mix`` is stored.
subset (str, optional): The subset to use. Options: [``train-360``, ``train-100``,
``dev``, and ``test``] (Default: ``train-360``).
subset (str, optional): The subset to use. Options: [``"train-360"``, ``"train-100"``,
``"dev"``, and ``"test"``] (Default: ``"train-360"``).
num_speakers (int, optional): The number of speakers, which determines the directories
to traverse. The Dataset will traverse ``s1`` to ``sN`` directories to collect
N source audios. (Default: 2)
......@@ -23,8 +23,8 @@ class LibriMix(Dataset):
which subdirectory the audio are fetched. If any of the audio has a different sample
rate, raises ``ValueError``. Options: [8000, 16000] (Default: 8000)
task (str, optional): the task of LibriMix.
Options: [``enh_single``, ``enh_both``, ``sep_clean``, ``sep_noisy``]
(Default: ``sep_clean``)
Options: [``"enh_single"``, ``"enh_both"``, ``"sep_clean"``, ``"sep_noisy"``]
(Default: ``"sep_clean"``)
Note:
The LibriMix dataset needs to be manually generated. Please check https://github.com/JorisCos/LibriMix
......@@ -81,6 +81,13 @@ class LibriMix(Dataset):
Args:
key (int): The index of the sample to be loaded
Returns:
(int, Tensor, List[Tensor]): ``(sample_rate, mix_waveform, list_of_source_waveforms)``
Tuple of the following items:
int:
Sample rate
Tensor:
Mixture waveform
list of Tensors:
List of source waveforms
"""
return self._load_sample(self.files[key])
......@@ -75,7 +75,7 @@ def _get_librispeech_metadata(
class LIBRISPEECH(Dataset):
"""Create a Dataset for *LibriSpeech* :cite:`7178964`.
"""*LibriSpeech* :cite:`7178964` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -126,8 +126,20 @@ class LIBRISPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, int, int, int):
``(filepath, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
fileid = self._walker[n]
return _get_librispeech_metadata(fileid, self._archive, self._url, self._ext_audio, self._ext_txt)
......@@ -139,8 +151,20 @@ class LIBRISPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, int, int):
``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._archive, metadata[0], metadata[1])
......
......@@ -63,7 +63,7 @@ def load_libritts_item(
class LIBRITTS(Dataset):
"""Create a Dataset for *LibriTTS* :cite:`Zen2019LibriTTSAC`.
"""*LibriTTS* :cite:`Zen2019LibriTTSAC` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -138,8 +138,22 @@ class LIBRITTS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, str, int, int, str):
``(waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Original text
str:
Normalized text
int:
Speaker ID
int:
Chapter ID
str:
Utterance ID
"""
fileid = self._walker[n]
return load_libritts_item(
......
......@@ -20,7 +20,7 @@ _RELEASE_CONFIGS = {
class LJSPEECH(Dataset):
"""Create a Dataset for *LJSpeech-1.1* :cite:`ljspeech17`.
"""*LJSpeech-1.1* :cite:`ljspeech17` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -78,8 +78,16 @@ class LJSPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str):
``(waveform, sample_rate, transcript, normalized_transcript)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Normalized transcript
"""
line = self._flist[n]
fileid, transcript, normalized_transcript = line
......
......@@ -31,7 +31,7 @@ _VALIDATION_SET = [
class MUSDB_HQ(Dataset):
"""Create *MUSDB_HQ* :cite:`MUSDB18HQ` Dataset
"""*MUSDB_HQ* :cite:`MUSDB18HQ` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found
......@@ -122,7 +122,16 @@ class MUSDB_HQ(Dataset):
Args:
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, int, str): ``(waveforms, sample_rate, num_frames, track_name)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
int:
Num frames
str:
Track name
"""
return self._load_sample(n)
......
......@@ -23,7 +23,7 @@ _LANGUAGES = [
class QUESST14(Dataset):
"""Create *QUESST14* :cite:`Mir2015QUESST2014EQ` Dataset
"""*QUESST14* :cite:`Mir2015QUESST2014EQ` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found
......@@ -79,8 +79,14 @@ class QUESST14(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str):
``(filepath, sample_rate, file_name)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
File name
"""
audio_path = self.data[n]
relpath = os.path.relpath(audio_path, self._path)
......@@ -93,7 +99,14 @@ class QUESST14(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str): ``(waveform, sample_rate, file_name)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
File name
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
......
......@@ -48,7 +48,7 @@ def _get_speechcommands_metadata(filepath: str, path: str) -> Tuple[str, int, st
class SPEECHCOMMANDS(Dataset):
"""Create a Dataset for *Speech Commands* :cite:`speechcommandsv2`.
"""*Speech Commands* :cite:`speechcommandsv2` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -139,8 +139,18 @@ class SPEECHCOMMANDS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, str, int):
``(filepath, sample_rate, label, speaker_id, utterance_number)``
Tuple of the following items:
str:
Path to the audio
int:
Sample rate
str:
Label
str:
Speaker ID
int:
Utterance number
"""
fileid = self._walker[n]
return _get_speechcommands_metadata(fileid, self._archive)
......@@ -152,8 +162,18 @@ class SPEECHCOMMANDS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, int):
``(waveform, sample_rate, label, speaker_id, utterance_number)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Label
str:
Speaker ID
int:
Utterance number
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._archive, metadata[0], metadata[1])
......
......@@ -41,8 +41,7 @@ _RELEASE_CONFIGS = {
class TEDLIUM(Dataset):
"""
Create a Dataset for *Tedlium* :cite:`rousseau2012tedlium`. It supports releases 1, 2 and 3.
"""*Tedlium* :cite:`rousseau2012tedlium` dataset (releases 1, 2 and 3).
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -178,7 +177,20 @@ class TEDLIUM(Dataset):
n (int): The index of the sample to be loaded
Returns:
tuple: ``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Talk ID
int:
Speaker ID
int:
Identifier
"""
fileid, line = self._filelist[n]
return self._load_tedlium_item(fileid, line, self._path)
......
......@@ -17,7 +17,7 @@ SampleType = Tuple[Tensor, int, str, str, str]
class VCTK_092(Dataset):
"""Create *VCTK 0.92* :cite:`yamagishi2019vctk` Dataset
"""*VCTK 0.92* :cite:`yamagishi2019vctk` dataset
Args:
root (str): Root directory where the dataset's top level directory is found.
......@@ -123,8 +123,18 @@ class VCTK_092(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, str):
``(waveform, sample_rate, transcript, speaker_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Speaker ID
str:
Utterance ID
"""
speaker_id, utterance_id = self._sample_ids[n]
return self._load_sample(speaker_id, utterance_id, self._mic_id)
......
......@@ -90,7 +90,7 @@ def _get_file_id(file_path: str, _ext_audio: str):
class VoxCeleb1(Dataset):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -122,7 +122,8 @@ class VoxCeleb1(Dataset):
class VoxCeleb1Identification(VoxCeleb1):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset for speaker identification task.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker identification task.
Each data sample contains the waveform, sample rate, speaker id, and the file id.
Args:
......@@ -156,8 +157,16 @@ class VoxCeleb1Identification(VoxCeleb1):
n (int): The index of the sample
Returns:
(str, int, int, str):
``(filepath, sample_rate, speaker_id, file_id)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
int:
Speaker ID
str:
File ID
"""
file_path = self._flist[n]
file_id = _get_file_id(file_path, self._ext_audio)
......@@ -172,8 +181,16 @@ class VoxCeleb1Identification(VoxCeleb1):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, int, str):
``(waveform, sample_rate, speaker_id, file_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
int:
Speaker ID
str:
File ID
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
......@@ -184,7 +201,8 @@ class VoxCeleb1Identification(VoxCeleb1):
class VoxCeleb1Verification(VoxCeleb1):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset for speaker verification task.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker verification task.
Each data sample contains a pair of waveforms, sample rate, the label indicating if they are
from the same speaker, and the file ids.
......@@ -215,8 +233,20 @@ class VoxCeleb1Verification(VoxCeleb1):
n (int): The index of the sample
Returns:
(str, str, int, int, str, str):
``(filepath_spk1, filepath_spk2, sample_rate, label, file_id_spk1, file_id_spk2)``
Tuple of the following items:
str:
Path to audio file of speaker 1
str:
Path to audio file of speaker 2
int:
Sample rate
int:
Label
str:
File ID of speaker 1
str:
File ID of speaker 2
"""
label, file_path_spk1, file_path_spk2 = self._flist[n]
label = int(label)
......@@ -231,8 +261,20 @@ class VoxCeleb1Verification(VoxCeleb1):
n (int): The index of the sample to be loaded.
Returns:
(Tensor, Tensor, int, int, str, str):
``(waveform_spk1, waveform_spk2, sample_rate, label, file_id_spk1, file_id_spk2)``
Tuple of the following items:
Tensor:
Waveform of speaker 1
Tensor:
Waveform of speaker 2
int:
Sample rate
int:
Label
str:
File ID of speaker 1
str:
File ID of speaker 2
"""
metadata = self.get_metadata(n)
waveform_spk1 = _load_waveform(self._path, metadata[0], metadata[2])
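The code above unpacks ``label, file_path_spk1, file_path_spk2`` from ``self._flist``, which mirrors the layout of a VoxCeleb1-style verification trial list. A hedged sketch of parsing one such trial line (the exact file format is an assumption here, not shown in this diff):

```python
# Hypothetical sketch of one line of a VoxCeleb1-style trial list:
# "<label> <path_spk1> <path_spk2>", where label 1 means the two
# utterances come from the same speaker and 0 means they do not.
def parse_trial(line):
    label, path_spk1, path_spk2 = line.split()
    return int(label), path_spk1, path_spk2

print(parse_trial("1 id10001/a.wav id10001/b.wav"))
```

Casting the label to ``int`` up front matches what ``__getitem__`` does before handing the pair to a verification model.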
......