Commit 49b23e15 authored by moto's avatar moto Committed by Facebook GitHub Bot

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce a mini-index on the `torchaudio.datasets` page.
* Standardize the format of return-type docstrings.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df
parent 709b4439
..
   autogenerated from source/_templates/autosummary/dataset_class.rst
{{ name | underline }}

.. autoclass:: {{ fullname }}

{%- if "get_metadata" in methods %}
{%- set meth=["__getitem__", "get_metadata"] %}
{%- else %}
{%- set meth=["__getitem__"] %}
{%- endif %}
{%- if name == "CMUDict" %}
{%- set properties=["symbols"] %}
{%- elif name == "TEDLIUM" %}
{%- set properties=["phoneme_dict"] %}
{%- else %}
{%- set properties=[] %}
{%- endif %}

{%- if properties %}

Properties
==========
{% for item in properties %}
{{item | underline("-") }}

.. container:: py attribute

   .. autoproperty:: {{[fullname, item] | join('.')}}
{%- endfor %}
{%- endif %}

{%- if properties %}

Methods
=======
{%- endif %}
{% for item in meth %}
{{item | underline("-") }}

.. container:: py attribute

   .. automethod:: {{[fullname, item] | join('.')}}
{%- endfor %}
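The member-selection logic in the template above can be sketched in plain Python; `select_members` is a hypothetical helper name used for illustration, not part of the commit:

```python
def select_members(name, methods):
    """Mirror the Jinja2 branches of dataset_class.rst (illustrative sketch)."""
    # get_metadata is documented alongside __getitem__ only when the class defines it
    if "get_metadata" in methods:
        meth = ["__getitem__", "get_metadata"]
    else:
        meth = ["__getitem__"]
    # only CMUDict and TEDLIUM expose documented properties in this template
    if name == "CMUDict":
        properties = ["symbols"]
    elif name == "TEDLIUM":
        properties = ["phoneme_dict"]
    else:
        properties = []
    return meth, properties
```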
torchaudio.datasets
===================

.. py:module:: torchaudio.datasets

All datasets are subclasses of :class:`torch.utils.data.Dataset`
and have ``__getitem__`` and ``__len__`` methods implemented.
Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using :mod:`torch.multiprocessing` workers.
For example:

.. code::

   yesno_data = torchaudio.datasets.YESNO('.', download=True)
   data_loader = torch.utils.data.DataLoader(
       yesno_data,
       batch_size=1,
       shuffle=True,
       num_workers=args.nThreads)

.. currentmodule:: torchaudio.datasets

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/dataset_class.rst

   CMUARCTIC
   CMUDict
   COMMONVOICE
   DR_VCTK
   FluentSpeechCommands
   GTZAN
   LibriMix
   LIBRISPEECH
   LibriLightLimited
   LIBRITTS
   LJSPEECH
   MUSDB_HQ
   QUESST14
   SPEECHCOMMANDS
   TEDLIUM
   VCTK_092
   VoxCeleb1Identification
   VoxCeleb1Verification
   YESNO
@@ -31,7 +31,6 @@ print(torchaudio.__version__)
# -------------------------------------------------------------------------------
# Preparation of data and helper functions.
# -------------------------------------------------------------------------------
import os

import matplotlib.pyplot as plt
@@ -46,7 +45,7 @@ os.makedirs(YESNO_DATASET_PATH, exist_ok=True)
def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
    waveform = waveform.numpy()
    num_channels, _ = waveform.shape
    figure, axes = plt.subplots(num_channels, 1)
    if num_channels == 1:
@@ -64,7 +63,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
def play_audio(waveform, sample_rate):
    waveform = waveform.numpy()
    num_channels, _ = waveform.shape
    if num_channels == 1:
        display(Audio(waveform[0], rate=sample_rate))
    elif num_channels == 2:
@@ -75,7 +74,7 @@ def play_audio(waveform, sample_rate):
######################################################################
# Here, we show how to use the
# :py:class:`torchaudio.datasets.YESNO` dataset.
#
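The tutorial change above unpacks the unused frame count into `_`, following the common Python convention for discarded values. A minimal illustration (`describe_channels` is a made-up name, not from the commit):

```python
def describe_channels(shape):
    # the frame count is unpacked into "_" because only the channel count is used
    num_channels, _ = shape
    return "mono" if num_channels == 1 else f"{num_channels} channels"
```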
@@ -49,7 +49,7 @@ def load_cmuarctic_item(line: str, path: str, folder_audio: str, ext_audio: str)
class CMUARCTIC(Dataset):
    """*CMU ARCTIC* :cite:`Kominek03cmuarctic` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -139,7 +139,16 @@ class CMUARCTIC(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            str:
                Utterance ID
        """
        line = self._walker[n]
        return load_cmuarctic_item(line, self._path, self._folder_audio, self._ext_audio)
@@ -104,7 +104,7 @@ def _parse_dictionary(lines: Iterable[str], exclude_punctuations: bool) -> List[
class CMUDict(Dataset):
    """*CMU Pronouncing Dictionary* :cite:`cmudict` (CMUDict) dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -169,8 +169,12 @@ class CMUDict(Dataset):
            n (int): The index of the sample to be loaded.

        Returns:
            Tuple of a word and its phonemes

            str:
                Word
            List[str]:
                Phonemes
        """
        return self._dictionary[n]
@@ -179,5 +183,5 @@ class CMUDict(Dataset):
    @property
    def symbols(self) -> List[str]:
        """list[str]: A list of phonemes symbols, such as ``"AA"``, ``"AE"``, ``"AH"``."""
        return self._symbols.copy()
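Note that `symbols` returns `self._symbols.copy()` rather than the internal list itself, so callers cannot mutate the dataset's state. A minimal sketch of this defensive-copy pattern (the class and data below are invented for illustration):

```python
class SymbolTable:
    def __init__(self, symbols):
        self._symbols = list(symbols)

    @property
    def symbols(self):
        # hand out a copy; mutating it leaves the internal list untouched
        return self._symbols.copy()

table = SymbolTable(["AA", "AE", "AH"])
view = table.symbols
view.append("ZZ")  # modifies only the caller's copy
```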
@@ -28,7 +28,7 @@ def load_commonvoice_item(
class COMMONVOICE(Dataset):
    """*CommonVoice* :cite:`ardila2020common` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is located.
@@ -61,9 +61,23 @@ class COMMONVOICE(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            Dict[str, str]:
                Dictionary containing the following items from the corresponding TSV file;

                * ``"client_id"``
                * ``"path"``
                * ``"sentence"``
                * ``"up_votes"``
                * ``"down_votes"``
                * ``"age"``
                * ``"gender"``
                * ``"accent"``
        """
        line = self._walker[n]
        return load_commonvoice_item(line, self._header, self._path, self._folder_audio, self._ext_audio)
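CommonVoice metadata comes from a TSV file; a rough sketch of how one row could map to the documented dictionary keys using only the standard library (the sample values below are invented):

```python
import csv
import io

# a fabricated TSV row with the header fields the docstring lists
header = "client_id\tpath\tsentence\tup_votes\tdown_votes\tage\tgender\taccent"
row = "abc123\tsample.mp3\thello world\t2\t0\ttwenties\tfemale\t"
reader = csv.DictReader(io.StringIO(header + "\n" + row), delimiter="\t")
record = next(reader)  # dict keyed by the header fields
```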
@@ -14,7 +14,7 @@ _SUPPORTED_SUBSETS = {"train", "test"}
class DR_VCTK(Dataset):
    """*Device Recorded VCTK (Small subset version)* :cite:`Sarfjoo2018DeviceRV` dataset.

    Args:
        root (str or Path): Root directory where the dataset's top level directory is found.
@@ -95,9 +95,24 @@ class DR_VCTK(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Clean waveform
            int:
                Sample rate of the clean waveform
            Tensor:
                Noisy waveform
            int:
                Sample rate of the noisy waveform
            str:
                Speaker ID
            str:
                Utterance ID
            str:
                Source
            int:
                Channel ID
        """
        filename = self._filename_list[n]
        return self._load_dr_vctk_item(filename)
@@ -11,11 +11,12 @@ SAMPLE_RATE = 16000
class FluentSpeechCommands(Dataset):
    """*Fluent Speech Commands* :cite:`fluent` dataset

    Args:
        root (str or Path): Path to the directory where the dataset is found.
        subset (str, optional): subset of the dataset to use.
            Options: [``"train"``, ``"valid"``, ``"test"``].
            (Default: ``"train"``)
    """
@@ -45,8 +46,24 @@ class FluentSpeechCommands(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            str:
                Path to audio
            int:
                Sample rate
            str:
                File name
            int:
                Speaker ID
            str:
                Transcription
            str:
                Action
            str:
                Object
            str:
                Location
        """
        sample = self.data[n]
@@ -67,8 +84,24 @@ class FluentSpeechCommands(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                File name
            int:
                Speaker ID
            str:
                Transcription
            str:
                Action
            str:
                Object
            str:
                Location
        """
        metadata = self.get_metadata(n)
        waveform = _load_waveform(self._path, metadata[0], metadata[1])
@@ -996,7 +996,7 @@ def load_gtzan_item(fileid: str, path: str, ext_audio: str) -> Tuple[Tensor, str
class GTZAN(Dataset):
    """*GTZAN* :cite:`tzanetakis_essl_cook_2001` dataset.

    Note:
        Please see http://marsyas.info/downloads/datasets.html if you are planning to use
@@ -1096,7 +1096,14 @@ class GTZAN(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Label
        """
        fileid = self._walker[n]
        item = load_gtzan_item(fileid, self._path, self._ext_audio)
@@ -34,13 +34,13 @@ def _get_fileids_paths(path, folders, _ext_audio) -> List[Tuple[str, str]]:
class LibriLightLimited(Dataset):
    """Subset of Libri-light :cite:`librilight` dataset,
    which was used in HuBERT :cite:`hsu2021hubert` for supervised fine-tuning.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
        subset (str, optional): The subset to use. Options: [``"10min"``, ``"1h"``, ``"10h"``]
            (Default: ``"10min"``).
        download (bool, optional):
            Whether to download the dataset if it is not found at root path. (default: ``False``).
    """
@@ -75,8 +75,20 @@ class LibriLightLimited(Dataset):
        Args:
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            int:
                Speaker ID
            int:
                Chapter ID
            int:
                Utterance ID
        """
        file_path, fileid = self._fileids_paths[n]
        metadata = _get_librispeech_metadata(fileid, self._path, file_path, self._ext_audio, self._ext_txt)
@@ -9,13 +9,13 @@ SampleType = Tuple[int, torch.Tensor, List[torch.Tensor]]
class LibriMix(Dataset):
    r"""*LibriMix* :cite:`cosentino2020librimix` dataset.

    Args:
        root (str or Path): The path to the directory where the directory ``Libri2Mix`` or
            ``Libri3Mix`` is stored.
        subset (str, optional): The subset to use. Options: [``"train-360"``, ``"train-100"``,
            ``"dev"``, and ``"test"``] (Default: ``"train-360"``).
        num_speakers (int, optional): The number of speakers, which determines the directories
            to traverse. The Dataset will traverse ``s1`` to ``sN`` directories to collect
            N source audios. (Default: 2)
@@ -23,8 +23,8 @@ class LibriMix(Dataset):
            which subdirectory the audio are fetched. If any of the audio has a different sample
            rate, raises ``ValueError``. Options: [8000, 16000] (Default: 8000)
        task (str, optional): the task of LibriMix.
            Options: [``"enh_single"``, ``"enh_both"``, ``"sep_clean"``, ``"sep_noisy"``]
            (Default: ``"sep_clean"``)

    Note:
        The LibriMix dataset needs to be manually generated. Please check https://github.com/JorisCos/LibriMix
@@ -81,6 +81,13 @@ class LibriMix(Dataset):
        Args:
            key (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            int:
                Sample rate
            Tensor:
                Mixture waveform
            list of Tensors:
                List of source waveforms
        """
        return self._load_sample(self.files[key])
@@ -75,7 +75,7 @@ def _get_librispeech_metadata(
class LIBRISPEECH(Dataset):
    """*LibriSpeech* :cite:`7178964` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -126,8 +126,20 @@ class LIBRISPEECH(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            str:
                Path to audio
            int:
                Sample rate
            str:
                Transcript
            int:
                Speaker ID
            int:
                Chapter ID
            int:
                Utterance ID
        """
        fileid = self._walker[n]
        return _get_librispeech_metadata(fileid, self._archive, self._url, self._ext_audio, self._ext_txt)
@@ -139,8 +151,20 @@ class LIBRISPEECH(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            int:
                Speaker ID
            int:
                Chapter ID
            int:
                Utterance ID
        """
        metadata = self.get_metadata(n)
        waveform = _load_waveform(self._archive, metadata[0], metadata[1])
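The `get_metadata` / `__getitem__` split documented above lets callers inspect a sample without decoding any audio. A minimal sketch of the idea; all names and data below are hypothetical, not torchaudio's implementation:

```python
class LazyAudioDataset:
    def __init__(self, records):
        # records: list of (path, sample_rate, transcript) tuples
        self._records = records
        self.loads = 0  # counts expensive decodes, for illustration

    def get_metadata(self, n):
        # cheap: no file I/O, just the stored metadata
        return self._records[n]

    def __getitem__(self, n):
        path, sample_rate, transcript = self.get_metadata(n)
        waveform = self._load_waveform(path)  # expensive step happens only here
        return waveform, sample_rate, transcript

    def _load_waveform(self, path):
        self.loads += 1
        return [0.0] * 4  # stand-in for decoded audio

ds = LazyAudioDataset([("a.flac", 16000, "hello")])
meta = ds.get_metadata(0)  # no decode happens
item = ds[0]               # triggers the decode
```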
@@ -63,7 +63,7 @@ def load_libritts_item(
class LIBRITTS(Dataset):
    """*LibriTTS* :cite:`Zen2019LibriTTSAC` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -138,8 +138,22 @@ class LIBRITTS(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Original text
            str:
                Normalized text
            int:
                Speaker ID
            int:
                Chapter ID
            str:
                Utterance ID
        """
        fileid = self._walker[n]
        return load_libritts_item(
@@ -20,7 +20,7 @@ _RELEASE_CONFIGS = {
class LJSPEECH(Dataset):
    """*LJSpeech-1.1* :cite:`ljspeech17` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -78,8 +78,16 @@ class LJSPEECH(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            str:
                Normalized transcript
        """
        line = self._flist[n]
        fileid, transcript, normalized_transcript = line
@@ -31,7 +31,7 @@ _VALIDATION_SET = [
class MUSDB_HQ(Dataset):
    """*MUSDB_HQ* :cite:`MUSDB18HQ` dataset.

    Args:
        root (str or Path): Root directory where the dataset's top level directory is found
@@ -122,7 +122,16 @@ class MUSDB_HQ(Dataset):
        Args:
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            int:
                Num frames
            str:
                Track name
        """
        return self._load_sample(n)
@@ -23,7 +23,7 @@ _LANGUAGES = [
class QUESST14(Dataset):
    """*QUESST14* :cite:`Mir2015QUESST2014EQ` dataset.

    Args:
        root (str or Path): Root directory where the dataset's top level directory is found
@@ -79,8 +79,14 @@ class QUESST14(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            str:
                Path to audio
            int:
                Sample rate
            str:
                File name
        """
        audio_path = self.data[n]
        relpath = os.path.relpath(audio_path, self._path)
@@ -93,7 +99,14 @@ class QUESST14(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                File name
        """
        metadata = self.get_metadata(n)
        waveform = _load_waveform(self._path, metadata[0], metadata[1])
@@ -48,7 +48,7 @@ def _get_speechcommands_metadata(filepath: str, path: str) -> Tuple[str, int, st
class SPEECHCOMMANDS(Dataset):
    """*Speech Commands* :cite:`speechcommandsv2` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -139,8 +139,18 @@ class SPEECHCOMMANDS(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            str:
                Path to the audio
            int:
                Sample rate
            str:
                Label
            str:
                Speaker ID
            int:
                Utterance number
        """
        fileid = self._walker[n]
        return _get_speechcommands_metadata(fileid, self._archive)
@@ -152,8 +162,18 @@ class SPEECHCOMMANDS(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Label
            str:
                Speaker ID
            int:
                Utterance number
        """
        metadata = self.get_metadata(n)
        waveform = _load_waveform(self._archive, metadata[0], metadata[1])
@@ -41,8 +41,7 @@ _RELEASE_CONFIGS = {
class TEDLIUM(Dataset):
    """*Tedlium* :cite:`rousseau2012tedlium` dataset (releases 1, 2 and 3).

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -178,7 +177,20 @@ class TEDLIUM(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            int:
                Talk ID
            int:
                Speaker ID
            int:
                Identifier
        """
        fileid, line = self._filelist[n]
        return self._load_tedlium_item(fileid, line, self._path)
@@ -17,7 +17,7 @@ SampleType = Tuple[Tensor, int, str, str, str]
class VCTK_092(Dataset):
    """*VCTK 0.92* :cite:`yamagishi2019vctk` dataset

    Args:
        root (str): Root directory where the dataset's top level directory is found.
@@ -123,8 +123,18 @@ class VCTK_092(Dataset):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            str:
                Transcript
            str:
                Speaker ID
            str:
                Utterance ID
        """
        speaker_id, utterance_id = self._sample_ids[n]
        return self._load_sample(speaker_id, utterance_id, self._mic_id)
@@ -90,7 +90,7 @@ def _get_file_id(file_path: str, _ext_audio: str):
class VoxCeleb1(Dataset):
    """*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset.

    Args:
        root (str or Path): Path to the directory where the dataset is found or downloaded.
@@ -122,7 +122,8 @@ class VoxCeleb1(Dataset):
class VoxCeleb1Identification(VoxCeleb1):
    """*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker identification task.

    Each data sample contains the waveform, sample rate, speaker id, and the file id.

    Args:
@@ -156,8 +157,16 @@ class VoxCeleb1Identification(VoxCeleb1):
            n (int): The index of the sample

        Returns:
            Tuple of the following items;

            str:
                Path to audio
            int:
                Sample rate
            int:
                Speaker ID
            str:
                File ID
        """
        file_path = self._flist[n]
        file_id = _get_file_id(file_path, self._ext_audio)
@@ -172,8 +181,16 @@ class VoxCeleb1Identification(VoxCeleb1):
            n (int): The index of the sample to be loaded

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform
            int:
                Sample rate
            int:
                Speaker ID
            str:
                File ID
        """
        metadata = self.get_metadata(n)
        waveform = _load_waveform(self._path, metadata[0], metadata[1])
@@ -184,7 +201,8 @@ class VoxCeleb1Identification(VoxCeleb1):
class VoxCeleb1Verification(VoxCeleb1):
    """*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker verification task.

    Each data sample contains a pair of waveforms, sample rate, the label indicating if they are
    from the same speaker, and the file ids.
@@ -215,8 +233,20 @@ class VoxCeleb1Verification(VoxCeleb1):
            n (int): The index of the sample

        Returns:
            Tuple of the following items;

            str:
                Path to audio file of speaker 1
            str:
                Path to audio file of speaker 2
            int:
                Sample rate
            int:
                Label
            str:
                File ID of speaker 1
            str:
                File ID of speaker 2
        """
        label, file_path_spk1, file_path_spk2 = self._flist[n]
        label = int(label)
@@ -231,8 +261,20 @@ class VoxCeleb1Verification(VoxCeleb1):
            n (int): The index of the sample to be loaded.

        Returns:
            Tuple of the following items;

            Tensor:
                Waveform of speaker 1
            Tensor:
                Waveform of speaker 2
            int:
                Sample rate
            int:
                Label
            str:
                File ID of speaker 1
            str:
                File ID of speaker 2
        """
        metadata = self.get_metadata(n)
        waveform_spk1 = _load_waveform(self._path, metadata[0], metadata[2])
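In the verification class above, `__getitem__` converts the trial label with `int(label)`. Assuming a trial list stores whitespace-separated `label path1 path2` lines (the exact file layout is not shown in this diff), the parsing step could be sketched as:

```python
def parse_trial(line):
    # label is "1" for same-speaker pairs and "0" otherwise (assumed convention)
    label, path_spk1, path_spk2 = line.split()
    return int(label), path_spk1, path_spk2

label, p1, p2 = parse_trial("1 id10270/clip1.wav id10271/clip2.wav")
```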