Commit 49b23e15 authored by moto, committed by Facebook GitHub Bot

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df
parent 709b4439
..
   autogenerated from source/_templates/autosummary/dataset_class.rst

{{ name | underline }}

.. autoclass:: {{ fullname }}

{%- if "get_metadata" in methods %}
{%- set meth=["__getitem__", "get_metadata"] %}
{%- else %}
{%- set meth=["__getitem__"] %}
{%- endif %}
{%- if name == "CMUDict" %}
{%- set properties=["symbols"] %}
{%- elif name == "TEDLIUM" %}
{%- set properties=["phoneme_dict"] %}
{%- else %}
{%- set properties=[] %}
{%- endif %}

{%- if properties %}

Properties
==========
{% for item in properties %}
{{ item | underline("-") }}

.. container:: py attribute

   .. autoproperty:: {{ [fullname, item] | join('.') }}
{%- endfor %}
{%- endif %}

{%- if properties %}

Methods
=======
{%- endif %}
{% for item in meth %}
{{ item | underline("-") }}

.. container:: py attribute

   .. automethod:: {{ [fullname, item] | join('.') }}
{%- endfor %}
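The template pipes section titles through an ``underline`` filter, which is not a built-in Jinja filter and so must be registered in the docs' Sphinx configuration (not shown in this diff). A minimal sketch of what such a filter might look like:

```python
# Hypothetical sketch of the custom `underline` Jinja filter the template
# assumes: renders a reST section title by repeating the underline character
# to match the title's length.
def underline(title, char="="):
    return f"{title}\n{char * len(title)}"

print(underline("CMUDict"))       # title underlined with "="
print(underline("symbols", "-"))  # title underlined with "-"
```

The per-item calls like ``{{ item | underline("-") }}`` then produce valid reST subsection headers of exactly the right width.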
torchaudio.datasets
===================

.. py:module:: torchaudio.datasets

All datasets are subclasses of :class:`torch.utils.data.Dataset`
and have ``__getitem__`` and ``__len__`` methods implemented.
Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using ``torch.multiprocessing`` workers.
For example::

   yesno_data = torchaudio.datasets.YESNO('.', download=True)
   data_loader = torch.utils.data.DataLoader(yesno_data,
                                             batch_size=1,
                                             shuffle=True,
                                             num_workers=args.nThreads)
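The map-style protocol this paragraph describes (``__getitem__`` plus ``__len__``) can be sketched without torch at all. ``ToyDataset`` and ``simple_loader`` below are hypothetical stand-ins for a dataset and for ``torch.utils.data.DataLoader``'s batching, not torchaudio code:

```python
# Minimal, torch-free sketch of the map-style dataset protocol:
# any object exposing __getitem__ and __len__ can be batched.
class ToyDataset:
    def __init__(self, items):
        self._items = items

    def __getitem__(self, n):
        return self._items[n]

    def __len__(self):
        return len(self._items)


def simple_loader(dataset, batch_size):
    # Hypothetical stand-in for DataLoader's sequential batching loop.
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


ds = ToyDataset(["a", "b", "c", "d", "e"])
print(list(simple_loader(ds, batch_size=2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

A real ``DataLoader`` adds collation, shuffling, and multiprocessing on top of exactly this indexing contract.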
.. currentmodule:: torchaudio.datasets

CMUARCTIC
~~~~~~~~~

.. autoclass:: CMUARCTIC
   :members:
   :special-members: __getitem__

CMUDict
~~~~~~~

.. autoclass:: CMUDict
   :members:
   :special-members: __getitem__

COMMONVOICE
~~~~~~~~~~~

.. autoclass:: COMMONVOICE
   :members:
   :special-members: __getitem__

GTZAN
~~~~~

.. autoclass:: GTZAN
   :members:
   :special-members: __getitem__

LibriMix
~~~~~~~~

.. autoclass:: LibriMix
   :members:
   :special-members: __getitem__

LIBRISPEECH
~~~~~~~~~~~

.. autoclass:: LIBRISPEECH
   :members:
   :special-members: __getitem__

LibriLightLimited
~~~~~~~~~~~~~~~~~

.. autoclass:: LibriLightLimited
   :members:
   :special-members: __getitem__

LIBRITTS
~~~~~~~~

.. autoclass:: LIBRITTS
   :members:
   :special-members: __getitem__

LJSPEECH
~~~~~~~~

.. autoclass:: LJSPEECH
   :members:
   :special-members: __getitem__

SPEECHCOMMANDS
~~~~~~~~~~~~~~

.. autoclass:: SPEECHCOMMANDS
   :members:
   :special-members: __getitem__

TEDLIUM
~~~~~~~

.. autoclass:: TEDLIUM
   :members:
   :special-members: __getitem__

VCTK_092
~~~~~~~~

.. autoclass:: VCTK_092
   :members:
   :special-members: __getitem__

VoxCeleb1Identification
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: VoxCeleb1Identification
   :members:
   :special-members: __getitem__

VoxCeleb1Verification
~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: VoxCeleb1Verification
   :members:
   :special-members: __getitem__

DR_VCTK
~~~~~~~

.. autoclass:: DR_VCTK
   :members:
   :special-members: __getitem__

YESNO
~~~~~

.. autoclass:: YESNO
   :members:
   :special-members: __getitem__

QUESST14
~~~~~~~~

.. autoclass:: QUESST14
   :members:
   :special-members: __getitem__
FluentSpeechCommands
~~~~~~~~~~~~~~~~~~~~

.. autoclass:: FluentSpeechCommands
   :members:
   :special-members: __getitem__

Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using :mod:`torch.multiprocessing` workers.
For example:

.. code::

   yesno_data = torchaudio.datasets.YESNO('.', download=True)
   data_loader = torch.utils.data.DataLoader(
       yesno_data,
       batch_size=1,
       shuffle=True,
       num_workers=args.nThreads)
MUSDB_HQ
~~~~~~~~

.. autoclass:: MUSDB_HQ
   :members:
   :special-members: __getitem__

.. currentmodule:: torchaudio.datasets

.. autosummary::
   :toctree: generated
   :nosignatures:
   :template: autosummary/dataset_class.rst

   CMUARCTIC
   CMUDict
   COMMONVOICE
   DR_VCTK
   FluentSpeechCommands
   GTZAN
   LibriMix
   LIBRISPEECH
   LibriLightLimited
   LIBRITTS
   LJSPEECH
   MUSDB_HQ
   QUESST14
   SPEECHCOMMANDS
   TEDLIUM
   VCTK_092
   VoxCeleb1Identification
   VoxCeleb1Verification
   YESNO
......@@ -31,7 +31,6 @@ print(torchaudio.__version__)
# -------------------------------------------------------------------------------
# Preparation of data and helper functions.
# -------------------------------------------------------------------------------
import multiprocessing
import os
import matplotlib.pyplot as plt
......@@ -46,7 +45,7 @@ os.makedirs(YESNO_DATASET_PATH, exist_ok=True)
def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
waveform = waveform.numpy()
num_channels, num_frames = waveform.shape
num_channels, _ = waveform.shape
figure, axes = plt.subplots(num_channels, 1)
if num_channels == 1:
......@@ -64,7 +63,7 @@ def plot_specgram(waveform, sample_rate, title="Spectrogram", xlim=None):
def play_audio(waveform, sample_rate):
waveform = waveform.numpy()
num_channels, num_frames = waveform.shape
num_channels, _ = waveform.shape
if num_channels == 1:
display(Audio(waveform[0], rate=sample_rate))
elif num_channels == 2:
......@@ -75,7 +74,7 @@ def play_audio(waveform, sample_rate):
######################################################################
# Here, we show how to use the
# :py:func:`torchaudio.datasets.YESNO` dataset.
# :py:class:`torchaudio.datasets.YESNO` dataset.
#
......
......@@ -49,7 +49,7 @@ def load_cmuarctic_item(line: str, path: str, folder_audio: str, ext_audio: str)
class CMUARCTIC(Dataset):
"""Create a Dataset for *CMU ARCTIC* :cite:`Kominek03cmuarctic`.
"""*CMU ARCTIC* :cite:`Kominek03cmuarctic` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -139,7 +139,16 @@ class CMUARCTIC(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str): ``(waveform, sample_rate, transcript, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Utterance ID
"""
line = self._walker[n]
return load_cmuarctic_item(line, self._path, self._folder_audio, self._ext_audio)
......
......@@ -104,7 +104,7 @@ def _parse_dictionary(lines: Iterable[str], exclude_punctuations: bool) -> List[
class CMUDict(Dataset):
"""Create a Dataset for *CMU Pronouncing Dictionary* :cite:`cmudict` (CMUDict).
"""*CMU Pronouncing Dictionary* :cite:`cmudict` (CMUDict) dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -169,8 +169,12 @@ class CMUDict(Dataset):
n (int): The index of the sample to be loaded.
Returns:
(str, List[str]): The corresponding word and phonemes ``(word, [phonemes])``.
Tuple of a word and its phonemes
str:
Word
List[str]:
Phonemes
"""
return self._dictionary[n]
......@@ -179,5 +183,5 @@ class CMUDict(Dataset):
@property
def symbols(self) -> List[str]:
"""list[str]: A list of phonemes symbols, such as `AA`, `AE`, `AH`."""
"""list[str]: A list of phonemes symbols, such as ``"AA"``, ``"AE"``, ``"AH"``."""
return self._symbols.copy()
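CMUDict entries map a word to its phoneme sequence, and ``symbols`` is the vocabulary of those phonemes. A hedged sketch of parsing one CMUDict-style entry line ("WORD  PH1 PH2 ..."); the dataset's real parser (``_parse_dictionary`` above) additionally handles comments and punctuation filtering:

```python
# Hypothetical sketch of one CMUDict-style entry: a word followed by its
# whitespace-separated ARPAbet phonemes.
def parse_entry(line):
    word, *phonemes = line.split()
    return word, phonemes

print(parse_entry("HELLO  HH AH0 L OW1"))  # ('HELLO', ['HH', 'AH0', 'L', 'OW1'])
```

This matches the ``(word, [phonemes])`` tuple shape that ``CMUDict.__getitem__`` documents.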
......@@ -28,7 +28,7 @@ def load_commonvoice_item(
class COMMONVOICE(Dataset):
"""Create a Dataset for *CommonVoice* :cite:`ardila2020common`.
"""*CommonVoice* :cite:`ardila2020common` dataset.
Args:
root (str or Path): Path to the directory where the dataset is located.
......@@ -61,9 +61,23 @@ class COMMONVOICE(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, Dict[str, str]): ``(waveform, sample_rate, dictionary)``, where dictionary
is built from the TSV file with the following keys: ``client_id``, ``path``, ``sentence``,
``up_votes``, ``down_votes``, ``age``, ``gender`` and ``accent``.
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
Dict[str, str]:
Dictionary containing the following items from the corresponding TSV file:
* ``"client_id"``
* ``"path"``
* ``"sentence"``
* ``"up_votes"``
* ``"down_votes"``
* ``"age"``
* ``"gender"``
* ``"accent"``
"""
line = self._walker[n]
return load_commonvoice_item(line, self._header, self._path, self._folder_audio, self._ext_audio)
......
......@@ -14,7 +14,7 @@ _SUPPORTED_SUBSETS = {"train", "test"}
class DR_VCTK(Dataset):
"""Create a dataset for *Device Recorded VCTK (Small subset version)* :cite:`Sarfjoo2018DeviceRV`.
"""*Device Recorded VCTK (Small subset version)* :cite:`Sarfjoo2018DeviceRV` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found.
......@@ -95,9 +95,24 @@ class DR_VCTK(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, Tensor, int, str, str, str, int):
``(waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id,\
utterance_id, source, channel_id)``
Tuple of the following items:
Tensor:
Clean waveform
int:
Sample rate of the clean waveform
Tensor:
Noisy waveform
int:
Sample rate of the noisy waveform
str:
Speaker ID
str:
Utterance ID
str:
Source
int:
Channel ID
"""
filename = self._filename_list[n]
return self._load_dr_vctk_item(filename)
......
......@@ -11,11 +11,12 @@ SAMPLE_RATE = 16000
class FluentSpeechCommands(Dataset):
"""Create *Fluent Speech Commands* :cite:`fluent` Dataset
"""*Fluent Speech Commands* :cite:`fluent` dataset
Args:
root (str or Path): Path to the directory where the dataset is found.
subset (str, optional): subset of the dataset to use. Options: [`"train"`, `"valid"`, `"test"`].
subset (str, optional): subset of the dataset to use.
Options: [``"train"``, ``"valid"``, ``"test"``].
(Default: ``"train"``)
"""
......@@ -45,8 +46,24 @@ class FluentSpeechCommands(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, int, str, str, str, str):
``(filepath, sample_rate, file_name, speaker_id, transcription, action, object, location)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
File name
int:
Speaker ID
str:
Transcription
str:
Action
str:
Object
str:
Location
"""
sample = self.data[n]
......@@ -67,8 +84,24 @@ class FluentSpeechCommands(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, str, str, str, str):
``(waveform, sample_rate, file_name, speaker_id, transcription, action, object, location)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
File name
int:
Speaker ID
str:
Transcription
str:
Action
str:
Object
str:
Location
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
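Several of these datasets follow the same two-step pattern shown above: ``get_metadata`` returns the cheap fields (path, sample rate, labels) without touching audio, and ``__getitem__`` loads the waveform from that metadata. A torch-free sketch of the split; all names here are illustrative, not torchaudio's actual helpers:

```python
# Hedged sketch of the get_metadata / __getitem__ split:
# metadata access stays cheap, audio I/O happens only in get_item.
def get_metadata(records, n):
    # records: list of (relative_path, sample_rate, transcript) tuples
    return records[n]

def get_item(records, n, load_waveform):
    relpath, sample_rate, transcript = get_metadata(records, n)
    waveform = load_waveform(relpath)  # only now is audio read from disk
    return waveform, sample_rate, transcript

records = [("speaker1/0001.wav", 16000, "turn on the lights")]
item = get_item(records, 0, load_waveform=lambda p: [0.0, 0.0])
print(item)  # ([0.0, 0.0], 16000, 'turn on the lights')
```

The split lets callers filter or index a dataset by metadata alone without paying for waveform decoding.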
......
......@@ -996,7 +996,7 @@ def load_gtzan_item(fileid: str, path: str, ext_audio: str) -> Tuple[Tensor, str
class GTZAN(Dataset):
"""Create a Dataset for *GTZAN* :cite:`tzanetakis_essl_cook_2001`.
"""*GTZAN* :cite:`tzanetakis_essl_cook_2001` dataset.
Note:
Please see http://marsyas.info/downloads/datasets.html if you are planning to use
......@@ -1096,7 +1096,14 @@ class GTZAN(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str): ``(waveform, sample_rate, label)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Label
"""
fileid = self._walker[n]
item = load_gtzan_item(fileid, self._path, self._ext_audio)
......
......@@ -34,13 +34,13 @@ def _get_fileids_paths(path, folders, _ext_audio) -> List[Tuple[str, str]]:
class LibriLightLimited(Dataset):
"""Create a Dataset for LibriLightLimited, which is the supervised subset of
LibriLight dataset.
"""Subset of Libri-light :cite:`librilight` dataset,
which was used in HuBERT :cite:`hsu2021hubert` for supervised fine-tuning.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
subset (str, optional): The subset to use. Options: [``10min``, ``1h``, ``10h``]
(Default: ``10min``).
subset (str, optional): The subset to use. Options: [``"10min"``, ``"1h"``, ``"10h"``]
(Default: ``"10min"``).
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
"""
......@@ -75,8 +75,20 @@ class LibriLightLimited(Dataset):
Args:
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, int, int):
``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
file_path, fileid = self._fileids_paths[n]
metadata = _get_librispeech_metadata(fileid, self._path, file_path, self._ext_audio, self._ext_txt)
......
......@@ -9,13 +9,13 @@ SampleType = Tuple[int, torch.Tensor, List[torch.Tensor]]
class LibriMix(Dataset):
r"""Create the *LibriMix* :cite:`cosentino2020librimix` dataset.
r"""*LibriMix* :cite:`cosentino2020librimix` dataset.
Args:
root (str or Path): The path to the directory where the directory ``Libri2Mix`` or
``Libri3Mix`` is stored.
subset (str, optional): The subset to use. Options: [``train-360``, ``train-100``,
``dev``, and ``test``] (Default: ``train-360``).
subset (str, optional): The subset to use. Options: [``"train-360"``, ``"train-100"``,
``"dev"``, and ``"test"``] (Default: ``"train-360"``).
num_speakers (int, optional): The number of speakers, which determines the directories
to traverse. The Dataset will traverse ``s1`` to ``sN`` directories to collect
N source audios. (Default: 2)
......@@ -23,8 +23,8 @@ class LibriMix(Dataset):
which subdirectory the audio are fetched. If any of the audio has a different sample
rate, raises ``ValueError``. Options: [8000, 16000] (Default: 8000)
task (str, optional): the task of LibriMix.
Options: [``enh_single``, ``enh_both``, ``sep_clean``, ``sep_noisy``]
(Default: ``sep_clean``)
Options: [``"enh_single"``, ``"enh_both"``, ``"sep_clean"``, ``"sep_noisy"``]
(Default: ``"sep_clean"``)
Note:
The LibriMix dataset needs to be manually generated. Please check https://github.com/JorisCos/LibriMix
......@@ -81,6 +81,13 @@ class LibriMix(Dataset):
Args:
key (int): The index of the sample to be loaded
Returns:
(int, Tensor, List[Tensor]): ``(sample_rate, mix_waveform, list_of_source_waveforms)``
Tuple of the following items:
int:
Sample rate
Tensor:
Mixture waveform
list of Tensors:
List of source waveforms
"""
return self._load_sample(self.files[key])
......@@ -75,7 +75,7 @@ def _get_librispeech_metadata(
class LIBRISPEECH(Dataset):
"""Create a Dataset for *LibriSpeech* :cite:`7178964`.
"""*LibriSpeech* :cite:`7178964` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -126,8 +126,20 @@ class LIBRISPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, int, int, int):
``(filepath, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
fileid = self._walker[n]
return _get_librispeech_metadata(fileid, self._archive, self._url, self._ext_audio, self._ext_txt)
......@@ -139,8 +151,20 @@ class LIBRISPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, int, int, int):
``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Speaker ID
int:
Chapter ID
int:
Utterance ID
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._archive, metadata[0], metadata[1])
......
......@@ -63,7 +63,7 @@ def load_libritts_item(
class LIBRITTS(Dataset):
"""Create a Dataset for *LibriTTS* :cite:`Zen2019LibriTTSAC`.
"""*LibriTTS* :cite:`Zen2019LibriTTSAC` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -138,8 +138,22 @@ class LIBRITTS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, str, int, int, str):
``(waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Original text
str:
Normalized text
int:
Speaker ID
int:
Chapter ID
str:
Utterance ID
"""
fileid = self._walker[n]
return load_libritts_item(
......
......@@ -20,7 +20,7 @@ _RELEASE_CONFIGS = {
class LJSPEECH(Dataset):
"""Create a Dataset for *LJSpeech-1.1* :cite:`ljspeech17`.
"""*LJSpeech-1.1* :cite:`ljspeech17` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -78,8 +78,16 @@ class LJSPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str):
``(waveform, sample_rate, transcript, normalized_transcript)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Normalized transcript
"""
line = self._flist[n]
fileid, transcript, normalized_transcript = line
......
......@@ -31,7 +31,7 @@ _VALIDATION_SET = [
class MUSDB_HQ(Dataset):
"""Create *MUSDB_HQ* :cite:`MUSDB18HQ` Dataset
"""*MUSDB_HQ* :cite:`MUSDB18HQ` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found
......@@ -122,7 +122,16 @@ class MUSDB_HQ(Dataset):
Args:
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, int, str): ``(waveforms, sample_rate, num_frames, track_name)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
int:
Num frames
str:
Track name
"""
return self._load_sample(n)
......
......@@ -23,7 +23,7 @@ _LANGUAGES = [
class QUESST14(Dataset):
"""Create *QUESST14* :cite:`Mir2015QUESST2014EQ` Dataset
"""*QUESST14* :cite:`Mir2015QUESST2014EQ` dataset.
Args:
root (str or Path): Root directory where the dataset's top level directory is found
......@@ -79,8 +79,14 @@ class QUESST14(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str):
``(filepath, sample_rate, file_name)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
str:
File name
"""
audio_path = self.data[n]
relpath = os.path.relpath(audio_path, self._path)
......@@ -93,7 +99,14 @@ class QUESST14(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str): ``(waveform, sample_rate, file_name)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
File name
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
......
......@@ -48,7 +48,7 @@ def _get_speechcommands_metadata(filepath: str, path: str) -> Tuple[str, int, st
class SPEECHCOMMANDS(Dataset):
"""Create a Dataset for *Speech Commands* :cite:`speechcommandsv2`.
"""*Speech Commands* :cite:`speechcommandsv2` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -139,8 +139,18 @@ class SPEECHCOMMANDS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(str, int, str, str, int):
``(filepath, sample_rate, label, speaker_id, utterance_number)``
Tuple of the following items:
str:
Path to the audio
int:
Sample rate
str:
Label
str:
Speaker ID
int:
Utterance number
"""
fileid = self._walker[n]
return _get_speechcommands_metadata(fileid, self._archive)
......@@ -152,8 +162,18 @@ class SPEECHCOMMANDS(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, int):
``(waveform, sample_rate, label, speaker_id, utterance_number)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Label
str:
Speaker ID
int:
Utterance number
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._archive, metadata[0], metadata[1])
......
......@@ -41,8 +41,7 @@ _RELEASE_CONFIGS = {
class TEDLIUM(Dataset):
"""
Create a Dataset for *Tedlium* :cite:`rousseau2012tedlium`. It supports releases 1, 2 and 3.
"""*Tedlium* :cite:`rousseau2012tedlium` dataset (releases 1, 2 and 3).
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -178,7 +177,20 @@ class TEDLIUM(Dataset):
n (int): The index of the sample to be loaded
Returns:
tuple: ``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
int:
Talk ID
int:
Speaker ID
int:
Identifier
"""
fileid, line = self._filelist[n]
return self._load_tedlium_item(fileid, line, self._path)
......
......@@ -17,7 +17,7 @@ SampleType = Tuple[Tensor, int, str, str, str]
class VCTK_092(Dataset):
"""Create *VCTK 0.92* :cite:`yamagishi2019vctk` Dataset
"""*VCTK 0.92* :cite:`yamagishi2019vctk` dataset
Args:
root (str): Root directory where the dataset's top level directory is found.
......@@ -123,8 +123,18 @@ class VCTK_092(Dataset):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, str, str, str):
``(waveform, sample_rate, transcript, speaker_id, utterance_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
str:
Transcript
str:
Speaker ID
str:
Utterance ID
"""
speaker_id, utterance_id = self._sample_ids[n]
return self._load_sample(speaker_id, utterance_id, self._mic_id)
......
......@@ -90,7 +90,7 @@ def _get_file_id(file_path: str, _ext_audio: str):
class VoxCeleb1(Dataset):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset.
Args:
root (str or Path): Path to the directory where the dataset is found or downloaded.
......@@ -122,7 +122,8 @@ class VoxCeleb1(Dataset):
class VoxCeleb1Identification(VoxCeleb1):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset for speaker identification task.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker identification task.
Each data sample contains the waveform, sample rate, speaker id, and the file id.
Args:
......@@ -156,8 +157,16 @@ class VoxCeleb1Identification(VoxCeleb1):
n (int): The index of the sample
Returns:
(str, int, int, str):
``(filepath, sample_rate, speaker_id, file_id)``
Tuple of the following items:
str:
Path to audio
int:
Sample rate
int:
Speaker ID
str:
File ID
"""
file_path = self._flist[n]
file_id = _get_file_id(file_path, self._ext_audio)
......@@ -172,8 +181,16 @@ class VoxCeleb1Identification(VoxCeleb1):
n (int): The index of the sample to be loaded
Returns:
(Tensor, int, int, str):
``(waveform, sample_rate, speaker_id, file_id)``
Tuple of the following items:
Tensor:
Waveform
int:
Sample rate
int:
Speaker ID
str:
File ID
"""
metadata = self.get_metadata(n)
waveform = _load_waveform(self._path, metadata[0], metadata[1])
......@@ -184,7 +201,8 @@ class VoxCeleb1Identification(VoxCeleb1):
class VoxCeleb1Verification(VoxCeleb1):
"""Create *VoxCeleb1* :cite:`nagrani2017voxceleb` Dataset for speaker verification task.
"""*VoxCeleb1* :cite:`nagrani2017voxceleb` dataset for speaker verification task.
Each data sample contains a pair of waveforms, sample rate, the label indicating if they are
from the same speaker, and the file ids.
......@@ -215,8 +233,20 @@ class VoxCeleb1Verification(VoxCeleb1):
n (int): The index of the sample
Returns:
(str, str, int, int, str, str):
``(filepath_spk1, filepath_spk2, sample_rate, label, file_id_spk1, file_id_spk2)``
Tuple of the following items:
str:
Path to audio file of speaker 1
str:
Path to audio file of speaker 2
int:
Sample rate
int:
Label
str:
File ID of speaker 1
str:
File ID of speaker 2
"""
label, file_path_spk1, file_path_spk2 = self._flist[n]
label = int(label)
......@@ -231,8 +261,20 @@ class VoxCeleb1Verification(VoxCeleb1):
n (int): The index of the sample to be loaded.
Returns:
(Tensor, Tensor, int, int, str, str):
``(waveform_spk1, waveform_spk2, sample_rate, label, file_id_spk1, file_id_spk2)``
Tuple of the following items:
Tensor:
Waveform of speaker 1
Tensor:
Waveform of speaker 2
int:
Sample rate
int:
Label
str:
File ID of speaker 1
str:
File ID of speaker 2
"""
metadata = self.get_metadata(n)
waveform_spk1 = _load_waveform(self._path, metadata[0], metadata[2])
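The code above unpacks ``label, file_path_spk1, file_path_spk2`` from ``self._flist``, which mirrors the layout of a VoxCeleb1-style verification trial list. A hedged sketch of parsing one such trial line (the exact file format is an assumption here, not shown in this diff):

```python
# Hypothetical sketch of one line of a VoxCeleb1-style trial list:
# "<label> <path_spk1> <path_spk2>", where label 1 means the two
# utterances come from the same speaker and 0 means they do not.
def parse_trial(line):
    label, path_spk1, path_spk2 = line.split()
    return int(label), path_spk1, path_spk2

print(parse_trial("1 id10001/a.wav id10001/b.wav"))
```

Casting the label to ``int`` up front matches what ``__getitem__`` does before handing the pair to a verification model.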
......