[DOC] Standardization and minor fixes (#1892)

Commit cb40dd72 in hehl2/Torchaudio (Unverified), parent 955cdbdc
Authored Oct 18, 2021 by Caroline Chen; committed by GitHub, Oct 18, 2021

Changes: 27
Showing 20 changed files with 46 additions and 40 deletions
torchaudio/backend/soundfile_backend.py  +7 -7
torchaudio/backend/sox_io_backend.py  +7 -7
torchaudio/datasets/cmuarctic.py  +1 -1
torchaudio/datasets/cmudict.py  +1 -1
torchaudio/datasets/commonvoice.py  +2 -2
torchaudio/datasets/dr_vctk.py  +3 -2
torchaudio/datasets/gtzan.py  +1 -1
torchaudio/datasets/librimix.py  +1 -1
torchaudio/datasets/librispeech.py  +2 -1
torchaudio/datasets/libritts.py  +2 -2
torchaudio/datasets/ljspeech.py  +2 -1
torchaudio/datasets/speechcommands.py  +2 -1
torchaudio/datasets/tedlium.py  +2 -1
torchaudio/datasets/utils.py  +1 -1
torchaudio/datasets/vctk.py  +2 -1
torchaudio/datasets/yesno.py  +1 -1
torchaudio/functional/filtering.py  +3 -3
torchaudio/functional/functional.py  +2 -2
torchaudio/models/conv_tasnet.py  +3 -3
torchaudio/models/tacotron2.py  +1 -1
torchaudio/backend/soundfile_backend.py
...
...
@@ -146,7 +146,7 @@ def load(
* SPHERE
By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of ``[channel, time]``.
+``float32`` dtype and the shape of `[channel, time]`.
The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
...
...
@@ -182,16 +182,16 @@ def load(
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool, optional):
-When True, the returned Tensor has dimension ``[channel, time]``.
-Otherwise, the returned Tensor's dimension is ``[time, channel]``.
+When True, the returned Tensor has dimension `[channel, time]`.
+Otherwise, the returned Tensor's dimension is `[time, channel]`.
format (str or None, optional):
Not used. PySoundFile does not accept format hint.
Returns:
-Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+(torch.Tensor, int): Resulting Tensor and sample rate.
If the input file has integer wav format and normalization is off, then it has
integer type, else ``float32`` type. If ``channels_first=True``, it has
-``[channel, time]`` else ``[time, channel]``.
+`[channel, time]` else `[time, channel]`.
"""
with soundfile.SoundFile(filepath, "r") as file_:
    if file_.format != "WAV" or normalize:
...
...
@@ -335,8 +335,8 @@ def save(
filepath (str or pathlib.Path): Path to audio file.
src (torch.Tensor): Audio data to save. must be 2D tensor.
sample_rate (int): sampling rate
-channels_first (bool, optional): If ``True``, the given tensor is interpreted as ``[channel, time]``,
-otherwise ``[time, channel]``.
+channels_first (bool, optional): If ``True``, the given tensor is interpreted as `[channel, time]`,
+otherwise `[time, channel]`.
compression (float of None, optional): Not used.
It is here only for interface compatibility reson with "sox_io" backend.
format (str or None, optional): Override the audio format.
...
...
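To make the two layouts named in these docstrings concrete, here is a minimal dependency-free sketch. It uses nested lists as a stand-in for a 2D Tensor, and the helper names (`normalize_int16`, `to_layout`, `shape`) are hypothetical. Dividing 16-bit samples by 2**15 is an assumed convention for mapping into `[-1.0, 1.0]`, not a statement of what either backend does internally.

```python
from typing import List, Tuple

Frames = List[List[float]]  # stand-in for a 2D Tensor


def normalize_int16(samples: List[List[int]]) -> Frames:
    """Scale 16-bit signed integers into [-1.0, 1.0] by dividing by 2**15 (assumed convention)."""
    scale = float(1 << 15)
    return [[s / scale for s in row] for row in samples]


def to_layout(time_major: Frames, channels_first: bool = True) -> Frames:
    """Return [channel, time] when channels_first is True, else [time, channel].

    The input is [time, channel], i.e. one row per frame.
    """
    if not channels_first:
        return [list(frame) for frame in time_major]
    # zip(*rows) transposes: one row per channel.
    return [list(channel) for channel in zip(*time_major)]


def shape(x: Frames) -> Tuple[int, int]:
    return (len(x), len(x[0]) if x else 0)


# Three stereo frames of 16-bit PCM, laid out as [time, channel].
pcm = [[0, 16384], [-32768, 32767], [8192, -8192]]
normalized = normalize_int16(pcm)
channel_time = to_layout(normalized, channels_first=True)
time_channel = to_layout(normalized, channels_first=False)
print(shape(channel_time))  # (2, 3)
print(shape(time_channel))  # (3, 2)
```

With `channels_first=True` the first index selects a channel; with `channels_first=False` it selects a frame.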
torchaudio/backend/sox_io_backend.py
...
...
@@ -89,7 +89,7 @@ def load(
and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.
By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of ``[channel, time]``.
+``float32`` dtype and the shape of `[channel, time]`.
The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
...
...
@@ -131,18 +131,18 @@ def load(
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool, optional):
-When True, the returned Tensor has dimension ``[channel, time]``.
-Otherwise, the returned Tensor's dimension is ``[time, channel]``.
+When True, the returned Tensor has dimension `[channel, time]`.
+Otherwise, the returned Tensor's dimension is `[time, channel]`.
format (str or None, optional):
Override the format detection with the given format.
Providing the argument might help when libsox can not infer the format
from header or extension,
Returns:
-Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+(torch.Tensor, int): Resulting Tensor and sample rate.
If the input file has integer wav format and normalization is off, then it has
integer type, else ``float32`` type. If ``channels_first=True``, it has
-``[channel, time]`` else ``[time, channel]``.
+`[channel, time]` else `[time, channel]`.
"""
if not torch.jit.is_scripting():
    if hasattr(filepath, 'read'):
...
...
@@ -172,8 +172,8 @@ def save(
as ``str`` for TorchScript compiler compatibility.
src (torch.Tensor): Audio data to save. must be 2D tensor.
sample_rate (int): sampling rate
-channels_first (bool, optional): If ``True``, the given tensor is interpreted as ``[channel, time]``,
-otherwise ``[time, channel]``.
+channels_first (bool, optional): If ``True``, the given tensor is interpreted as `[channel, time]`,
+otherwise `[time, channel]`.
compression (float or None, optional): Used for formats other than WAV.
This corresponds to ``-C`` option of ``sox`` command.
...
...
torchaudio/datasets/cmuarctic.py
...
...
@@ -164,7 +164,7 @@ class CMUARCTIC(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, transcript, utterance_id)``
+(Tensor, int, str, str): ``(waveform, sample_rate, transcript, utterance_id)``
"""
line = self._walker[n]
return load_cmuarctic_item(line, self._path, self._folder_audio, self._ext_audio)
...
...
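The dataset hunks in this commit all apply the same convention: the type line of the ``Returns:`` section spells out the element types of the returned tuple instead of a bare ``tuple``. The sketch below shows that convention on a hypothetical `ToyDataset`. The class, its fields, and its data are illustrative only, and a nested list stands in for a Tensor so the example has no dependencies.

```python
from typing import List, Tuple

Waveform = List[List[float]]  # stand-in for torch.Tensor


class ToyDataset:
    """Hypothetical dataset; not part of torchaudio."""

    def __init__(self) -> None:
        # One item: (waveform, sample_rate, transcript, utterance_id)
        self._items = [([[0.0, 0.1, -0.1]], 16000, "hello", "utt_0001")]

    def __getitem__(self, n: int) -> Tuple[Waveform, int, str, str]:
        """Load the n-th sample from the dataset.

        Args:
            n (int): The index of the sample to be loaded

        Returns:
            (Tensor, int, str, str): ``(waveform, sample_rate, transcript, utterance_id)``
        """
        return self._items[n]

    def __len__(self) -> int:
        return len(self._items)


waveform, sample_rate, transcript, utterance_id = ToyDataset()[0]
print(sample_rate, transcript, utterance_id)  # 16000 hello utt_0001
```

Keeping the docstring type line in step with the function's type annotation lets a reader unpack the tuple without opening the loader code.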
torchaudio/datasets/cmudict.py
...
...
@@ -167,7 +167,7 @@ class CMUDict(Dataset):
n (int): The index of the sample to be loaded.
Returns:
-tuple: The corresponding word and phonemes ``(word, [phonemes])``.
+(str, List[str]): The corresponding word and phonemes ``(word, [phonemes])``.
"""
return self._dictionary[n]
...
...
torchaudio/datasets/commonvoice.py
...
...
@@ -65,8 +65,8 @@ class COMMONVOICE(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, dictionary)``, where dictionary is built
-from the TSV file with the following keys: ``client_id``, ``path``, ``sentence``,
+(Tensor, int, Dict[str, str]): ``(waveform, sample_rate, dictionary)``, where dictionary
+is built from the TSV file with the following keys: ``client_id``, ``path``, ``sentence``,
``up_votes``, ``down_votes``, ``age``, ``gender`` and ``accent``.
"""
line = self._walker[n]
...
...
torchaudio/datasets/dr_vctk.py
...
...
@@ -107,8 +107,9 @@ class DR_VCTK(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id, utterance_id,\
-source, channel_id)``
+(Tensor, int, Tensor, int, str, str, str, int):
+``(waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id,\
+utterance_id, source, channel_id)``
"""
filename = self._filename_list[n]
return self._load_dr_vctk_item(filename)
...
...
torchaudio/datasets/gtzan.py
...
...
@@ -1102,7 +1102,7 @@ class GTZAN(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, label)``
+(Tensor, int, str): ``(waveform, sample_rate, label)``
"""
fileid = self._walker[n]
item = load_gtzan_item(fileid, self._path, self._ext_audio)
...
...
torchaudio/datasets/librimix.py
...
...
@@ -84,6 +84,6 @@ class LibriMix(Dataset):
Args:
key (int): The index of the sample to be loaded
Returns:
-tuple: ``(sample_rate, mix_waveform, list_of_source_waveforms)``
+(int, Tensor, List[Tensor]): ``(sample_rate, mix_waveform, list_of_source_waveforms)``
"""
return self._load_sample(self.files[key])
torchaudio/datasets/librispeech.py
...
...
@@ -133,7 +133,8 @@ class LIBRISPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
+(Tensor, int, str, int, int, int):
+``(waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)``
"""
fileid = self._walker[n]
return load_librispeech_item(fileid, self._path, self._ext_audio, self._ext_txt)
...
...
torchaudio/datasets/libritts.py
...
...
@@ -134,8 +134,8 @@ class LIBRITTS(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, original_text, normalized_text, speaker_id,
-chapter_id, utterance_id)``
+(Tensor, int, str, str, str, int, int, str):
+``(waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)``
"""
fileid = self._walker[n]
return load_libritts_item(
...
...
torchaudio/datasets/ljspeech.py
...
...
@@ -68,7 +68,8 @@ class LJSPEECH(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, transcript, normalized_transcript)``
+(Tensor, int, str, str):
+``(waveform, sample_rate, transcript, normalized_transcript)``
"""
line = self._flist[n]
fileid, transcript, normalized_transcript = line
...
...
torchaudio/datasets/speechcommands.py
...
...
@@ -138,7 +138,8 @@ class SPEECHCOMMANDS(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, label, speaker_id, utterance_number)``
+(Tensor, int, str, str, int):
+``(waveform, sample_rate, label, speaker_id, utterance_number)``
"""
fileid = self._walker[n]
return load_speechcommands_item(fileid, self._path)
...
...
torchaudio/datasets/tedlium.py
...
...
@@ -127,7 +127,8 @@ class TEDLIUM(Dataset):
path (str): Dataset root path
Returns:
-tuple: ``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)``
+(Tensor, int, str, int, int, int):
+``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)``
"""
transcript_path = os.path.join(path, "stm", fileid)
with open(transcript_path + ".stm") as f:
...
...
torchaudio/datasets/utils.py
...
...
@@ -151,7 +151,7 @@ def extract_archive(from_path: str, to_path: Optional[str] = None, overwrite: bo
overwrite (bool, optional): overwrite existing files (Default: ``False``)
Returns:
-list: List of paths to extracted files even if not overwritten.
+List[str]: List of paths to extracted files even if not overwritten.
Examples:
>>> url = 'http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz'
...
...
torchaudio/datasets/vctk.py
...
...
@@ -134,7 +134,8 @@ class VCTK_092(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, transcript, speaker_id, utterance_id)``
+(Tensor, int, str, str, str):
+``(waveform, sample_rate, transcript, speaker_id, utterance_id)``
"""
speaker_id, utterance_id = self._sample_ids[n]
return self._load_sample(speaker_id, utterance_id, self._mic_id)
...
...
torchaudio/datasets/yesno.py
...
...
@@ -77,7 +77,7 @@ class YESNO(Dataset):
n (int): The index of the sample to be loaded
Returns:
-tuple: ``(waveform, sample_rate, labels)``
+(Tensor, int, List[int]): ``(waveform, sample_rate, labels)``
"""
fileid = self._walker[n]
item = self._load_item(fileid, self._path)
...
...
torchaudio/functional/filtering.py
...
...
@@ -663,7 +663,7 @@ def filtfilt(
Returns:
Tensor: Waveform with dimension of either `(..., num_filters, time)` if ``a_coeffs`` and ``b_coeffs``
-are 2D Tensors, or `(..., time)` otherwise.
+are 2D Tensors, or `(..., time)` otherwise.
"""
forward_filtered = lfilter(waveform, a_coeffs, b_coeffs, clamp=False, batching=True)
backward_filtered = lfilter(
...
...
@@ -987,7 +987,7 @@ def lfilter(
Returns:
Tensor: Waveform with dimension of either `(..., num_filters, time)` if ``a_coeffs`` and ``b_coeffs``
-are 2D Tensors, or `(..., time)` otherwise.
+are 2D Tensors, or `(..., time)` otherwise.
"""
assert a_coeffs.size() == b_coeffs.size()
assert a_coeffs.ndim <= 2
...
...
@@ -1474,7 +1474,7 @@ def vad(
in the detector algorithm. (Default: 2000.0)
Returns:
-Tensor: Tensor of audio of dimension (..., time).
+Tensor: Tensor of audio of dimension `(..., time)`.
Reference:
- http://sox.sourceforge.net/sox.html
...
...
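The `lfilter` and `filtfilt` docstrings above distinguish a `(..., time)` result for 1D coefficients from a `(..., num_filters, time)` result for 2D coefficients. The following pure-Python sketch illustrates that shape difference with a direct-form difference equation. It is not torchaudio's implementation: `lfilter_1d` and `lfilter_bank` are hypothetical names, there is no clamping, and applying every filter to the same waveform is a simplifying assumption.

```python
from typing import List, Sequence


def lfilter_1d(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Apply a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


def lfilter_bank(
    x: Sequence[float],
    a_bank: Sequence[Sequence[float]],
    b_bank: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Apply each (a, b) pair to the same waveform; the result is [num_filters, time]."""
    return [lfilter_1d(x, a, b) for a, b in zip(a_bank, b_bank)]


x = [1.0, 0.0, 0.0, 0.0]  # unit impulse
single = lfilter_1d(x, a=[1.0, -0.5], b=[1.0, 0.0])
bank = lfilter_bank(x, a_bank=[[1.0, -0.5], [1.0, 0.0]], b_bank=[[1.0, 0.0], [0.5, 0.5]])
print(single)                   # [1.0, 0.5, 0.25, 0.125]
print(len(bank), len(bank[0]))  # 2 4
```

One coefficient pair yields one output row of length `time`; a bank of two pairs yields two rows, which is the extra `num_filters` dimension the docstring describes.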
torchaudio/functional/functional.py
...
...
@@ -263,7 +263,7 @@ def griffinlim(
rand_init (bool): Initializes phase randomly if True, to zero otherwise.
Returns:
-torch.Tensor: waveform of `(..., time)`, where time equals the ``length`` parameter if given.
+Tensor: waveform of `(..., time)`, where time equals the ``length`` parameter if given.
"""
assert momentum < 1, 'momentum={} > 1 can be unstable'.format(momentum)
assert momentum >= 0, 'momentum={} < 0'.format(momentum)
...
...
@@ -1369,7 +1369,7 @@ def apply_codec(
For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
Returns:
-torch.Tensor: Resulting Tensor.
+Tensor: Resulting Tensor.
If ``channels_first=True``, it has `(channel, time)` else `(time, channel)`.
"""
bytes = io.BytesIO()
...
...
torchaudio/models/conv_tasnet.py
...
...
@@ -154,7 +154,7 @@ class MaskGenerator(torch.nn.Module):
input (torch.Tensor): 3D Tensor with shape [batch, features, frames]
Returns:
-torch.Tensor: shape [batch, num_sources, features, frames]
+Tensor: shape [batch, num_sources, features, frames]
"""
batch_size = input.shape[0]
feats = self.input_norm(input)
...
...
@@ -264,7 +264,7 @@ class ConvTasNet(torch.nn.Module):
input (torch.Tensor): 3D Tensor with shape (batch_size, channels==1, frames)
Returns:
-torch.Tensor: Padded Tensor
+Tensor: Padded Tensor
int: Number of paddings performed
"""
batch_size, num_channels, num_frames = input.shape
...
...
@@ -291,7 +291,7 @@ class ConvTasNet(torch.nn.Module):
input (torch.Tensor): 3D Tensor with shape [batch, channel==1, frames]
Returns:
-torch.Tensor: 3D Tensor with shape [batch, channel==num_sources, frames]
+Tensor: 3D Tensor with shape [batch, channel==num_sources, frames]
"""
if input.ndim != 3 or input.shape[1] != 1:
    raise ValueError(
...
...
torchaudio/models/tacotron2.py
...
...
@@ -1031,7 +1031,7 @@ class Tacotron2(nn.Module):
mel_specgram_lengths (Tensor): The length of each mel spectrogram with shape `(n_batch, )`.
Returns:
-Tensor, Tensor, Tensor, and Tensor:
+[Tensor, Tensor, Tensor, Tensor]:
Tensor
Mel spectrogram before Postnet with shape `(n_batch, n_mels, max of mel_specgram_lengths)`.
Tensor
...
...