OpenDAS / Torchaudio
Commit 768432c3 (Unverified)
Authored Sep 02, 2021 by Caroline Chen; committed by GitHub on Sep 02, 2021
Standardize optional types in docstrings (#1746)
Parent: d9bfb708
Changes: 21. Showing 20 changed files with 108 additions and 103 deletions (+108 -103).
examples/pipeline_tacotron2/datasets.py (+1 -1)
examples/pipeline_tacotron2/text/text_preprocessing.py (+5 -5)
examples/pipeline_wavernn/wavernn_inference_wrapper.py (+5 -5)
examples/source_separation/utils/dataset/wsj0mix.py (+1 -1)
examples/source_separation/utils/metrics.py (+8 -8)
torchaudio/backend/soundfile_backend.py (+11 -11)
torchaudio/backend/sox_io_backend.py (+10 -10)
torchaudio/backend/utils.py (+1 -1)
torchaudio/datasets/gtzan.py (+1 -1)
torchaudio/datasets/speechcommands.py (+1 -1)
torchaudio/datasets/tedlium.py (+5 -3)
torchaudio/datasets/utils.py (+6 -4)
torchaudio/datasets/vctk.py (+1 -1)
torchaudio/functional/filtering.py (+19 -18)
torchaudio/functional/functional.py (+12 -12)
torchaudio/models/conv_tasnet.py (+10 -10)
torchaudio/models/wav2vec2/components.py (+4 -4)
torchaudio/models/wav2vec2/model.py (+2 -2)
torchaudio/models/wav2vec2/utils/import_fairseq.py (+1 -1)
torchaudio/sox_effects/sox_effects.py (+4 -4)
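Across these files the commit applies one docstring rule: a parameter that has a default value is marked `, optional`, and a parameter that accepts `None` is written `X or None` rather than `Optional[X]`. For example, `format: Optional[str] = None` becomes `format (str or None, optional)`, while `backend: Optional[str]` with no default becomes `backend (str or None)`. The sketch below derives such a label from a signature. The helper `doc_type_label` and the two `*_stub` functions are hypothetical illustrations, not torchaudio code.

```python
import inspect
import typing
from typing import Optional


def doc_type_label(func, name: str) -> str:
    """Return a label like "str or None, optional" for parameter `name` (hypothetical helper)."""
    param = inspect.signature(func).parameters[name]
    hint = typing.get_type_hints(func).get(name, param.annotation)
    accepts_none = param.default is None
    # Unwrap Optional[X] (that is, Union[X, None]) into X and record that None is accepted.
    if typing.get_origin(hint) is typing.Union:
        args = typing.get_args(hint)
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) < len(args):
            accepts_none = True
        if len(non_none) == 1:
            hint = non_none[0]
    label = getattr(hint, "__name__", str(hint))
    if accepts_none:
        label += " or None"
    # Any parameter with a default value is documented as optional.
    if param.default is not inspect.Parameter.empty:
        label += ", optional"
    return label


# Hypothetical stand-ins that only mimic the shape of signatures touched by this diff.
def load_stub(filepath: str, num_frames: int = -1, format: Optional[str] = None):
    pass


def set_audio_backend_stub(backend: Optional[str]):
    pass
```

With these stand-ins, `doc_type_label(load_stub, "format")` yields `"str or None, optional"` and `doc_type_label(set_audio_backend_stub, "backend")` yields `"str or None"`.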
examples/pipeline_tacotron2/datasets.py
@@ -131,7 +131,7 @@ def text_mel_collate_fn(batch: Tuple[Tensor, Tensor],
 Args:
 batch (tuple of two tensors): the first tensor is the mel spectrogram with shape
 (n_batch, n_mels, n_frames), the second tensor is the text with shape (n_batch, ).
-n_frames_per_step (int): The number of frames to advance every step.
+n_frames_per_step (int, optional): The number of frames to advance every step.
 Returns:
 text_padded (Tensor): The input text to Tacotron2 with shape (n_batch, max of ``text_lengths``).
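`n_frames_per_step` matters at collate time because Tacotron2-style decoders emit several mel frames per step, so padded mel targets are commonly rounded up to a multiple of it. A minimal sketch of that rounding on plain integer lengths; the function name `padded_mel_length` and its default of 1 are assumptions, and the example's `text_mel_collate_fn` may handle padding differently.

```python
from typing import List


def padded_mel_length(mel_lengths: List[int], n_frames_per_step: int = 1) -> int:
    """Round the longest mel length up to a multiple of n_frames_per_step (sketch)."""
    max_len = max(mel_lengths)
    remainder = max_len % n_frames_per_step
    if remainder != 0:
        # Pad so the decoder can consume the target in whole steps.
        max_len += n_frames_per_step - remainder
    return max_len
```

For lengths `[7, 10, 9]` and `n_frames_per_step=3`, the padded length is 12.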
examples/pipeline_tacotron2/text/text_preprocessing.py
@@ -123,12 +123,12 @@ def text_to_sequence(sent: str,
 symbol_list (str or List of string, optional): When the input is a string, available options include
 "english_characters" and "english_phonemes". When the input is a list of string, ``symbol_list`` will
 directly be used as the symbol to encode. (Default: "english_characters")
-phonemizer (str, optional): The phonemizer to use. Only used when ``symbol_list`` is "english_phonemes".
+phonemizer (str or None, optional): The phonemizer to use. Only used when ``symbol_list`` is "english_phonemes".
 Available options include "DeepPhonemizer". (Default: "DeepPhonemizer")
-checkpoint (str, optional): The path to the checkpoint of the phonemizer. Only used when ``symbol_list`` is
-"english_phonemes". (Default: "./en_us_cmudict_forward.pt")
-cmudict_root (str, optional): The path to the directory where the CMUDict dataset is found or downloaded.
-Only used when ``symbol_list`` is "english_phonemes". (Default: "./")
+checkpoint (str or None, optional): The path to the checkpoint of the phonemizer. Only used when
+``symbol_list`` is "english_phonemes". (Default: "./en_us_cmudict_forward.pt")
+cmudict_root (str or None, optional): The path to the directory where the CMUDict dataset is found or
+downloaded. Only used when ``symbol_list`` is "english_phonemes". (Default: "./")
 Returns:
 List of integers corresponding to the symbols in the sentence.
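When `symbol_list` is a list, the docstring says it is used directly as the symbol inventory. A minimal sketch of that lookup, assuming lowercase input and that characters outside the inventory are dropped. `text_to_ids` and `_ENGLISH_CHARACTERS` are hypothetical and need not match the example's actual "english_characters" set; the phoneme path (DeepPhonemizer) is not sketched.

```python
from typing import List, Union

# Hypothetical character inventory; the example's real "english_characters" list may differ.
_ENGLISH_CHARACTERS = ["_", "-", "!", "'", "(", ")", ",", ".", ":", ";", "?", " "] + list(
    "abcdefghijklmnopqrstuvwxyz"
)


def text_to_ids(sent: str, symbol_list: Union[str, List[str]] = "english_characters") -> List[int]:
    """Map each character of `sent` to its index in the symbol inventory (sketch)."""
    if isinstance(symbol_list, str):
        if symbol_list != "english_characters":
            raise ValueError(f"unsupported symbol_list in this sketch: {symbol_list!r}")
        symbols = _ENGLISH_CHARACTERS
    else:
        # A list is used directly as the symbol inventory.
        symbols = symbol_list
    symbol_to_id = {symbol: index for index, symbol in enumerate(symbols)}
    # Lowercase the sentence and drop characters that are not in the inventory.
    return [symbol_to_id[c] for c in sent.lower() if c in symbol_to_id]
```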
examples/pipeline_wavernn/wavernn_inference_wrapper.py
@@ -171,13 +171,13 @@ class WaveRNNInferenceWrapper(torch.nn.Module):
 Args:
 specgram (Tensor): spectrogram of size (n_mels, n_time)
-mulaw (bool): Whether to perform mulaw decoding (Default: ``True``).
-batched (bool): Whether to perform batch prediction. Using batch prediction
+mulaw (bool, optional): Whether to perform mulaw decoding (Default: ``True``).
+batched (bool, optional): Whether to perform batch prediction. Using batch prediction
 will significantly increase the inference speed (Default: ``True``).
-timesteps (int): The time steps for each batch. Only used when `batched`
+timesteps (int, optional): The time steps for each batch. Only used when `batched`
 is set to True (Default: ``100``).
-overlap (int): The overlapping time steps between batches. Only used when `batched`
-is set to True (Default: ``5``).
+overlap (int, optional): The overlapping time steps between batches. Only used when
+`batched` is set to True (Default: ``5``).
 Returns:
 waveform (Tensor): Reconstructed waveform of size (1, n_time, ).
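The `mulaw` flag refers to mu-law decoding of the network's quantized output. A sketch of the standard mu-law expansion for one integer label, assuming 256 quantization channels (mu = 255) and modeled on the formula used by `torchaudio.functional.mu_law_decoding`; the wrapper's own code path is not reproduced, and `mu_law_decode` is a hypothetical name.

```python
import math


def mu_law_decode(x_mu: int, quantization_channels: int = 256) -> float:
    """Expand one mu-law label in [0, quantization_channels - 1] to a sample in [-1, 1] (sketch)."""
    mu = quantization_channels - 1.0
    # Map the integer label back to the companded range [-1, 1].
    y = (x_mu / mu) * 2.0 - 1.0
    # Invert the mu-law companding curve, keeping the sign of y.
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1.0) / mu, y)
```

Labels near the middle of the range decode to values close to zero, and labels 0 and 255 decode to roughly -1.0 and 1.0.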
examples/source_separation/utils/dataset/wsj0mix.py
@@ -19,7 +19,7 @@ class WSJ0Mix(Dataset):
 N source audios.
 sample_rate (int): Expected sample rate of audio files. If any of the audio has a
 different sample rate, raises ``ValueError``.
-audio_ext (str): The extension of audio files to find. (default: ".wav")
+audio_ext (str, optional): The extension of audio files to find. (default: ".wav")
 """
 def __init__(
 self,
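The WSJ0Mix change is the kind of mismatch this commit fixes by hand: `audio_ext` has a default but was documented as plain `(str)`. A rough checker for that pattern in Google-style `Args:` sections might look like the following. `missing_optional` and both stand-in functions are hypothetical, and the regex only understands single-line `name (type):` entries.

```python
import inspect
import re

# Matches "name (type):" entries in a Google-style "Args:" section.
_ARG_RE = re.compile(r"^\s*(\w+)\s*\(([^)]*)\)\s*:")


def missing_optional(func):
    """Return parameters that have a default but are not documented as optional (sketch)."""
    params = inspect.signature(func).parameters
    problems = []
    for line in (func.__doc__ or "").splitlines():
        match = _ARG_RE.match(line)
        if not match:
            continue
        name, type_text = match.groups()
        param = params.get(name)
        if param is None:
            continue
        has_default = param.default is not inspect.Parameter.empty
        if has_default and not type_text.rstrip().endswith("optional"):
            problems.append(name)
    return problems


def wsj0mix_init_before(root, num_speakers, sample_rate, audio_ext=".wav"):
    """Hypothetical stand-in for the docstring before this commit.

    Args:
        root (str): Path to the dataset.
        num_speakers (int): Number of speakers.
        sample_rate (int): Expected sample rate of audio files.
        audio_ext (str): The extension of audio files to find. (default: ".wav")
    """


def wsj0mix_init_after(root, num_speakers, sample_rate, audio_ext=".wav"):
    """Hypothetical stand-in for the docstring after this commit.

    Args:
        root (str): Path to the dataset.
        num_speakers (int): Number of speakers.
        sample_rate (int): Expected sample rate of audio files.
        audio_ext (str, optional): The extension of audio files to find. (default: ".wav")
    """
```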
examples/source_separation/utils/metrics.py
@@ -21,9 +21,9 @@ def sdr(
 Shape: [batch, speakers (can be 1), time frame]
 reference (torch.Tensor): Reference signal.
 Shape: [batch, speakers, time frame]
-mask (Optional[torch.Tensor]): Binary mask to indicate padded value (0) or valid value (1).
+mask (torch.Tensor or None, optional): Binary mask to indicate padded value (0) or valid value (1).
 Shape: [batch, 1, time frame]
-epsilon (float): constant value used to stabilize division.
+epsilon (float, optional): constant value used to stabilize division.
 Returns:
 torch.Tensor: scale-invariant source-to-distortion ratio.
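The `mask` and `epsilon` arguments documented for `sdr` can be illustrated on a single speaker with plain Python lists. This is a sketch of scale-invariant SDR under the assumption that masked-out (0) samples are simply excluded and that `epsilon` guards both divisions. Where the example places `epsilon` may differ, and it operates on batched tensors rather than lists; `si_sdr` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float],
           mask: Optional[Sequence[int]] = None, epsilon: float = 1e-8) -> float:
    """Scale-invariant SDR in dB for one speaker (sketch, not the example's code)."""
    if mask is None:
        mask = [1] * len(reference)
    # Keep only valid (mask == 1) samples; padded samples (mask == 0) are ignored.
    pairs = [(e, r) for e, r, m in zip(estimate, reference, mask) if m]
    dot = sum(e * r for e, r in pairs)
    ref_energy = sum(r * r for _, r in pairs)
    # Project the estimate onto the reference to obtain the scaled target.
    scale = dot / (ref_energy + epsilon)
    target = [scale * r for _, r in pairs]
    noise = [e - t for (e, _), t in zip(pairs, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + epsilon) / (noise_energy + epsilon))
```

Under the usual definition, an SDR improvement like the one `sdri` reports would be `si_sdr(estimate, reference) - si_sdr(mix, reference)`.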
@@ -99,9 +99,9 @@ class PIT(torch.nn.Module):
 Shape: [bacth, speakers, time frame]
 reference (torch.Tensor): Reference (original) source signals.
 Shape: [batch, speakers, time frame]
-mask (Optional[torch.Tensor]): Binary mask to indicate padded value (0) or valid value (1).
+mask (torch.Tensor or None, optional): Binary mask to indicate padded value (0) or valid value (1).
 Shape: [batch, 1, time frame]
-epsilon (float): constant value used to stabilize division.
+epsilon (float, optional): constant value used to stabilize division.
 Returns:
 torch.Tensor: Maximum criterion over the speaker permutation.
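`PIT` returns the maximum criterion over speaker permutations. A brute-force sketch for a handful of speakers, assuming a per-pair criterion where higher is better and averaging over speakers. `pit_best` and `neg_squared_error` are hypothetical names, and the batched tensor version in the example is not reproduced.

```python
import itertools
from typing import Callable, Sequence, Tuple

Signal = Sequence[float]


def neg_squared_error(estimate: Signal, reference: Signal) -> float:
    """Example criterion where higher is better (hypothetical)."""
    return -sum((e - r) ** 2 for e, r in zip(estimate, reference))


def pit_best(estimates: Sequence[Signal], references: Sequence[Signal],
             criterion: Callable[[Signal, Signal], float]) -> Tuple[float, Tuple[int, ...]]:
    """Maximum mean criterion over all speaker permutations, plus the permutation used."""
    best_score, best_perm = float("-inf"), tuple()
    for perm in itertools.permutations(range(len(references))):
        # perm[i] is the index of the estimate assigned to reference i.
        score = sum(criterion(estimates[p], references[i]) for i, p in enumerate(perm)) / len(perm)
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm
```

The cost grows factorially with the number of speakers, which is acceptable for the two or three speakers typical of WSJ0-mix.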
@@ -140,9 +140,9 @@ def sdr_pit(
 Shape: [batch, speakers (can be 1), time frame]
 reference (torch.Tensor): Reference signal.
 Shape: [batch, speakers, time frame]
-mask (Optional[torch.Tensor]): Binary mask to indicate padded value (0) or valid value (1).
+mask (torch.Tensor or None, optional): Binary mask to indicate padded value (0) or valid value (1).
 Shape: [batch, 1, time frame]
-epsilon (float): constant value used to stabilize division.
+epsilon (float, optional): constant value used to stabilize division.
 Returns:
 torch.Tensor: scale-invariant source-to-distortion ratio.
@@ -187,9 +187,9 @@ def sdri(
 Shape: [batch, speakers, time frame]
 mix (torch.Tensor): Mixed souce signals, from which the setimated signals were generated.
 Shape: [batch, speakers == 1, time frame]
-mask (Optional[torch.Tensor]): Binary mask to indicate padded value (0) or valid value (1).
+mask (torch.Tensor or None, optional): Binary mask to indicate padded value (0) or valid value (1).
 Shape: [batch, 1, time frame]
-epsilon (float): constant value used to stabilize division.
+epsilon (float, optional): constant value used to stabilize division.
 Returns:
 torch.Tensor: Improved SDR. Shape: [batch, ]
torchaudio/backend/soundfile_backend.py
@@ -92,7 +92,7 @@ def info(filepath: str, format: Optional[str] = None) -> AudioMetaData:
 Args:
 filepath (path-like object or file-like object):
 Source of audio data.
-format (str, optional):
+format (str or None, optional):
 Not used. PySoundFile does not accept format hint.
 Returns:
@@ -168,23 +168,23 @@ def load(
 Args:
 filepath (path-like object or file-like object):
 Source of audio data.
-frame_offset (int):
+frame_offset (int, optional):
 Number of frames to skip before start reading data.
-num_frames (int):
+num_frames (int, optional):
 Maximum number of frames to read. ``-1`` reads all the remaining samples,
 starting from ``frame_offset``.
 This function may return the less number of frames if there is not enough
 frames in the given file.
-normalize (bool):
+normalize (bool, optional):
 When ``True``, this function always return ``float32``, and sample values are
 normalized to ``[-1.0, 1.0]``.
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
-channels_first (bool):
+channels_first (bool, optional):
 When True, the returned Tensor has dimension ``[channel, time]``.
 Otherwise, the returned Tensor's dimension is ``[time, channel]``.
-format (str, optional):
+format (str or None, optional):
 Not used. PySoundFile does not accept format hint.
 Returns:
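The `frame_offset`, `num_frames` and `channels_first` semantics described for `load` can be mimicked on an in-memory `[time][channel]` list. This only illustrates the documented behavior (`-1` reads to the end, a short file yields fewer frames, `channels_first=True` gives `[channel, time]`); `select_frames` is hypothetical and does no decoding.

```python
from typing import List


def select_frames(frames: List[List[float]], frame_offset: int = 0, num_frames: int = -1,
                  channels_first: bool = True) -> List[List[float]]:
    """Apply load-style frame selection to [time][channel] data (sketch)."""
    end = len(frames) if num_frames == -1 else frame_offset + num_frames
    # Slicing past the end silently yields fewer frames, as the docstring warns.
    selected = frames[frame_offset:end]
    if channels_first:
        # Transpose [time, channel] -> [channel, time].
        return [list(channel) for channel in zip(*selected)]
    return selected
```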
@@ -335,11 +335,11 @@ def save(
 filepath (str or pathlib.Path): Path to audio file.
 src (torch.Tensor): Audio data to save. must be 2D tensor.
 sample_rate (int): sampling rate
-channels_first (bool): If ``True``, the given tensor is interpreted as ``[channel, time]``,
+channels_first (bool, optional): If ``True``, the given tensor is interpreted as ``[channel, time]``,
 otherwise ``[time, channel]``.
-compression (Optional[float]): Not used.
+compression (float of None, optional): Not used.
 It is here only for interface compatibility reson with "sox_io" backend.
-format (str, optional): Override the audio format.
+format (str or None, optional): Override the audio format.
 When ``filepath`` argument is path-like object, audio format is
 inferred from file extension. If the file extension is missing or
 different, you can specify the correct format with this argument.
@@ -349,7 +349,7 @@ def save(
 Valid values are ``"wav"``, ``"ogg"``, ``"vorbis"``,
 ``"flac"`` and ``"sph"``.
-encoding (str, optional): Changes the encoding for supported formats.
+encoding (str or None, optional): Changes the encoding for supported formats.
 This argument is effective only for supported formats, sush as
 ``"wav"``, ``""flac"`` and ``"sph"``. Valid values are;
@@ -359,7 +359,7 @@ def save(
 - ``"ULAW"`` (mu-law)
 - ``"ALAW"`` (a-law)
-bits_per_sample (int, optional): Changes the bit depth for the
+bits_per_sample (int or None, optional): Changes the bit depth for the
 supported formats.
 When ``format`` is one of ``"wav"``, ``"flac"`` or ``"sph"``,
 you can change the bit depth.
torchaudio/backend/sox_io_backend.py
@@ -37,7 +37,7 @@ def info(
 * This argument is intentionally annotated as ``str`` only due to
 TorchScript compiler compatibility.
-format (str, optional):
+format (str or None, optional):
 Override the format detection with the given format.
 Providing the argument might help when libsox can not infer the format
 from header or extension,
@@ -119,21 +119,21 @@ def load(
 TorchScript compiler compatibility.
 frame_offset (int):
 Number of frames to skip before start reading data.
-num_frames (int):
+num_frames (int, optional):
 Maximum number of frames to read. ``-1`` reads all the remaining samples,
 starting from ``frame_offset``.
 This function may return the less number of frames if there is not enough
 frames in the given file.
-normalize (bool):
+normalize (bool, optional):
 When ``True``, this function always return ``float32``, and sample values are
 normalized to ``[-1.0, 1.0]``.
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
-channels_first (bool):
+channels_first (bool, optional):
 When True, the returned Tensor has dimension ``[channel, time]``.
 Otherwise, the returned Tensor's dimension is ``[time, channel]``.
-format (str, optional):
+format (str or None, optional):
 Override the format detection with the given format.
 Providing the argument might help when libsox can not infer the format
 from header or extension,
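For integer WAV input, `normalize=True` yields floats in `[-1.0, 1.0]`. A sketch of the usual scaling for signed PCM, dividing by 2^(bits_per_sample - 1); the helper `to_float` is hypothetical, and whether libsox's conversion matches this exactly for every bit depth is an assumption.

```python
from typing import List


def to_float(samples: List[int], bits_per_sample: int = 16, normalize: bool = True) -> list:
    """Scale signed PCM integers to floats, or return them unchanged (sketch)."""
    if not normalize:
        # Without normalization the integer sample values are kept as-is.
        return list(samples)
    scale = float(2 ** (bits_per_sample - 1))
    # Signed PCM spans [-2**(b-1), 2**(b-1) - 1], so results lie in [-1.0, 1.0).
    return [s / scale for s in samples]
```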
@@ -172,9 +172,9 @@ def save(
 as ``str`` for TorchScript compiler compatibility.
 src (torch.Tensor): Audio data to save. must be 2D tensor.
 sample_rate (int): sampling rate
-channels_first (bool): If ``True``, the given tensor is interpreted as ``[channel, time]``,
+channels_first (bool, optional): If ``True``, the given tensor is interpreted as ``[channel, time]``,
 otherwise ``[time, channel]``.
-compression (Optional[float]): Used for formats other than WAV.
+compression (float or None, optional): Used for formats other than WAV.
 This corresponds to ``-C`` option of ``sox`` command.
 ``"mp3"``
@@ -189,7 +189,7 @@ def save(
 and lowest quality. Default: ``3``.
 See the detail at http://sox.sourceforge.net/soxformat.html.
-format (str, optional): Override the audio format.
+format (str or None, optional): Override the audio format.
 When ``filepath`` argument is path-like object, audio format is infered from
 file extension. If file extension is missing or different, you can specify the
 correct format with this argument.
@@ -199,7 +199,7 @@ def save(
 Valid values are ``"wav"``, ``"mp3"``, ``"ogg"``, ``"vorbis"``, ``"amr-nb"``,
 ``"amb"``, ``"flac"``, ``"sph"``, ``"gsm"``, and ``"htk"``.
-encoding (str, optional): Changes the encoding for the supported formats.
+encoding (str or None, optional): Changes the encoding for the supported formats.
 This argument is effective only for supported formats, such as ``"wav"``, ``""amb"``
 and ``"sph"``. Valid values are;
@@ -225,7 +225,7 @@ def save(
 ``"sph"`` format;
 - the default value is ``"PCM_S"``
-bits_per_sample (int, optional): Changes the bit depth for the supported formats.
+bits_per_sample (int or None, optional): Changes the bit depth for the supported formats.
 When ``format`` is one of ``"wav"``, ``"flac"``, ``"sph"``, or ``"amb"``, you can change the
 bit depth. Valid values are ``8``, ``16``, ``32`` and ``64``.
torchaudio/backend/utils.py
@@ -35,7 +35,7 @@ def set_audio_backend(backend: Optional[str]):
 """Set the backend for I/O operation
 Args:
-backend (Optional[str]): Name of the backend.
+backend (str or None): Name of the backend.
 One of ``"sox_io"`` or ``"soundfile"`` based on availability
 of the system. If ``None`` is provided the current backend is unassigned.
 """
torchaudio/datasets/gtzan.py
@@ -1011,7 +1011,7 @@ class GTZAN(Dataset):
 folder_in_archive (str, optional): The top-level directory of the dataset.
 download (bool, optional):
 Whether to download the dataset if it is not found at root path. (default: ``False``).
-subset (str, optional): Which subset of the dataset to use.
+subset (str or None, optional): Which subset of the dataset to use.
 One of ``"training"``, ``"validation"``, ``"testing"`` or ``None``.
 If ``None``, the entire dataset is used. (default: ``None``).
 """
torchaudio/datasets/speechcommands.py
@@ -65,7 +65,7 @@ class SPEECHCOMMANDS(Dataset):
 The top-level directory of the dataset. (default: ``"SpeechCommands"``)
 download (bool, optional):
 Whether to download the dataset if it is not found at root path. (default: ``False``).
-subset (Optional[str]):
+subset (str or None, optional):
 Select a subset of the dataset [None, "training", "validation", "testing"]. None means
 the whole dataset. "validation" and "testing" are defined in "validation_list.txt" and
 "testing_list.txt", respectively, and "training" is the rest. Details for the files
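The subset rule above ("validation" and "testing" come from list files, "training" is everything else, `None` is the whole dataset) can be sketched as a plain partition. This assumes the contents of "validation_list.txt" and "testing_list.txt" have already been read into lists of relative paths; `select_subset` is hypothetical and not the dataset class's actual code.

```python
from typing import Iterable, List, Optional


def select_subset(files: Iterable[str], validation: Iterable[str], testing: Iterable[str],
                  subset: Optional[str] = None) -> List[str]:
    """Partition relative file paths the way the SPEECHCOMMANDS docstring describes (sketch)."""
    files = list(files)
    validation, testing = set(validation), set(testing)
    if subset is None:
        return files  # None selects the whole dataset
    if subset == "validation":
        return [f for f in files if f in validation]
    if subset == "testing":
        return [f for f in files if f in testing]
    if subset == "training":
        # "training" is every file not listed in either list file.
        return [f for f in files if f not in validation and f not in testing]
    raise ValueError(f"unknown subset: {subset!r}")
```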
torchaudio/datasets/tedlium.py
@@ -55,6 +55,7 @@ class TEDLIUM(Dataset):
 and ``"test"`` for releases 1&2, ``None`` for release3. Defaults to ``"train"`` or ``None``.
 download (bool, optional):
 Whether to download the dataset if it is not found at root path. (default: ``False``).
+audio_ext (str, optional): extension for audio file (default: ``"audio_ext"``)
 """
 def __init__(
 self,
@@ -62,7 +63,7 @@ class TEDLIUM(Dataset):
 release: str = "release1",
 subset: str = None,
 download: bool = False,
-audio_ext=".sph"
+audio_ext: str = ".sph"
 ) -> None:
 self._ext_audio = audio_ext
 if release in _RELEASE_CONFIGS.keys():
@@ -144,8 +145,9 @@ class TEDLIUM(Dataset):
 Args:
 path (str): Path to audio file
-start_time (int, optional): Time in seconds where the sample sentence stars
-end_time (int, optional): Time in seconds where the sample sentence finishes
+start_time (int): Time in seconds where the sample sentence stars
+end_time (int): Time in seconds where the sample sentence finishes
+sample_rate (float, optional): Sampling rate
 Returns:
 [Tensor, int]: Audio tensor representation and sample rate
torchaudio/datasets/utils.py
@@ -22,7 +22,7 @@ def stream_url(url: str,
 Args:
 url (str): Url.
-start_byte (int, optional): Start streaming at that point (Default: ``None``).
+start_byte (int or None, optional): Start streaming at that point (Default: ``None``).
 block_size (int, optional): Size of chunks to stream (Default: ``32 * 1024``).
 progress_bar (bool, optional): Display a progress bar (Default: ``True``).
 """
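A `start_byte` of `None` means streaming starts at the beginning; a non-`None` value maps naturally onto an HTTP `Range` request header. A sketch using only `urllib.request`, assuming the server honors byte ranges. `make_request` is hypothetical, and this is not necessarily how `stream_url` builds its request.

```python
import urllib.request
from typing import Optional


def make_request(url: str, start_byte: Optional[int] = None) -> urllib.request.Request:
    """Build a GET request that optionally resumes from start_byte (sketch, no network I/O)."""
    req = urllib.request.Request(url)
    if start_byte is not None:
        # Ask the server for bytes from start_byte through the end of the resource.
        req.add_header("Range", f"bytes={start_byte}-")
    return req
```

The request is only constructed here; opening it with `urllib.request.urlopen` would perform the actual transfer.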
@@ -68,8 +68,9 @@ def download_url(url: str,
 Args:
 url (str): Url.
 download_folder (str): Folder to download file.
-filename (str, optional): Name of downloaded file. If None, it is inferred from the url (Default: ``None``).
-hash_value (str, optional): Hash for url (Default: ``None``).
+filename (str or None, optional): Name of downloaded file. If None, it is inferred from the url
+(Default: ``None``).
+hash_value (str or None, optional): Hash for url (Default: ``None``).
 hash_type (str, optional): Hash type, among "sha256" and "md5" (Default: ``"sha256"``).
 progress_bar (bool, optional): Display a progress bar (Default: ``True``).
 resume (bool, optional): Enable resuming download (Default: ``False``).
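Two of the `None` defaults in `download_url` carry behavior: `filename=None` means the name is inferred from the URL, and `hash_value=None` means there is nothing to verify. A sketch of both, assuming "inferred from the url" means the last path component and that the hash is a hex digest. `infer_filename` and `hash_matches` are hypothetical helpers, not torchaudio functions.

```python
import hashlib
import os
import urllib.parse
from typing import Optional


def infer_filename(url: str, filename: Optional[str] = None) -> str:
    """Use the given filename, else the last path component of the URL (sketch)."""
    if filename is not None:
        return filename
    # Ignore any query string or fragment when taking the basename.
    return os.path.basename(urllib.parse.urlparse(url).path)


def hash_matches(data: bytes, hash_value: Optional[str], hash_type: str = "sha256") -> bool:
    """Compare the hex digest of `data` with `hash_value`; None skips the check (sketch)."""
    if hash_value is None:
        return True
    if hash_type not in ("sha256", "md5"):
        raise ValueError(f"unsupported hash_type: {hash_type!r}")
    return hashlib.new(hash_type, data).hexdigest() == hash_value
```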
@@ -149,7 +150,8 @@ def extract_archive(from_path: str, to_path: Optional[str] = None, overwrite: bo
 """Extract archive.
 Args:
 from_path (str): the path of the archive.
-to_path (str, optional): the root path of the extraced files (directory of from_path) (Default: ``None``)
+to_path (str or None, optional): the root path of the extraced files (directory of from_path)
+(Default: ``None``)
 overwrite (bool, optional): overwrite existing files (Default: ``False``)
 Returns:
torchaudio/datasets/vctk.py
@@ -150,7 +150,7 @@ class VCTK_092(Dataset):
 Args:
 root (str): Root directory where the dataset's top level directory is found.
-mic_id (str): Microphone ID. Either ``"mic1"`` or ``"mic2"``. (default: ``"mic2"``)
+mic_id (str, optional): Microphone ID. Either ``"mic1"`` or ``"mic2"``. (default: ``"mic2"``)
 download (bool, optional):
 Whether to download the dataset if it is not found at root path. (default: ``False``).
 url (str, optional): The URL to download the dataset from.
torchaudio/functional/filtering.py
@@ -316,7 +316,7 @@ def contrast(waveform: Tensor, enhancement_amount: float = 75.0) -> Tensor:
 Args:
 waveform (Tensor): audio waveform of dimension of `(..., time)`
-enhancement_amount (float): controls the amount of the enhancement
+enhancement_amount (float, optional): controls the amount of the enhancement
 Allowed range of values for enhancement_amount : 0-100
 Note that enhancement_amount = 0 still gives a significant contrast enhancement
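The note that `enhancement_amount = 0` still enhances contrast follows from the shape of the SoX-style contrast curve. A per-sample sketch, assuming torchaudio's `contrast` computes `sin(x * pi/2 + (amount / 750) * sin(x * 2 * pi))`. With an amount of 0 the outer sine alone still bends the waveform. `contrast_sample` is a hypothetical scalar version, not the tensor implementation.

```python
import math


def contrast_sample(x: float, enhancement_amount: float = 75.0) -> float:
    """Apply the assumed contrast curve to one sample in [-1, 1] (sketch)."""
    if not 0 <= enhancement_amount <= 100:
        raise ValueError("Allowed range of values for enhancement_amount : 0-100")
    contrast = enhancement_amount / 750.0
    phase = x * (math.pi / 2)
    # Even with enhancement_amount == 0, the outer sine reshapes the waveform.
    return math.sin(phase + contrast * math.sin(phase * 4))
```

For example, a sample of 0.5 maps to about 0.707 even when `enhancement_amount` is 0.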
@@ -350,7 +350,7 @@ def dcshift(
 waveform (Tensor): audio waveform of dimension of `(..., time)`
 shift (float): indicates the amount to shift the audio
 Allowed range of values for shift : -2.0 to +2.0
-limiter_gain (float): It is used only on peaks to prevent clipping
+limiter_gain (float of None, optional): It is used only on peaks to prevent clipping
 It should have a value much less than 1 (e.g. 0.05 or 0.02)
 Returns:
@@ -690,20 +690,21 @@ def flanger(
 waveform (Tensor): audio waveform of dimension of `(..., channel, time)`.
 Max 4 channels allowed
 sample_rate (int): sampling rate of the waveform, e.g. 44100 (Hz)
-delay (float): desired delay in milliseconds (ms)
+delay (float, optional): desired delay in milliseconds (ms)
 Allowed range of values are 0 to 30
-depth (float): desired delay depth in milliseconds (ms)
+depth (float, optional): desired delay depth in milliseconds (ms)
 Allowed range of values are 0 to 10
-regen (float): desired regen (feedback gain) in dB
+regen (float, optional): desired regen (feedback gain) in dB
 Allowed range of values are -95 to 95
-width (float): desired width (delay gain) in dB
+width (float, optional): desired width (delay gain) in dB
 Allowed range of values are 0 to 100
-speed (float): modulation speed in Hz
+speed (float, optional): modulation speed in Hz
 Allowed range of values are 0.1 to 10
-phase (float): percentage phase-shift for multi-channel
+phase (float, optional): percentage phase-shift for multi-channel
 Allowed range of values are 0 to 100
-modulation (str): Use either "sinusoidal" or "triangular" modulation. (Default: ``sinusoidal``)
-interpolation (str): Use either "linear" or "quadratic" for delay-line interpolation. (Default: ``linear``)
+modulation (str, optional): Use either "sinusoidal" or "triangular" modulation. (Default: ``sinusoidal``)
+interpolation (str, optional): Use either "linear" or "quadratic" for delay-line interpolation.
+(Default: ``linear``)
 Returns:
 Tensor: Waveform of dimension of `(..., channel, time)`
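The flanger docstring lists an allowed range for every numeric parameter. A caller could check those ranges before invoking the effect; the sketch below is a hypothetical pre-flight check (``check_flanger_args`` is not a torchaudio function). We assume the documented bounds are inclusive, which the docstring does not state, and we make no claim about whether torchaudio itself validates these values.

```python
# Allowed ranges copied from the flanger docstring (bounds assumed inclusive).
FLANGER_RANGES = {
    "delay": (0.0, 30.0),
    "depth": (0.0, 10.0),
    "regen": (-95.0, 95.0),
    "width": (0.0, 100.0),
    "speed": (0.1, 10.0),
    "phase": (0.0, 100.0),
}


def check_flanger_args(**kwargs: float) -> None:
    """Raise if any given flanger parameter is unknown or outside its documented range."""
    for name, value in kwargs.items():
        if name not in FLANGER_RANGES:
            raise KeyError(f"unknown flanger parameter: {name}")
        low, high = FLANGER_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the allowed range [{low}, {high}]")


check_flanger_args(delay=5.0, speed=0.5)  # passes silently
```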
@@ -1072,9 +1073,9 @@ def overdrive(waveform: Tensor, gain: float = 20, colour: float = 20) -> Tensor:
 Args:
 waveform (Tensor): audio waveform of dimension of `(..., time)`
-gain (float): desired gain at the boost (or attenuation) in dB
+gain (float, optional): desired gain at the boost (or attenuation) in dB
 Allowed range of values are 0 to 100
-colour (float): controls the amount of even harmonic content in the over-driven output
+colour (float, optional): controls the amount of even harmonic content in the over-driven output
 Allowed range of values are 0 to 100
 Returns:
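The overdrive hunk header shows that ``gain`` and ``colour`` both default to ``20``, yet before this commit their docstrings did not say "optional". Mismatches like this can be found by comparing ``inspect.signature`` with the documented argument lines. The sketch below is a hypothetical lint (``missing_optional`` and ``overdrive_stub`` are illustrative names, not torchaudio code); it assumes arguments are documented one per line in the ``name (type): description`` form used throughout this diff.

```python
import inspect
import re

# Matches docstring argument lines such as "gain (float, optional): ...".
ARG_LINE = re.compile(r"^\s*(\w+) \(([^)]*)\):")


def missing_optional(func) -> list:
    """Names of parameters that have defaults but whose documented type lacks ', optional'."""
    documented = {}
    for line in (func.__doc__ or "").splitlines():
        match = ARG_LINE.match(line)
        if match:
            documented[match.group(1)] = match.group(2)
    flagged = []
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        doc_type = documented.get(name)
        if has_default and doc_type is not None and not doc_type.endswith(", optional"):
            flagged.append(name)
    return flagged


def overdrive_stub(waveform, gain: float = 20, colour: float = 20):
    """Hypothetical stub reproducing the docstring as it was *before* this commit.

    Args:
        waveform (Tensor): audio waveform of dimension of `(..., time)`
        gain (float): desired gain at the boost (or attenuation) in dB
        colour (float): controls the amount of even harmonic content in the over-driven output
    """


print(missing_optional(overdrive_stub))  # ['gain', 'colour']
```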
@@ -1132,17 +1133,17 @@ def phaser(
 Args:
 waveform (Tensor): audio waveform of dimension of `(..., time)`
 sample_rate (int): sampling rate of the waveform, e.g. 44100 (Hz)
-gain_in (float): desired input gain at the boost (or attenuation) in dB
+gain_in (float, optional): desired input gain at the boost (or attenuation) in dB
 Allowed range of values are 0 to 1
-gain_out (float): desired output gain at the boost (or attenuation) in dB
+gain_out (float, optional): desired output gain at the boost (or attenuation) in dB
 Allowed range of values are 0 to 1e9
-delay_ms (float): desired delay in milliseconds
+delay_ms (float, optional): desired delay in milliseconds
 Allowed range of values are 0 to 5.0
-decay (float): desired decay relative to gain-in
+decay (float, optional): desired decay relative to gain-in
 Allowed range of values are 0 to 0.99
-mod_speed (float): modulation speed in Hz
+mod_speed (float, optional): modulation speed in Hz
 Allowed range of values are 0.1 to 2
-sinusoidal (bool): If ``True``, uses sinusoidal modulation (preferable for multiple instruments)
+sinusoidal (bool, optional): If ``True``, uses sinusoidal modulation (preferable for multiple instruments)
 If ``False``, uses triangular modulation (gives single instruments a sharper phasing effect)
 (Default: ``True``)
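The ``sinusoidal`` flag chooses between two modulation shapes. The sketch below generates one period of each as a generic low-frequency oscillator, so the difference is easy to see: the sine lingers near its extremes, while the triangle moves at a constant rate and turns sharply. This is not torchaudio's or SoX's implementation; the unipolar ``[0, 1]`` output range and the ``lfo`` helper are our own assumptions.

```python
import math


def lfo(num_samples: int, sinusoidal: bool = True) -> list:
    """One period of a unipolar LFO in [0, 1], either sinusoidal or triangular."""
    values = []
    for n in range(num_samples):
        phase = n / num_samples  # position within the period, in [0, 1)
        if sinusoidal:
            # Raised cosine: 0 at the start, 1 at mid-period, smooth turning points.
            values.append(0.5 * (1.0 - math.cos(2.0 * math.pi * phase)))
        else:
            # Triangle: rises linearly to 1 at mid-period, then falls back towards 0.
            values.append(1.0 - abs(2.0 * phase - 1.0))
    return values


print(lfo(8, sinusoidal=False))  # [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
```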
torchaudio/functional/functional.py @ 768432c3
@@ -155,7 +155,7 @@ def inverse_spectrogram(
 Args:
 spectrogram (Tensor): Complex tensor of audio of dimension (..., freq, time).
-length (int, optional): The output length of the waveform.
+length (int or None): The output length of the waveform.
 pad (int): Two-sided padding of signal. It is only effective when ``length`` is provided.
 window (Tensor): Window tensor that is applied/multiplied to each frame/window
 n_fft (int): Size of FFT
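The ``pad`` description implies an ordering: padding only matters once the output length is known. The sketch below shows one plausible reading on plain lists. The reconstruction is fitted to ``length + 2 * pad`` samples (trimmed or zero-padded), and then ``pad`` samples are dropped from each end. ``finalize_length`` is hypothetical, and this reflects our interpretation of the docstring rather than verified torchaudio code.

```python
from typing import List, Optional


def finalize_length(samples: List[float], length: Optional[int], pad: int = 0) -> List[float]:
    """Fit a reconstructed signal to ``length``, removing ``pad`` samples from each side."""
    if length is None:
        # No target length: return the signal unchanged; pad has no effect.
        return list(samples)
    target = length + 2 * pad
    # Trim, or zero-pad on the right, so the padded signal has exactly `target` samples.
    fitted = list(samples[:target]) + [0.0] * max(0, target - len(samples))
    # Drop the two-sided padding, leaving exactly `length` samples.
    return fitted[pad:pad + length]


print(finalize_length(list(range(10)), length=6, pad=2))  # [2, 3, 4, 5, 6, 7]
```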
@@ -503,8 +503,8 @@ def create_fb_matrix(
 f_max (float): Maximum frequency (Hz)
 n_mels (int): Number of mel filterbanks
 sample_rate (int): Sample rate of the audio waveform
-norm (Optional[str]): If 'slaney', divide the triangular mel weights by the width of the mel band
+norm (str or None, optional): If 'slaney', divide the triangular mel weights by the width of the mel band
 (area normalization). (Default: ``None``)
 mel_scale (str, optional): Scale to use: ``htk`` or ``slaney``. (Default: ``htk``)
 Returns:
@@ -549,8 +549,8 @@ def melscale_fbanks(
 f_max (float): Maximum frequency (Hz)
 n_mels (int): Number of mel filterbanks
 sample_rate (int): Sample rate of the audio waveform
-norm (Optional[str]): If 'slaney', divide the triangular mel weights by the width of the mel band
+norm (str or None, optional): If 'slaney', divide the triangular mel weights by the width of the mel band
 (area normalization). (Default: ``None``)
 mel_scale (str, optional): Scale to use: ``htk`` or ``slaney``. (Default: ``htk``)
 Returns:
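To make "divide the triangular mel weights by the width of the mel band" concrete, the sketch below computes one scale factor per filter on the HTK mel scale (``m = 2595 * log10(1 + f / 700)``). Filter ``i`` spans the points ``f[i]`` to ``f[i + 2]``, so wider high-frequency bands get smaller weights and every triangle ends up with roughly equal area. We believe torchaudio uses the factor ``2 / (f[i + 2] - f[i])``, but treat that as an assumption; the helper names are hypothetical.

```python
import math
from typing import List


def hz_to_mel_htk(freq: float) -> float:
    """HTK mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)


def mel_to_hz_htk(mel: float) -> float:
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def slaney_area_norm(f_min: float, f_max: float, n_mels: int) -> List[float]:
    """Per-filter scale factors 2 / (upper edge - lower edge) for area normalization."""
    m_min, m_max = hz_to_mel_htk(f_min), hz_to_mel_htk(f_max)
    # n_mels triangles need n_mels + 2 equally spaced points on the mel axis.
    step = (m_max - m_min) / (n_mels + 1)
    f_pts = [mel_to_hz_htk(m_min + i * step) for i in range(n_mels + 2)]
    # Filter i rises from f_pts[i] to f_pts[i + 1] and falls to f_pts[i + 2].
    return [2.0 / (f_pts[i + 2] - f_pts[i]) for i in range(n_mels)]


print(slaney_area_norm(0.0, 8000.0, 4))
```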
@@ -724,7 +724,7 @@ def complex_norm(
 Args:
 complex_tensor (Tensor): Tensor shape of `(..., complex=2)`
-power (float): Power of the norm. (Default: `1.0`).
+power (float, optional): Power of the norm. (Default: `1.0`).
 Returns:
 Tensor: Power of the normed input tensor. Shape of `(..., )`
@@ -771,7 +771,7 @@ def magphase(
 Args:
 complex_tensor (Tensor): Tensor shape of `(..., complex=2)`
-power (float): Power of the norm. (Default: `1.0`)
+power (float, optional): Power of the norm. (Default: `1.0`)
 Returns:
 (Tensor, Tensor): The magnitude and phase of the complex tensor
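``complex_norm`` and ``magphase`` both read the real and imaginary parts from a trailing ``complex=2`` dimension and take a ``power`` that defaults to ``1.0``. The scalar sketch below shows what we understand them to compute: ``|z| ** power`` for the magnitude and ``atan2(imag, real)`` for the phase. Plain tuples stand in for tensors; that simplification and the function bodies are ours, not torchaudio's source.

```python
import math
from typing import Tuple


def complex_norm(pair: Tuple[float, float], power: float = 1.0) -> float:
    """|z| ** power for z stored as (real, imag)."""
    real, imag = pair
    # sqrt(re^2 + im^2) ** power == (re^2 + im^2) ** (power / 2)
    return (real * real + imag * imag) ** (0.5 * power)


def magphase(pair: Tuple[float, float], power: float = 1.0) -> Tuple[float, float]:
    """Magnitude raised to ``power`` and phase angle in radians."""
    real, imag = pair
    return complex_norm(pair, power), math.atan2(imag, real)


print(complex_norm((3.0, 4.0)))       # 5.0
print(complex_norm((3.0, 4.0), 2.0))  # 25.0 (power spectrum)
```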
@@ -1343,14 +1343,14 @@ def apply_codec(
 waveform (Tensor): Audio data. Must be 2-dimensional. See also ``channels_first``.
 sample_rate (int): Sample rate of the audio waveform.
 format (str): File format.
-channels_first (bool):
+channels_first (bool, optional):
 When True, both the input and output Tensor have dimension ``[channel, time]``.
 Otherwise, they have dimension ``[time, channel]``.
-compression (float): Used for formats other than WAV.
+compression (float or None, optional): Used for formats other than WAV.
 For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
-encoding (str, optional): Changes the encoding for the supported formats.
+encoding (str or None, optional): Changes the encoding for the supported formats.
 For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
-bits_per_sample (int, optional): Changes the bit depth for the supported formats.
+bits_per_sample (int or None, optional): Changes the bit depth for the supported formats.
 For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
 Returns:
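``channels_first`` only changes which axis comes first; the samples are the same. The sketch below converts a ``[time, channel]`` layout to ``[channel, time]`` using nested lists in place of a 2-D tensor. ``to_channels_first`` is a hypothetical helper, and with a real tensor the equivalent would simply be a transpose.

```python
from typing import List


def to_channels_first(frames: List[List[float]]) -> List[List[float]]:
    """Convert [time, channel] nested lists to [channel, time]."""
    # zip(*frames) groups the i-th value of every frame, i.e. one channel at a time.
    return [list(channel) for channel in zip(*frames)]


stereo_time_first = [[0.1, -0.1], [0.2, -0.2], [0.3, -0.3]]  # 3 frames, 2 channels
print(to_channels_first(stereo_time_first))  # [[0.1, 0.2, 0.3], [-0.1, -0.2, -0.3]]
```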
@@ -1614,7 +1614,7 @@ def resample(
 Lower values reduce anti-aliasing, but also reduce some of the highest frequencies. (Default: ``0.99``)
 resampling_method (str, optional): The resampling method to use.
 Options: [``sinc_interpolation``, ``kaiser_window``] (Default: ``'sinc_interpolation'``)
-beta (float or None): The shape parameter used for kaiser window.
+beta (float or None, optional): The shape parameter used for kaiser window.
 Returns:
 Tensor: The waveform at the new frequency of dimension (..., time).
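``beta`` shapes the Kaiser window: larger values give a narrower, more tapered window, which trades main-lobe width for stronger side-lobe suppression. The sketch below evaluates the standard Kaiser window ``I0(beta * sqrt(1 - x^2)) / I0(beta)`` over ``x`` in ``[-1, 1]``, with ``I0`` computed from its power series. When ``beta`` is ``None`` it falls back to ``14.769656459379492``, the value we believe torchaudio uses in that case; treat that constant, and the helper names, as assumptions.

```python
import math
from typing import List, Optional

# Assumed fallback when beta is None (our belief about torchaudio's default; verify).
DEFAULT_BETA = 14.769656459379492


def bessel_i0(x: float, terms: int = 50) -> float:
    """Zeroth-order modified Bessel function of the first kind, from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        # term_k = ((x / 2) ** k / k!) ** 2, built up incrementally.
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_window(size: int, beta: Optional[float] = None) -> List[float]:
    """Symmetric Kaiser window with ``size`` points (size >= 2)."""
    if beta is None:
        beta = DEFAULT_BETA
    denom = bessel_i0(beta)
    window = []
    for n in range(size):
        x = 2.0 * n / (size - 1) - 1.0  # map n onto [-1, 1]
        window.append(bessel_i0(beta * math.sqrt(1.0 - x * x)) / denom)
    return window


print(kaiser_window(5, beta=0.0))  # [1.0, 1.0, 1.0, 1.0, 1.0] (beta = 0 is rectangular)
```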
torchaudio/models/conv_tasnet.py
@@ -16,8 +16,8 @@ class ConvBlock(torch.nn.Module):
 hidden_channels (int): The number of channels in the internal layers, <H>.
 kernel_size (int): The convolution kernel size of the middle layer, <P>.
 padding (int): Padding value of the convolution in the middle layer.
-dilation (int): Dilation value of the convolution in the middle layer.
-no_residual (bool): Disable residual block/output.
+dilation (int, optional): Dilation value of the convolution in the middle layer.
+no_residual (bool, optional): Disable residual block/output.
 Note:
 This implementation corresponds to the "non-causal" setting in the paper.
@@ -169,14 +169,14 @@ class ConvTasNet(torch.nn.Module):
 [:footcite:`Luo_2019`].
 Args:
-num_sources (int): The number of sources to split.
-enc_kernel_size (int): The convolution kernel size of the encoder/decoder, <L>.
-enc_num_feats (int): The feature dimensions passed to mask generator, <N>.
-msk_kernel_size (int): The convolution kernel size of the mask generator, <P>.
-msk_num_feats (int): The input/output feature dimension of conv block in the mask generator, <B, Sc>.
-msk_num_hidden_feats (int): The internal feature dimension of conv block of the mask generator, <H>.
-msk_num_layers (int): The number of layers in one conv block of the mask generator, <X>.
-msk_num_stacks (int): The number of conv blocks of the mask generator, <R>.
+num_sources (int, optional): The number of sources to split.
+enc_kernel_size (int, optional): The convolution kernel size of the encoder/decoder, <L>.
+enc_num_feats (int, optional): The feature dimensions passed to mask generator, <N>.
+msk_kernel_size (int, optional): The convolution kernel size of the mask generator, <P>.
+msk_num_feats (int, optional): The input/output feature dimension of conv block in the mask generator, <B, Sc>.
+msk_num_hidden_feats (int, optional): The internal feature dimension of conv block of the mask generator, <H>.
+msk_num_layers (int, optional): The number of layers in one conv block of the mask generator, <X>.
+msk_num_stacks (int, optional): The number of conv blocks of the mask generator, <R>.
 Note:
 This implementation corresponds to the "non-causal" setting in the paper.
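The docstring maps ``msk_kernel_size``, ``msk_num_layers`` and ``msk_num_stacks`` to the paper's <P>, <X> and <R>. Together they determine how many encoder frames the mask generator can see. The sketch below computes that receptive field assuming, as in the Conv-TasNet paper, that layer ``l`` of each stack uses dilation ``2 ** l``: each layer adds ``(P - 1) * 2 ** l`` frames. ``tcn_receptive_field`` is a hypothetical helper, and the argument values in the example are illustrative, not claimed torchaudio defaults.

```python
def tcn_receptive_field(kernel_size: int, num_layers: int, num_stacks: int) -> int:
    """Receptive field, in encoder frames, of R stacks of X dilated convolutions.

    Assumes the convolution in layer l of every stack has dilation 2 ** l.
    """
    # A dilated conv with kernel P and dilation d widens the field by (P - 1) * d.
    per_stack = sum((kernel_size - 1) * 2 ** layer for layer in range(num_layers))
    return 1 + num_stacks * per_stack


# P = 3, X = 8, R = 3: each stack adds 2 * (2**8 - 1) = 510 frames.
print(tcn_receptive_field(3, 8, 3))  # 1531
```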
torchaudio/models/wav2vec2/components.py
@@ -49,7 +49,7 @@ class ConvLayerBlock(Module):
 """
 Args:
 x (Tensor): Shape: ``[batch, in_channels, in_frame]``.
-length (Tensor, optional): Shape ``[batch, ]``.
+length (Tensor or None, optional): Shape ``[batch, ]``.
 Returns:
 Tensor: Shape ``[batch, out_channels, out_frames]``.
 Optional[Tensor]: Shape ``[batch, ]``.
@@ -90,7 +90,7 @@ class FeatureExtractor(Module):
 x (Tensor):
 Input Tensor representing a batch of audio,
 shape: ``[batch, time]``.
-length (Tensor, optional):
+length (Tensor or None, optional):
 Valid length of each input sample. shape: ``[batch, ]``.
 Returns:
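When ``length`` is given, the feature extractor must also report how many output frames are valid for each sample; when it is ``None`` there is nothing to track, which is why the return type is ``Optional[Tensor]``. The sketch below computes the valid length for one sample, assuming every layer is an unpadded strided convolution so that the usual ``(L - kernel) // stride + 1`` formula applies. The layer list is the commonly cited wav2vec 2.0 base configuration and serves only as an example; ``output_length`` is hypothetical.

```python
from typing import List, Optional, Tuple

# Example (out_channels, kernel_size, stride) per conv layer, for illustration only.
EXAMPLE_LAYERS: List[Tuple[int, int, int]] = (
    [(512, 10, 5)] + [(512, 3, 2)] * 4 + [(512, 2, 2)] * 2
)


def output_length(length: Optional[int], layers=EXAMPLE_LAYERS) -> Optional[int]:
    """Valid frames after a stack of unpadded strided convolutions; None passes through."""
    if length is None:
        # Without lengths, every output frame is treated as valid.
        return None
    for _, kernel_size, stride in layers:
        # Clamp at zero so inputs shorter than the kernel yield no valid frames.
        length = max(0, (length - kernel_size) // stride + 1)
    return length


print(output_length(16000))  # 49 frames for one second of 16 kHz audio
```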
@@ -243,7 +243,7 @@ class SelfAttention(Module):
 """
 Args:
 x (Tensor): shape: ``[batch_size, sequence_length, embed_dim]``.
-attention_mask (Tensor, optional):
+attention_mask (Tensor or None, optional):
 shape: ``[batch_size, 1, sequence_length, sequence_length]``
 Returns:
@@ -340,7 +340,7 @@ class EncoderLayer(Module):
 """
 Args:
 x (Tensor): shape: ``(batch, sequence_length, embed_dim)``
-attention_mask (Tensor, optional):
+attention_mask (Tensor or None, optional):
 shape: ``(batch, 1, sequence_length, sequence_length)``
 """
 residual = x
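The ``attention_mask`` shape ``(batch, 1, sequence_length, sequence_length)`` has one singleton head axis that broadcasts over attention heads. A common way to build such a mask from per-sample valid lengths is an additive mask: ``0`` for real key positions and a large negative number for padding, so that softmax assigns padded keys almost no weight. The sketch below builds that mask with nested lists. The additive form, the ``-10000.0`` constant and ``padding_attention_mask`` are all assumptions for illustration, not a description of how this module consumes the mask.

```python
from typing import List

# Large negative value added to masked scores (assumed constant, for illustration).
MASK_VALUE = -10000.0


def padding_attention_mask(lengths: List[int], max_len: int) -> List[List[List[List[float]]]]:
    """Additive mask of shape [batch, 1, max_len, max_len] hiding padded key positions."""
    mask = []
    for valid in lengths:
        # Key positions at or beyond the valid length are padding.
        row = [0.0 if key < valid else MASK_VALUE for key in range(max_len)]
        # Every query position sees the same key mask; the outer [...] is the head axis.
        mask.append([[list(row) for _ in range(max_len)]])
    return mask


print(padding_attention_mask([2], 3)[0][0][0])  # [0.0, 0.0, -10000.0]
```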
torchaudio/models/wav2vec2/model.py
@@ -38,7 +38,7 @@ class Wav2Vec2Model(Module):
 Args:
 waveforms (Tensor): Audio tensor of shape ``(batch, frames)``.
-lengths (Tensor, optional):
+lengths (Tensor or None, optional):
 Indicates the valid length of each audio sample in the batch.
 Shape: ``(batch, )``.
@@ -62,7 +62,7 @@ class Wav2Vec2Model(Module):
 Args:
 waveforms (Tensor): Audio tensor of shape ``(batch, frames)``.
-lengths (Tensor, optional):
+lengths (Tensor or None, optional):
 Indicates the valid length of each audio sample in the batch.
 Shape: ``(batch, )``.
torchaudio/models/wav2vec2/utils/import_fairseq.py
@@ -133,7 +133,7 @@ def import_fairseq_model(
 An instance of fairseq's Wav2Vec2.0 model class.
 Either ``fairseq.models.wav2vec.wav2vec2_asr.Wav2VecEncoder`` or
 ``fairseq.models.wav2vec.wav2vec2.Wav2Vec2Model``.
-num_out (int, optional):
+num_out (int or None, optional):
 The number of output labels. Required only when the original model is
 an instance of ``fairseq.models.wav2vec.wav2vec2.Wav2Vec2Model``.
torchaudio/sox_effects/sox_effects.py
@@ -72,7 +72,7 @@ def apply_effects_tensor(
 tensor (torch.Tensor): Input 2D CPU Tensor.
 sample_rate (int): Sample rate
 effects (List[List[str]]): List of effects.
-channels_first (bool): Indicates if the input Tensor's dimension is
+channels_first (bool, optional): Indicates if the input Tensor's dimension is
 ``[channels, time]`` or ``[time, channels]``
 Returns:
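``effects`` is a list of lists of strings. Each inner list is one SoX effect: the effect name followed by its options, in the same order they would appear on a ``sox`` command line. The sketch below validates that structure and renders the chain as the effect portion of a command line, which can help when comparing results against the ``sox`` CLI. ``effects_to_sox_args`` is a hypothetical helper, and the example chain is illustrative.

```python
import shlex
from typing import List


def effects_to_sox_args(effects: List[List[str]]) -> str:
    """Render an effect chain as the effect part of a sox command line."""
    for effect in effects:
        if not effect or not all(isinstance(token, str) for token in effect):
            raise ValueError(f"each effect must be a non-empty list of str: {effect!r}")
    # Quote each token so the result is safe to paste into a shell.
    return " ".join(shlex.quote(token) for effect in effects for token in effect)


chain = [["gain", "-n"], ["speed", "0.8"], ["rate", "16000"]]
print("sox in.wav out.wav " + effects_to_sox_args(chain))
# sox in.wav out.wav gain -n speed 0.8 rate 16000
```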
@@ -185,15 +185,15 @@ def apply_effects_file(
 Note: This argument is intentionally annotated as ``str`` only for
 TorchScript compiler compatibility.
 effects (List[List[str]]): List of effects.
-normalize (bool):
+normalize (bool, optional):
 When ``True``, this function always returns ``float32``, and sample values are
 normalized to ``[-1.0, 1.0]``.
 If the input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type. This argument has no effect for formats other
 than integer WAV type.
-channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``.
+channels_first (bool, optional): When True, the returned Tensor has dimension ``[channel, time]``.
 Otherwise, the returned Tensor's dimension is ``[time, channel]``.
-format (str, optional):
+format (str or None, optional):
 Override the format detection with the given format.
 Providing the argument might help when libsox cannot infer the format
 from header or extension,
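With ``normalize=True``, integer WAV samples come back as ``float32`` in ``[-1.0, 1.0]``. For signed 16-bit PCM, the usual convention is to divide by ``2 ** 15``, so ``-32768`` maps exactly to ``-1.0`` and the largest positive value lands just below ``1.0``. The sketch below assumes that convention; ``normalize_int16`` is hypothetical, and the scaling torchaudio applies to other bit depths is not shown here.

```python
from typing import List


def normalize_int16(samples: List[int]) -> List[float]:
    """Map signed 16-bit PCM samples to floats in [-1.0, 1.0)."""
    # 32768 == 2 ** 15, the magnitude of the most negative int16 value.
    return [sample / 32768.0 for sample in samples]


print(normalize_int16([-32768, 0, 16384]))  # [-1.0, 0.0, 0.5]
```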