Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
hehl2
Torchaudio
Commits
768432c3
"src/vscode:/vscode.git/clone" did not exist on "5d5436ba18ebea9b6e47a8fa8917e925f910742a"
Unverified
Commit
768432c3
authored
Sep 02, 2021
by
Caroline Chen
Committed by
GitHub
Sep 02, 2021
Browse files
Standardize optional types in docstrings (#1746)
parent
d9bfb708
Changes
21
Hide whitespace changes
Inline
Side-by-side
Showing
20 changed files
with
108 additions
and
103 deletions
+108
-103
examples/pipeline_tacotron2/datasets.py
examples/pipeline_tacotron2/datasets.py
+1
-1
examples/pipeline_tacotron2/text/text_preprocessing.py
examples/pipeline_tacotron2/text/text_preprocessing.py
+5
-5
examples/pipeline_wavernn/wavernn_inference_wrapper.py
examples/pipeline_wavernn/wavernn_inference_wrapper.py
+5
-5
examples/source_separation/utils/dataset/wsj0mix.py
examples/source_separation/utils/dataset/wsj0mix.py
+1
-1
examples/source_separation/utils/metrics.py
examples/source_separation/utils/metrics.py
+8
-8
torchaudio/backend/soundfile_backend.py
torchaudio/backend/soundfile_backend.py
+11
-11
torchaudio/backend/sox_io_backend.py
torchaudio/backend/sox_io_backend.py
+10
-10
torchaudio/backend/utils.py
torchaudio/backend/utils.py
+1
-1
torchaudio/datasets/gtzan.py
torchaudio/datasets/gtzan.py
+1
-1
torchaudio/datasets/speechcommands.py
torchaudio/datasets/speechcommands.py
+1
-1
torchaudio/datasets/tedlium.py
torchaudio/datasets/tedlium.py
+5
-3
torchaudio/datasets/utils.py
torchaudio/datasets/utils.py
+6
-4
torchaudio/datasets/vctk.py
torchaudio/datasets/vctk.py
+1
-1
torchaudio/functional/filtering.py
torchaudio/functional/filtering.py
+19
-18
torchaudio/functional/functional.py
torchaudio/functional/functional.py
+12
-12
torchaudio/models/conv_tasnet.py
torchaudio/models/conv_tasnet.py
+10
-10
torchaudio/models/wav2vec2/components.py
torchaudio/models/wav2vec2/components.py
+4
-4
torchaudio/models/wav2vec2/model.py
torchaudio/models/wav2vec2/model.py
+2
-2
torchaudio/models/wav2vec2/utils/import_fairseq.py
torchaudio/models/wav2vec2/utils/import_fairseq.py
+1
-1
torchaudio/sox_effects/sox_effects.py
torchaudio/sox_effects/sox_effects.py
+4
-4
No files found.
examples/pipeline_tacotron2/datasets.py
View file @
768432c3
...
...
@@ -131,7 +131,7 @@ def text_mel_collate_fn(batch: Tuple[Tensor, Tensor],
Args:
batch (tuple of two tensors): the first tensor is the mel spectrogram with shape
(n_batch, n_mels, n_frames), the second tensor is the text with shape (n_batch, ).
n_frames_per_step (int): The number of frames to advance every step.
n_frames_per_step (int
, optional
): The number of frames to advance every step.
Returns:
text_padded (Tensor): The input text to Tacotron2 with shape (n_batch, max of ``text_lengths``).
...
...
examples/pipeline_tacotron2/text/text_preprocessing.py
View file @
768432c3
...
...
@@ -123,12 +123,12 @@ def text_to_sequence(sent: str,
symbol_list (str or List of string, optional): When the input is a string, available options include
"english_characters" and "english_phonemes". When the input is a list of string, ``symbol_list`` will
directly be used as the symbol to encode. (Default: "english_characters")
phonemizer (str, optional): The phonemizer to use. Only used when ``symbol_list`` is "english_phonemes".
phonemizer (str
or None
, optional): The phonemizer to use. Only used when ``symbol_list`` is "english_phonemes".
Available options include "DeepPhonemizer". (Default: "DeepPhonemizer")
checkpoint (str, optional): The path to the checkpoint of the phonemizer. Only used when
``symbol_list`` is
"english_phonemes". (Default: "./en_us_cmudict_forward.pt")
cmudict_root (str, optional): The path to the directory where the CMUDict dataset is found or
downloaded.
Only used when ``symbol_list`` is "english_phonemes". (Default: "./")
checkpoint (str
or None
, optional): The path to the checkpoint of the phonemizer. Only used when
``symbol_list`` is
"english_phonemes". (Default: "./en_us_cmudict_forward.pt")
cmudict_root (str
or None
, optional): The path to the directory where the CMUDict dataset is found or
downloaded.
Only used when ``symbol_list`` is "english_phonemes". (Default: "./")
Returns:
List of integers corresponding to the symbols in the sentence.
...
...
examples/pipeline_wavernn/wavernn_inference_wrapper.py
View file @
768432c3
...
...
@@ -171,13 +171,13 @@ class WaveRNNInferenceWrapper(torch.nn.Module):
Args:
specgram (Tensor): spectrogram of size (n_mels, n_time)
mulaw (bool): Whether to perform mulaw decoding (Default: ``True``).
batched (bool): Whether to perform batch prediction. Using batch prediction
mulaw (bool
, optional
): Whether to perform mulaw decoding (Default: ``True``).
batched (bool
, optional
): Whether to perform batch prediction. Using batch prediction
will significantly increase the inference speed (Default: ``True``).
timesteps (int): The time steps for each batch. Only used when `batched`
timesteps (int
, optional
): The time steps for each batch. Only used when `batched`
is set to True (Default: ``100``).
overlap (int): The overlapping time steps between batches. Only used when
`batched`
is set to True (Default: ``5``).
overlap (int
, optional
): The overlapping time steps between batches. Only used when
`batched`
is set to True (Default: ``5``).
Returns:
waveform (Tensor): Reconstructed waveform of size (1, n_time, ).
...
...
examples/source_separation/utils/dataset/wsj0mix.py
View file @
768432c3
...
...
@@ -19,7 +19,7 @@ class WSJ0Mix(Dataset):
N source audios.
sample_rate (int): Expected sample rate of audio files. If any of the audio has a
different sample rate, raises ``ValueError``.
audio_ext (str): The extension of audio files to find. (default: ".wav")
audio_ext (str
, optional
): The extension of audio files to find. (default: ".wav")
"""
def
__init__
(
self
,
...
...
examples/source_separation/utils/metrics.py
View file @
768432c3
...
...
@@ -21,9 +21,9 @@ def sdr(
Shape: [batch, speakers (can be 1), time frame]
reference (torch.Tensor): Reference signal.
Shape: [batch, speakers, time frame]
mask (
Optional[
torch.Tensor
]
): Binary mask to indicate padded value (0) or valid value (1).
mask (torch.Tensor
or None, optional
): Binary mask to indicate padded value (0) or valid value (1).
Shape: [batch, 1, time frame]
epsilon (float): constant value used to stabilize division.
epsilon (float
, optional
): constant value used to stabilize division.
Returns:
torch.Tensor: scale-invariant source-to-distortion ratio.
...
...
@@ -99,9 +99,9 @@ class PIT(torch.nn.Module):
Shape: [bacth, speakers, time frame]
reference (torch.Tensor): Reference (original) source signals.
Shape: [batch, speakers, time frame]
mask (
Optional[
torch.Tensor
]
): Binary mask to indicate padded value (0) or valid value (1).
mask (torch.Tensor
or None, optional
): Binary mask to indicate padded value (0) or valid value (1).
Shape: [batch, 1, time frame]
epsilon (float): constant value used to stabilize division.
epsilon (float
, optional
): constant value used to stabilize division.
Returns:
torch.Tensor: Maximum criterion over the speaker permutation.
...
...
@@ -140,9 +140,9 @@ def sdr_pit(
Shape: [batch, speakers (can be 1), time frame]
reference (torch.Tensor): Reference signal.
Shape: [batch, speakers, time frame]
mask (
Optional[
torch.Tensor
]
): Binary mask to indicate padded value (0) or valid value (1).
mask (torch.Tensor
or None, optional
): Binary mask to indicate padded value (0) or valid value (1).
Shape: [batch, 1, time frame]
epsilon (float): constant value used to stabilize division.
epsilon (float
, optional
): constant value used to stabilize division.
Returns:
torch.Tensor: scale-invariant source-to-distortion ratio.
...
...
@@ -187,9 +187,9 @@ def sdri(
Shape: [batch, speakers, time frame]
mix (torch.Tensor): Mixed souce signals, from which the setimated signals were generated.
Shape: [batch, speakers == 1, time frame]
mask (
Optional[
torch.Tensor
]
): Binary mask to indicate padded value (0) or valid value (1).
mask (torch.Tensor
or None, optional
): Binary mask to indicate padded value (0) or valid value (1).
Shape: [batch, 1, time frame]
epsilon (float): constant value used to stabilize division.
epsilon (float
, optional
): constant value used to stabilize division.
Returns:
torch.Tensor: Improved SDR. Shape: [batch, ]
...
...
torchaudio/backend/soundfile_backend.py
View file @
768432c3
...
...
@@ -92,7 +92,7 @@ def info(filepath: str, format: Optional[str] = None) -> AudioMetaData:
Args:
filepath (path-like object or file-like object):
Source of audio data.
format (str, optional):
format (str
or None
, optional):
Not used. PySoundFile does not accept format hint.
Returns:
...
...
@@ -168,23 +168,23 @@ def load(
Args:
filepath (path-like object or file-like object):
Source of audio data.
frame_offset (int):
frame_offset (int
, optional
):
Number of frames to skip before start reading data.
num_frames (int):
num_frames (int
, optional
):
Maximum number of frames to read. ``-1`` reads all the remaining samples,
starting from ``frame_offset``.
This function may return the less number of frames if there is not enough
frames in the given file.
normalize (bool):
normalize (bool
, optional
):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool):
channels_first (bool
, optional
):
When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``.
format (str, optional):
format (str
or None
, optional):
Not used. PySoundFile does not accept format hint.
Returns:
...
...
@@ -335,11 +335,11 @@ def save(
filepath (str or pathlib.Path): Path to audio file.
src (torch.Tensor): Audio data to save. must be 2D tensor.
sample_rate (int): sampling rate
channels_first (bool): If ``True``, the given tensor is interpreted as ``[channel, time]``,
channels_first (bool
, optional
): If ``True``, the given tensor is interpreted as ``[channel, time]``,
otherwise ``[time, channel]``.
compression (
Optional[float]
): Not used.
compression (
float of None, optional
): Not used.
It is here only for interface compatibility reson with "sox_io" backend.
format (str, optional): Override the audio format.
format (str
or None
, optional): Override the audio format.
When ``filepath`` argument is path-like object, audio format is
inferred from file extension. If the file extension is missing or
different, you can specify the correct format with this argument.
...
...
@@ -349,7 +349,7 @@ def save(
Valid values are ``"wav"``, ``"ogg"``, ``"vorbis"``,
``"flac"`` and ``"sph"``.
encoding (str, optional): Changes the encoding for supported formats.
encoding (str
or None
, optional): Changes the encoding for supported formats.
This argument is effective only for supported formats, sush as
``"wav"``, ``""flac"`` and ``"sph"``. Valid values are;
...
...
@@ -359,7 +359,7 @@ def save(
- ``"ULAW"`` (mu-law)
- ``"ALAW"`` (a-law)
bits_per_sample (int, optional): Changes the bit depth for the
bits_per_sample (int
or None
, optional): Changes the bit depth for the
supported formats.
When ``format`` is one of ``"wav"``, ``"flac"`` or ``"sph"``,
you can change the bit depth.
...
...
torchaudio/backend/sox_io_backend.py
View file @
768432c3
...
...
@@ -37,7 +37,7 @@ def info(
* This argument is intentionally annotated as ``str`` only due to
TorchScript compiler compatibility.
format (str, optional):
format (str
or None
, optional):
Override the format detection with the given format.
Providing the argument might help when libsox can not infer the format
from header or extension,
...
...
@@ -119,21 +119,21 @@ def load(
TorchScript compiler compatibility.
frame_offset (int):
Number of frames to skip before start reading data.
num_frames (int):
num_frames (int
, optional
):
Maximum number of frames to read. ``-1`` reads all the remaining samples,
starting from ``frame_offset``.
This function may return the less number of frames if there is not enough
frames in the given file.
normalize (bool):
normalize (bool
, optional
):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type.
This argument has no effect for formats other than integer WAV type.
channels_first (bool):
channels_first (bool
, optional
):
When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``.
format (str, optional):
format (str
or None
, optional):
Override the format detection with the given format.
Providing the argument might help when libsox can not infer the format
from header or extension,
...
...
@@ -172,9 +172,9 @@ def save(
as ``str`` for TorchScript compiler compatibility.
src (torch.Tensor): Audio data to save. must be 2D tensor.
sample_rate (int): sampling rate
channels_first (bool): If ``True``, the given tensor is interpreted as ``[channel, time]``,
channels_first (bool
, optional
): If ``True``, the given tensor is interpreted as ``[channel, time]``,
otherwise ``[time, channel]``.
compression (
Optional[float]
): Used for formats other than WAV.
compression (
float or None, optional
): Used for formats other than WAV.
This corresponds to ``-C`` option of ``sox`` command.
``"mp3"``
...
...
@@ -189,7 +189,7 @@ def save(
and lowest quality. Default: ``3``.
See the detail at http://sox.sourceforge.net/soxformat.html.
format (str, optional): Override the audio format.
format (str
or None
, optional): Override the audio format.
When ``filepath`` argument is path-like object, audio format is infered from
file extension. If file extension is missing or different, you can specify the
correct format with this argument.
...
...
@@ -199,7 +199,7 @@ def save(
Valid values are ``"wav"``, ``"mp3"``, ``"ogg"``, ``"vorbis"``, ``"amr-nb"``,
``"amb"``, ``"flac"``, ``"sph"``, ``"gsm"``, and ``"htk"``.
encoding (str, optional): Changes the encoding for the supported formats.
encoding (str
or None
, optional): Changes the encoding for the supported formats.
This argument is effective only for supported formats, such as ``"wav"``, ``""amb"``
and ``"sph"``. Valid values are;
...
...
@@ -225,7 +225,7 @@ def save(
``"sph"`` format;
- the default value is ``"PCM_S"``
bits_per_sample (int, optional): Changes the bit depth for the supported formats.
bits_per_sample (int
or None
, optional): Changes the bit depth for the supported formats.
When ``format`` is one of ``"wav"``, ``"flac"``, ``"sph"``, or ``"amb"``, you can change the
bit depth. Valid values are ``8``, ``16``, ``32`` and ``64``.
...
...
torchaudio/backend/utils.py
View file @
768432c3
...
...
@@ -35,7 +35,7 @@ def set_audio_backend(backend: Optional[str]):
"""Set the backend for I/O operation
Args:
backend (
Optional[str]
): Name of the backend.
backend (
str or None
): Name of the backend.
One of ``"sox_io"`` or ``"soundfile"`` based on availability
of the system. If ``None`` is provided the current backend is unassigned.
"""
...
...
torchaudio/datasets/gtzan.py
View file @
768432c3
...
...
@@ -1011,7 +1011,7 @@ class GTZAN(Dataset):
folder_in_archive (str, optional): The top-level directory of the dataset.
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
subset (str, optional): Which subset of the dataset to use.
subset (str
or None
, optional): Which subset of the dataset to use.
One of ``"training"``, ``"validation"``, ``"testing"`` or ``None``.
If ``None``, the entire dataset is used. (default: ``None``).
"""
...
...
torchaudio/datasets/speechcommands.py
View file @
768432c3
...
...
@@ -65,7 +65,7 @@ class SPEECHCOMMANDS(Dataset):
The top-level directory of the dataset. (default: ``"SpeechCommands"``)
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
subset (
O
ptional
[str]
):
subset (
str or None, o
ptional):
Select a subset of the dataset [None, "training", "validation", "testing"]. None means
the whole dataset. "validation" and "testing" are defined in "validation_list.txt" and
"testing_list.txt", respectively, and "training" is the rest. Details for the files
...
...
torchaudio/datasets/tedlium.py
View file @
768432c3
...
...
@@ -55,6 +55,7 @@ class TEDLIUM(Dataset):
and ``"test"`` for releases 1&2, ``None`` for release3. Defaults to ``"train"`` or ``None``.
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
audio_ext (str, optional): extension for audio file (default: ``"audio_ext"``)
"""
def
__init__
(
self
,
...
...
@@ -62,7 +63,7 @@ class TEDLIUM(Dataset):
release
:
str
=
"release1"
,
subset
:
str
=
None
,
download
:
bool
=
False
,
audio_ext
=
".sph"
audio_ext
:
str
=
".sph"
)
->
None
:
self
.
_ext_audio
=
audio_ext
if
release
in
_RELEASE_CONFIGS
.
keys
():
...
...
@@ -144,8 +145,9 @@ class TEDLIUM(Dataset):
Args:
path (str): Path to audio file
start_time (int, optional): Time in seconds where the sample sentence stars
end_time (int, optional): Time in seconds where the sample sentence finishes
start_time (int): Time in seconds where the sample sentence stars
end_time (int): Time in seconds where the sample sentence finishes
sample_rate (float, optional): Sampling rate
Returns:
[Tensor, int]: Audio tensor representation and sample rate
...
...
torchaudio/datasets/utils.py
View file @
768432c3
...
...
@@ -22,7 +22,7 @@ def stream_url(url: str,
Args:
url (str): Url.
start_byte (int, optional): Start streaming at that point (Default: ``None``).
start_byte (int
or None
, optional): Start streaming at that point (Default: ``None``).
block_size (int, optional): Size of chunks to stream (Default: ``32 * 1024``).
progress_bar (bool, optional): Display a progress bar (Default: ``True``).
"""
...
...
@@ -68,8 +68,9 @@ def download_url(url: str,
Args:
url (str): Url.
download_folder (str): Folder to download file.
filename (str, optional): Name of downloaded file. If None, it is inferred from the url (Default: ``None``).
hash_value (str, optional): Hash for url (Default: ``None``).
filename (str or None, optional): Name of downloaded file. If None, it is inferred from the url
(Default: ``None``).
hash_value (str or None, optional): Hash for url (Default: ``None``).
hash_type (str, optional): Hash type, among "sha256" and "md5" (Default: ``"sha256"``).
progress_bar (bool, optional): Display a progress bar (Default: ``True``).
resume (bool, optional): Enable resuming download (Default: ``False``).
...
...
@@ -149,7 +150,8 @@ def extract_archive(from_path: str, to_path: Optional[str] = None, overwrite: bo
"""Extract archive.
Args:
from_path (str): the path of the archive.
to_path (str, optional): the root path of the extraced files (directory of from_path) (Default: ``None``)
to_path (str or None, optional): the root path of the extraced files (directory of from_path)
(Default: ``None``)
overwrite (bool, optional): overwrite existing files (Default: ``False``)
Returns:
...
...
torchaudio/datasets/vctk.py
View file @
768432c3
...
...
@@ -150,7 +150,7 @@ class VCTK_092(Dataset):
Args:
root (str): Root directory where the dataset's top level directory is found.
mic_id (str): Microphone ID. Either ``"mic1"`` or ``"mic2"``. (default: ``"mic2"``)
mic_id (str
, optional
): Microphone ID. Either ``"mic1"`` or ``"mic2"``. (default: ``"mic2"``)
download (bool, optional):
Whether to download the dataset if it is not found at root path. (default: ``False``).
url (str, optional): The URL to download the dataset from.
...
...
torchaudio/functional/filtering.py
View file @
768432c3
...
...
@@ -316,7 +316,7 @@ def contrast(waveform: Tensor, enhancement_amount: float = 75.0) -> Tensor:
Args:
waveform (Tensor): audio waveform of dimension of `(..., time)`
enhancement_amount (float): controls the amount of the enhancement
enhancement_amount (float
, optional
): controls the amount of the enhancement
Allowed range of values for enhancement_amount : 0-100
Note that enhancement_amount = 0 still gives a significant contrast enhancement
...
...
@@ -350,7 +350,7 @@ def dcshift(
waveform (Tensor): audio waveform of dimension of `(..., time)`
shift (float): indicates the amount to shift the audio
Allowed range of values for shift : -2.0 to +2.0
limiter_gain (float): It is used only on peaks to prevent clipping
limiter_gain (float
of None, optional
): It is used only on peaks to prevent clipping
It should have a value much less than 1 (e.g. 0.05 or 0.02)
Returns:
...
...
@@ -690,20 +690,21 @@ def flanger(
waveform (Tensor): audio waveform of dimension of `(..., channel, time)` .
Max 4 channels allowed
sample_rate (int): sampling rate of the waveform, e.g. 44100 (Hz)
delay (float): desired delay in milliseconds(ms)
delay (float
, optional
): desired delay in milliseconds(ms)
Allowed range of values are 0 to 30
depth (float): desired delay depth in milliseconds(ms)
depth (float
, optional
): desired delay depth in milliseconds(ms)
Allowed range of values are 0 to 10
regen (float): desired regen(feedback gain) in dB
regen (float
, optional
): desired regen(feedback gain) in dB
Allowed range of values are -95 to 95
width (float): desired width(delay gain) in dB
width (float
, optional
): desired width(delay gain) in dB
Allowed range of values are 0 to 100
speed (float): modulation speed in Hz
speed (float
, optional
): modulation speed in Hz
Allowed range of values are 0.1 to 10
phase (float): percentage phase-shift for multi-channel
phase (float
, optional
): percentage phase-shift for multi-channel
Allowed range of values are 0 to 100
modulation (str): Use either "sinusoidal" or "triangular" modulation. (Default: ``sinusoidal``)
interpolation (str): Use either "linear" or "quadratic" for delay-line interpolation. (Default: ``linear``)
modulation (str, optional): Use either "sinusoidal" or "triangular" modulation. (Default: ``sinusoidal``)
interpolation (str, optional): Use either "linear" or "quadratic" for delay-line interpolation.
(Default: ``linear``)
Returns:
Tensor: Waveform of dimension of `(..., channel, time)`
...
...
@@ -1072,9 +1073,9 @@ def overdrive(waveform: Tensor, gain: float = 20, colour: float = 20) -> Tensor:
Args:
waveform (Tensor): audio waveform of dimension of `(..., time)`
gain (float): desired gain at the boost (or attenuation) in dB
gain (float
, optional
): desired gain at the boost (or attenuation) in dB
Allowed range of values are 0 to 100
colour (float): controls the amount of even harmonic content in the over-driven output
colour (float
, optional
): controls the amount of even harmonic content in the over-driven output
Allowed range of values are 0 to 100
Returns:
...
...
@@ -1132,17 +1133,17 @@ def phaser(
Args:
waveform (Tensor): audio waveform of dimension of `(..., time)`
sample_rate (int): sampling rate of the waveform, e.g. 44100 (Hz)
gain_in (float): desired input gain at the boost (or attenuation) in dB
gain_in (float
, optional
): desired input gain at the boost (or attenuation) in dB
Allowed range of values are 0 to 1
gain_out (float): desired output gain at the boost (or attenuation) in dB
gain_out (float
, optional
): desired output gain at the boost (or attenuation) in dB
Allowed range of values are 0 to 1e9
delay_ms (float): desired delay in milliseconds
delay_ms (float
, optional
): desired delay in milliseconds
Allowed range of values are 0 to 5.0
decay (float): desired decay relative to gain-in
decay (float
, optional
): desired decay relative to gain-in
Allowed range of values are 0 to 0.99
mod_speed (float): modulation speed in Hz
mod_speed (float
, optional
): modulation speed in Hz
Allowed range of values are 0.1 to 2
sinusoidal (bool): If ``True``, uses sinusoidal modulation (preferable for multiple instruments)
sinusoidal (bool
, optional
): If ``True``, uses sinusoidal modulation (preferable for multiple instruments)
If ``False``, uses triangular modulation (gives single instruments a sharper phasing effect)
(Default: ``True``)
...
...
torchaudio/functional/functional.py
View file @
768432c3
...
...
@@ -155,7 +155,7 @@ def inverse_spectrogram(
Args:
spectrogram (Tensor): Complex tensor of audio of dimension (..., freq, time).
length (int
,
o
ptional
): The output length of the waveform.
length (int o
r None
): The output length of the waveform.
pad (int): Two sided padding of signal. It is only effective when ``length`` is provided.
window (Tensor): Window tensor that is applied/multiplied to each frame/window
n_fft (int): Size of FFT
...
...
@@ -503,8 +503,8 @@ def create_fb_matrix(
f_max (float): Maximum frequency (Hz)
n_mels (int): Number of mel filterbanks
sample_rate (int): Sample rate of the audio waveform
norm (
O
ptional
[str]
): If 'slaney', divide the triangular mel weights by the width of the mel band
(area normalization). (Default: ``None``)
norm (
str or None, o
ptional): If 'slaney', divide the triangular mel weights by the width of the mel band
(area normalization). (Default: ``None``)
mel_scale (str, optional): Scale to use: ``htk`` or ``slaney``. (Default: ``htk``)
Returns:
...
...
@@ -549,8 +549,8 @@ def melscale_fbanks(
f_max (float): Maximum frequency (Hz)
n_mels (int): Number of mel filterbanks
sample_rate (int): Sample rate of the audio waveform
norm (
O
ptional
[str]
): If 'slaney', divide the triangular mel weights by the width of the mel band
(area normalization). (Default: ``None``)
norm (
str or None, o
ptional): If 'slaney', divide the triangular mel weights by the width of the mel band
(area normalization). (Default: ``None``)
mel_scale (str, optional): Scale to use: ``htk`` or ``slaney``. (Default: ``htk``)
Returns:
...
...
@@ -724,7 +724,7 @@ def complex_norm(
Args:
complex_tensor (Tensor): Tensor shape of `(..., complex=2)`
power (float): Power of the norm. (Default: `1.0`).
power (float
, optional
): Power of the norm. (Default: `1.0`).
Returns:
Tensor: Power of the normed input tensor. Shape of `(..., )`
...
...
@@ -771,7 +771,7 @@ def magphase(
Args:
complex_tensor (Tensor): Tensor shape of `(..., complex=2)`
power (float): Power of the norm. (Default: `1.0`)
power (float
, optional
): Power of the norm. (Default: `1.0`)
Returns:
(Tensor, Tensor): The magnitude and phase of the complex tensor
...
...
@@ -1343,14 +1343,14 @@ def apply_codec(
waveform (Tensor): Audio data. Must be 2 dimensional. See also ```channels_first```.
sample_rate (int): Sample rate of the audio waveform.
format (str): File format.
channels_first (bool):
channels_first (bool
, optional
):
When True, both the input and output Tensor have dimension ``[channel, time]``.
Otherwise, they have dimension ``[time, channel]``.
compression (float): Used for formats other than WAV.
compression (float
or None, optional
): Used for formats other than WAV.
For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
encoding (str, optional): Changes the encoding for the supported formats.
encoding (str
or None
, optional): Changes the encoding for the supported formats.
For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
bits_per_sample (int, optional): Changes the bit depth for the supported formats.
bits_per_sample (int
or None
, optional): Changes the bit depth for the supported formats.
For more details see :py:func:`torchaudio.backend.sox_io_backend.save`.
Returns:
...
...
@@ -1614,7 +1614,7 @@ def resample(
Lower values reduce anti-aliasing, but also reduce some of the highest frequencies. (Default: ``0.99``)
resampling_method (str, optional): The resampling method to use.
Options: [``sinc_interpolation``, ``kaiser_window``] (Default: ``'sinc_interpolation'``)
beta (float or None): The shape parameter used for kaiser window.
beta (float or None
, optional
): The shape parameter used for kaiser window.
Returns:
Tensor: The waveform at the new frequency of dimension (..., time).
...
...
torchaudio/models/conv_tasnet.py
View file @
768432c3
...
...
@@ -16,8 +16,8 @@ class ConvBlock(torch.nn.Module):
hidden_channels (int): The number of channels in the internal layers, <H>.
kernel_size (int): The convolution kernel size of the middle layer, <P>.
padding (int): Padding value of the convolution in the middle layer.
dilation (int): Dilation value of the convolution in the middle layer.
no_redisual (bool): Disable residual block/output.
dilation (int
, optional
): Dilation value of the convolution in the middle layer.
no_redisual (bool
, optional
): Disable residual block/output.
Note:
This implementation corresponds to the "non-causal" setting in the paper.
...
...
@@ -169,14 +169,14 @@ class ConvTasNet(torch.nn.Module):
[:footcite:`Luo_2019`].
Args:
num_sources (int): The number of sources to split.
enc_kernel_size (int): The convolution kernel size of the encoder/decoder, <L>.
enc_num_feats (int): The feature dimensions passed to mask generator, <N>.
msk_kernel_size (int): The convolution kernel size of the mask generator, <P>.
msk_num_feats (int): The input/output feature dimension of conv block in the mask generator, <B, Sc>.
msk_num_hidden_feats (int): The internal feature dimension of conv block of the mask generator, <H>.
msk_num_layers (int): The number of layers in one conv block of the mask generator, <X>.
msk_num_stacks (int): The numbr of conv blocks of the mask generator, <R>.
num_sources (int
, optional
): The number of sources to split.
enc_kernel_size (int
, optional
): The convolution kernel size of the encoder/decoder, <L>.
enc_num_feats (int
, optional
): The feature dimensions passed to mask generator, <N>.
msk_kernel_size (int
, optional
): The convolution kernel size of the mask generator, <P>.
msk_num_feats (int
, optional
): The input/output feature dimension of conv block in the mask generator, <B, Sc>.
msk_num_hidden_feats (int
, optional
): The internal feature dimension of conv block of the mask generator, <H>.
msk_num_layers (int
, optional
): The number of layers in one conv block of the mask generator, <X>.
msk_num_stacks (int
, optional
): The numbr of conv blocks of the mask generator, <R>.
Note:
This implementation corresponds to the "non-causal" setting in the paper.
...
...
torchaudio/models/wav2vec2/components.py
View file @
768432c3
...
...
@@ -49,7 +49,7 @@ class ConvLayerBlock(Module):
"""
Args:
x (Tensor): Shape: ``[batch, in_channels, in_frame]``.
length (Tensor, optional): Shape ``[batch, ]``.
length (Tensor
or None
, optional): Shape ``[batch, ]``.
Returns:
Tensor: Shape ``[batch, out_channels, out_frames]``.
Optional[Tensor]: Shape ``[batch, ]``.
...
...
@@ -90,7 +90,7 @@ class FeatureExtractor(Module):
x (Tensor):
Input Tensor representing a batch of audio,
shape: ``[batch, time]``.
length (Tensor, optional):
length (Tensor
or None
, optional):
Valid length of each input sample. shape: ``[batch, ]``.
Returns:
...
...
@@ -243,7 +243,7 @@ class SelfAttention(Module):
"""
Args:
x (Tensor): shape: ``[batch_size, sequence_length, embed_dim]``.
attention_mask (Tensor, optional):
attention_mask (Tensor
or None
, optional):
shape: ``[batch_size, 1, sequence_length, sequence_length]``
Returns:
...
...
@@ -340,7 +340,7 @@ class EncoderLayer(Module):
"""
Args:
x (Tensor): shape: ``(batch, sequence_length, embed_dim)``
attention_mask (Tensor, optional):
attention_mask (Tensor
or None
, optional):
shape: ``(batch, 1, sequence_length, sequence_length)``
"""
residual
=
x
...
...
torchaudio/models/wav2vec2/model.py
View file @
768432c3
...
...
@@ -38,7 +38,7 @@ class Wav2Vec2Model(Module):
Args:
waveforms (Tensor): Audio tensor of shape ``(batch, frames)``.
lengths (Tensor, optional):
lengths (Tensor
or None
, optional):
Indicates the valid length of each audio sample in the batch.
Shape: ``(batch, )``.
...
...
@@ -62,7 +62,7 @@ class Wav2Vec2Model(Module):
Args:
waveforms (Tensor): Audio tensor of shape ``(batch, frames)``.
lengths (Tensor, optional):
lengths (Tensor
or None
, optional):
Indicates the valid length of each audio sample in the batch.
Shape: ``(batch, )``.
...
...
torchaudio/models/wav2vec2/utils/import_fairseq.py
View file @
768432c3
...
...
@@ -133,7 +133,7 @@ def import_fairseq_model(
An instance of fairseq's Wav2Vec2.0 model class.
Either ``fairseq.models.wav2vec.wav2vec2_asr.Wav2VecEncoder`` or
``fairseq.models.wav2vec.wav2vec2.Wav2Vec2Model``.
num_out (int, optional):
num_out (int
or None
, optional):
The number of output labels. Required only when the original model is
an instance of ``fairseq.models.wav2vec.wav2vec2.Wav2Vec2Model``.
...
...
torchaudio/sox_effects/sox_effects.py
View file @
768432c3
...
...
@@ -72,7 +72,7 @@ def apply_effects_tensor(
tensor (torch.Tensor): Input 2D CPU Tensor.
sample_rate (int): Sample rate
effects (List[List[str]]): List of effects.
channels_first (bool): Indicates if the input Tensor's dimension is
channels_first (bool
, optional
): Indicates if the input Tensor's dimension is
``[channels, time]`` or ``[time, channels]``
Returns:
...
...
@@ -185,15 +185,15 @@ def apply_effects_file(
Note: This argument is intentionally annotated as ``str`` only for
TorchScript compiler compatibility.
effects (List[List[str]]): List of effects.
normalize (bool):
normalize (bool
, optional
):
When ``True``, this function always return ``float32``, and sample values are
normalized to ``[-1.0, 1.0]``.
If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
integer type. This argument has no effect for formats other
than integer WAV type.
channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``.
channels_first (bool
, optional
): When True, the returned Tensor has dimension ``[channel, time]``.
Otherwise, the returned Tensor's dimension is ``[time, channel]``.
format (str, optional):
format (str
or None
, optional):
Override the format detection with the given format.
Providing the argument might help when libsox can not infer the format
from header or extension,
...
...
Prev
1
2
Next
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment