Commit 70987b01 authored by Caroline Chen

[DOC] Standardization and minor fixes (#1892)

parent 481d1ecf
@@ -283,7 +283,7 @@ class WaveRNN(nn.Module):
     specgram: the input spectrogram to the WaveRNN layer (n_batch, 1, n_freq, n_time)
 Return:
-    Tensor shape: (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
+    Tensor: shape (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
 """
 assert waveform.size(1) == 1, 'Require the input channel of waveform is 1'
@@ -343,7 +343,7 @@ class WaveRNN(nn.Module):
 lengths (Tensor or None, optional):
     Indicates the valid length of each audio in the batch.
     Shape: `(batch, )`.
-    When the ``specgram`` contains spectrograms with different duration,
+    When the ``specgram`` contains spectrograms with different durations,
     by providing ``lengths`` argument, the model will compute
     the corresponding valid output lengths.
     If ``None``, it is assumed that all the audio in ``waveforms``
@@ -356,7 +356,7 @@ class WaveRNN(nn.Module):
     1 stands for a single channel.
 Tensor or None
     If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
-    is retuned.
+    is returned.
     It indicates the valid length in time axis of the output Tensor.
 """
...
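The shape convention fixed in this hunk can be sanity-checked with a short sketch. The hyperparameters below are illustrative, not values taken from the repository; the only hard constraint is that the product of `upsample_scales` equals `hop_length`.

```python
import torch
import torchaudio

# Illustrative hyperparameters; prod(upsample_scales) must equal hop_length.
hop_length = 200
kernel_size = 5
model = torchaudio.models.WaveRNN(
    upsample_scales=[5, 5, 8],
    n_classes=256,
    hop_length=hop_length,
    kernel_size=kernel_size,
    n_freq=80,
)

n_batch, n_time = 2, 20
specgram = torch.rand(n_batch, 1, 80, n_time)
waveform = torch.rand(n_batch, 1, (n_time - kernel_size + 1) * hop_length)

out = model(waveform, specgram)
# Matches the documented shape:
# (n_batch, 1, (n_time - kernel_size + 1) * hop_length, n_classes)
print(out.shape)  # torch.Size([2, 1, 3200, 256])
```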
@@ -25,7 +25,7 @@ class _TextProcessor(ABC):
     text (str or list of str): The input texts.
 Returns:
-    Tensor and Tensor:
+    (Tensor, Tensor):
     Tensor:
         The encoded texts. Shape: `(batch, max length)`
     Tensor:
@@ -56,7 +56,7 @@ class _Vocoder(ABC):
         The valid length of each sample in the batch. Shape: `(batch, )`.
 Returns:
-    Tensor and optional Tensor:
+    (Tensor, Optional[Tensor]):
     Tensor:
         The generated waveform. Shape: `(batch, max length)`
     Tensor or None:
...
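Both return conventions above belong to the Tacotron2 TTS pipeline interfaces. A rough end-to-end sketch, assuming the TACOTRON2_WAVERNN_CHAR_LJSPEECH bundle (which downloads pretrained weights on first use):

```python
import torch
import torchaudio

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH

processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2()
vocoder = bundle.get_vocoder()

with torch.no_grad():
    # _TextProcessor.__call__ -> (Tensor, Tensor): encoded tokens and their valid lengths.
    tokens, token_lengths = processor("Hello world!")
    # Tacotron2.infer -> mel spectrogram, spectrogram lengths, and attention alignments.
    specgram, spec_lengths, _ = tacotron2.infer(tokens, token_lengths)
    # _Vocoder.__call__ -> (Tensor, Optional[Tensor]): waveforms and their valid lengths.
    waveforms, wave_lengths = vocoder(specgram, spec_lengths)

print(waveforms.shape)  # (batch, max length)
```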
@@ -134,7 +134,7 @@ class Wav2Vec2ASRBundle(Wav2Vec2Bundle):
     unk (str, optional): Token for unknown class. (default: ``'<unk>'``)
 Returns:
-    Tuple of strings:
+    Tuple[str]:
     For models fine-tuned on ASR, returns the tuple of strings representing
     the output class labels.
...
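For reference, the strings returned here are the output classes an ASR decoder maps logits onto. A minimal check, assuming the WAV2VEC2_ASR_BASE_960H bundle:

```python
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
labels = bundle.get_labels()
# One string per output class of the fine-tuned model, e.g. the word
# delimiter '|' and the characters of the English alphabet.
print(type(labels), len(labels))
```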
@@ -73,10 +73,10 @@ def apply_effects_tensor(
     sample_rate (int): Sample rate
     effects (List[List[str]]): List of effects.
     channels_first (bool, optional): Indicates if the input Tensor's dimension is
-        ``[channels, time]`` or ``[time, channels]``
+        `[channels, time]` or `[time, channels]`
 Returns:
-    Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+    (Tensor, int): Resulting Tensor and sample rate.
     The resulting Tensor has the same ``dtype`` as the input Tensor, and
     the same channels order. The shape of the Tensor can be different based on the
     effects applied. Sample rate can also be different based on the effects applied.
@@ -191,20 +191,20 @@ def apply_effects_file(
     If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
     integer type. This argument has no effect for formats other
     than integer WAV type.
-    channels_first (bool, optional): When True, the returned Tensor has dimension ``[channel, time]``.
-        Otherwise, the returned Tensor's dimension is ``[time, channel]``.
+    channels_first (bool, optional): When True, the returned Tensor has dimension `[channel, time]`.
+        Otherwise, the returned Tensor's dimension is `[time, channel]`.
     format (str or None, optional):
         Override the format detection with the given format.
         Providing the argument might help when libsox can not infer the format
         from header or extension,
 Returns:
-    Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+    (Tensor, int): Resulting Tensor and sample rate.
     If ``normalize=True``, the resulting Tensor is always ``float32`` type.
     If ``normalize=False`` and the input audio file is of integer WAV file, then the
     resulting Tensor has corresponding integer type. (Note 24 bit integer type is not supported)
-    If ``channels_first=True``, the resulting Tensor has dimension ``[channel, time]``,
-    otherwise ``[time, channel]``.
+    If ``channels_first=True``, the resulting Tensor has dimension `[channel, time]`,
+    otherwise `[time, channel]`.
 Example - Basic usage
 >>>
...
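The (Tensor, int) return convention applies to both functions in this file. A small sketch of apply_effects_tensor using standard SoX effect names (assumes torchaudio is built with SoX support):

```python
import torch
import torchaudio

sample_rate = 16000
# Two seconds of noise in [channels, time] layout.
waveform = torch.rand(1, sample_rate * 2) - 0.5

effects = [
    ["lowpass", "-1", "300"],  # single-pole lowpass at 300 Hz
    ["rate", "8000"],          # resample to 8 kHz
]

# Returns the processed Tensor and the (possibly changed) sample rate.
out, out_sr = torchaudio.sox_effects.apply_effects_tensor(
    waveform, sample_rate, effects, channels_first=True
)
print(out.shape, out_sr)  # torch.Size([1, 16000]) 8000
```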
@@ -787,7 +787,7 @@ class MuLawEncoding(torch.nn.Module):
     x (Tensor): A signal to be encoded.
 Returns:
-    x_mu (Tensor): An encoded signal.
+    Tensor: An encoded signal.
 """
 return F.mu_law_encoding(x, self.quantization_channels)
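The return annotation fixed above is for MuLawEncoding.forward. A minimal round trip, assuming the input signal is already scaled to [-1, 1]:

```python
import torch
import torchaudio

encode = torchaudio.transforms.MuLawEncoding(quantization_channels=256)
decode = torchaudio.transforms.MuLawDecoding(quantization_channels=256)

waveform = torch.linspace(-1.0, 1.0, steps=8)  # mu-law expects values in [-1, 1]
x_mu = encode(waveform)   # integer codes in [0, 255]
restored = decode(x_mu)   # back to [-1, 1], up to quantization error
print(x_mu)
print((waveform - restored).abs().max())
```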
@@ -1629,7 +1629,7 @@ class PSD(torch.nn.Module):
     of dimension `(..., channel, freq, time)` if multi_mask is ``True``
 Returns:
-    torch.Tensor: PSD matrix of the input STFT matrix.
+    Tensor: PSD matrix of the input STFT matrix.
     Tensor of dimension `(..., freq, channel, channel)`
 """
 # outer product:
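The PSD docstring describes a complex multi-channel STFT going in and a `(..., freq, channel, channel)` matrix coming out. A shape-only sketch, assuming the transform's default arguments and a single time-frequency mask:

```python
import torch
import torchaudio

psd_transform = torchaudio.transforms.PSD()

channel, freq, time = 4, 257, 100
specgram = torch.rand(channel, freq, time, dtype=torch.cfloat)  # complex STFT
mask = torch.rand(freq, time)                                   # time-frequency mask

psd = psd_transform(specgram, mask)
print(psd.shape)  # torch.Size([257, 4, 4]) -> (freq, channel, channel)
```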
@@ -1773,7 +1773,7 @@ class MVDR(torch.nn.Module):
     eps (float, optional): a value added to the denominator in mask normalization. (Default: 1e-8)
 Returns:
-    torch.Tensor: the mvdr beamforming weight matrix
+    Tensor: the mvdr beamforming weight matrix
 """
 if self.multi_mask:
     # Averaging mask along channel dimension
...