Unverified Commit 211270db authored by moto, committed by GitHub

Update descriptions of `lengths` parameters (#1890)

parent 89aeb686
@@ -1080,7 +1080,7 @@ class Tacotron2(nn.Module):
                 If ``None``, it is assumed that all the tokens are valid. Default: ``None``
 
         Returns:
-            Tensor, Tensor, and Tensor:
+            (Tensor, Tensor, Tensor):
             Tensor
                 The predicted mel spectrogram with shape `(n_batch, n_mels, max of mel_specgram_lengths)`.
             Tensor
...
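As a quick reference for the return tuple documented above, here is a minimal sketch of calling ``Tacotron2.infer`` with the ``lengths`` argument. The no-argument constructor (default hyperparameters, untrained weights) and the dummy token ids are illustrative assumptions, not part of this commit.

```python
# Minimal sketch: Tacotron2.infer with a padded token batch and explicit lengths.
# The model is randomly initialized here, so the output is not meaningful audio.
import torch
from torchaudio.models import Tacotron2

model = Tacotron2()  # assumed: default hyperparameters (e.g. n_mels=80, n_symbol=148)
model.eval()

tokens = torch.randint(0, 148, (2, 100))  # (n_batch, max token length), dummy symbol ids
lengths = torch.tensor([100, 80])         # valid tokens of each sequence in the batch

with torch.no_grad():
    mel_specgram, mel_lengths, alignments = model.infer(tokens, lengths)

# mel_specgram: (n_batch, n_mels, max of mel_specgram_lengths)
# mel_lengths:  (n_batch, ) -- valid length of each predicted mel spectrogram
```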
@@ -50,8 +50,14 @@ class Wav2Vec2Model(Module):
         Args:
             waveforms (Tensor): Audio tensor of shape `(batch, frames)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length of each audio sample in the batch.
+                Indicates the valid length of each audio in the batch.
                 Shape: `(batch, )`.
+                When ``waveforms`` contains audios with different durations,
+                providing the ``lengths`` argument lets the model compute
+                the corresponding valid output lengths and apply the proper mask
+                in the transformer attention layers.
+                If ``None``, it is assumed that the entire audio waveform
+                length is valid.
             num_layers (int or None, optional):
                 If given, limit the number of intermediate layers to go through.
                 Providing `1` will stop the computation after going through one
@@ -59,13 +65,14 @@ class Wav2Vec2Model(Module):
             intermediate layers are returned.
 
         Returns:
-            List of Tensors and an optional Tensor:
+            (List[Tensor], Optional[Tensor]):
             List of Tensors
                 Features from requested layers.
-                Each Tensor is of shape: `(batch, frames, feature dimention)`
+                Each Tensor is of shape: `(batch, time frame, feature dimension)`
             Tensor or None
                 If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
-                is retuned. It indicates the valid length of each feature in the batch.
+                is returned.
+                It indicates the valid length along the time axis of each feature Tensor.
        """
        x, lengths = self.feature_extractor(waveforms, lengths)
        x = self.encoder.extract_features(x, lengths, num_layers)
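To show how the ``lengths`` and ``num_layers`` arguments documented above interact, here is a minimal sketch of ``Wav2Vec2Model.extract_features`` on a zero-padded batch. The ``wav2vec2_base`` factory and the dummy shapes are assumptions for illustration, not taken from this commit.

```python
# Minimal sketch: extract_features with per-sample lengths and a layer limit.
import torch
from torchaudio.models import wav2vec2_base  # assumed factory returning Wav2Vec2Model

model = wav2vec2_base()  # randomly initialized architecture, no aux head
model.eval()

waveforms = torch.randn(2, 16000)      # (batch, frames); second sample is zero-padded
lengths = torch.tensor([16000, 8000])  # valid samples of each audio in the batch

with torch.no_grad():
    features, out_lengths = model.extract_features(waveforms, lengths, num_layers=2)

# features:    list of 2 Tensors (one per requested transformer layer),
#              each of shape (batch, time frame, feature dimension)
# out_lengths: (batch, ) -- valid frames along the time axis of each feature Tensor
print(len(features), features[0].shape, out_lengths)
```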
@@ -81,17 +88,24 @@ class Wav2Vec2Model(Module):
         Args:
             waveforms (Tensor): Audio tensor of shape `(batch, frames)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length of each audio sample in the batch.
+                Indicates the valid length of each audio in the batch.
                 Shape: `(batch, )`.
+                When ``waveforms`` contains audios with different durations,
+                providing the ``lengths`` argument lets the model compute
+                the corresponding valid output lengths and apply the proper mask
+                in the transformer attention layers.
+                If ``None``, it is assumed that all the audio in ``waveforms``
+                has valid length. Default: ``None``.
 
         Returns:
-            Tensor and an optional Tensor:
+            (Tensor, Optional[Tensor]):
             Tensor
                 The sequences of probability distribution (in logit) over labels.
                 Shape: `(batch, frames, num labels)`.
             Tensor or None
                 If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
-                is retuned. It indicates the valid length of each feature in the batch.
+                is returned.
+                It indicates the valid length along the time axis of the output Tensor.
        """
        x, lengths = self.feature_extractor(waveforms, lengths)
        x = self.encoder(x, lengths)
...
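Similarly, a minimal sketch of the ``forward`` call documented above, producing label logits. Passing ``aux_num_out`` to the ``wav2vec2_base`` factory to attach a label head, and the label-set size of 32, are illustrative assumptions.

```python
# Minimal sketch: Wav2Vec2Model forward with lengths, returning logits over labels.
import torch
from torchaudio.models import wav2vec2_base  # assumed factory returning Wav2Vec2Model

model = wav2vec2_base(aux_num_out=32)  # assumed: aux head projecting to 32 labels
model.eval()

waveforms = torch.randn(2, 16000)       # (batch, frames), zero-padded batch
lengths = torch.tensor([16000, 12000])  # valid samples of each audio in the batch

with torch.no_grad():
    logits, out_lengths = model(waveforms, lengths)

# logits:      (batch, frames, num labels) -- probability distribution in logit
# out_lengths: (batch, ) -- valid length along the time axis of the output;
#              would be None if ``lengths`` had not been provided
print(logits.shape, out_lengths)
```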
@@ -341,16 +341,23 @@ class WaveRNN(nn.Module):
             specgram (Tensor):
                 Batch of spectrograms. Shape: `(n_batch, n_freq, n_time)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length in of each spectrogram in time axis.
-                Shape: `(n_batch, )`.
+                Indicates the valid length of each audio in the batch.
+                Shape: `(batch, )`.
+                When ``specgram`` contains spectrograms with different durations,
+                providing the ``lengths`` argument lets the model compute
+                the corresponding valid output lengths.
+                If ``None``, it is assumed that all the spectrograms in ``specgram``
+                have valid length. Default: ``None``.
 
         Returns:
-            Tensor and optional Tensor:
+            (Tensor, Optional[Tensor]):
             Tensor
                 The inferred waveform of size `(n_batch, 1, n_time)`.
                 1 stands for a single channel.
             Tensor or None
-                The valid lengths of each waveform in the batch. Size `(n_batch, )`.
+                If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
+                is returned.
+                It indicates the valid length along the time axis of the output Tensor.
        """
        device = specgram.device
...
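And a minimal sketch of ``WaveRNN.infer`` with the ``lengths`` argument. The hyperparameters below (upsample scales, number of classes, 80 mel bins) are illustrative assumptions, and the untrained model produces noise.

```python
# Minimal sketch: WaveRNN.infer on a batch of mel spectrograms with different
# valid lengths. Generation is autoregressive, so keep n_time small for a demo.
import torch
from torchaudio.models import WaveRNN

# assumed hyperparameters: hop_length must equal the product of upsample_scales
model = WaveRNN(upsample_scales=[5, 5, 11], n_classes=256, hop_length=275, n_freq=80)
model.eval()

specgram = torch.randn(2, 80, 10)  # (n_batch, n_freq, n_time), zero-padded batch
lengths = torch.tensor([10, 7])    # valid spectrogram frames of each element

with torch.no_grad():
    waveform, out_lengths = model.infer(specgram, lengths)

# waveform:    (n_batch, 1, n_time) -- 1 stands for a single channel
# out_lengths: (n_batch, ) -- valid length along the time axis of each waveform
print(waveform.shape, out_lengths)
```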