Commit 1efca4e6 authored by SaulLu, committed by GitHub

replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in some docstrings (#16835)

* replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in docstring

* quality
parent b5c6a63e
@@ -139,8 +139,8 @@ SPEECH_ENCODER_DECODER_INPUTS_DOCSTRING = r"""
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
 by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
 via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
-[`Speech2TextTokenizer`] should be used for extracting the fbank features, padding and conversion into a
-tensor of type `torch.FloatTensor`. See [`~Speech2TextTokenizer.__call__`]
+[`Speech2TextFeatureExtractor`] should be used for extracting the fbank features, padding and conversion
+into a tensor of type `torch.FloatTensor`. See [`~Speech2TextFeatureExtractor.__call__`]
 return_dict (`bool`, *optional*):
 If set to `True`, the model will return a [`~utils.Seq2SeqLMOutput`] instead of a plain tuple.
 kwargs: (*optional*) Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
......
@@ -600,8 +600,8 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
 by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
 via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
-[`Speech2TextTokenizer`] should be used for extracting the fbank features, padding and conversion into a
-tensor of type `torch.FloatTensor`. See [`~Speech2TextTokenizer.__call__`]
+[`Speech2TextFeatureExtractor`] should be used for extracting the fbank features, padding and conversion
+into a tensor of type `torch.FloatTensor`. See [`~Speech2TextFeatureExtractor.__call__`]
 attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
 Mask to avoid performing convolution and attention on padding token indices. Mask values selected in `[0,
 1]`:
@@ -733,9 +733,9 @@ class Speech2TextEncoder(Speech2TextPreTrainedModel):
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be
 obtained by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a
 `numpy.ndarray`, *e.g.* via the soundfile library (`pip install soundfile`). To prepare the array into
-`input_features`, the [`Speech2TextTokenizer`] should be used for extracting the fbank features,
+`input_features`, the [`Speech2TextFeatureExtractor`] should be used for extracting the fbank features,
 padding and conversion into a tensor of type `torch.FloatTensor`. See
-[`~Speech2TextTokenizer.__call__`]
+[`~Speech2TextFeatureExtractor.__call__`]
 attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
 Mask to avoid performing convolution and attention on padding token indices. Mask values selected in
 `[0, 1]`:
......
@@ -650,8 +650,8 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
 by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
 via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
-[`Speech2TextTokenizer`] should be used for extracting the fbank features, padding and conversion into a
-tensor of floats. See [`~Speech2TextTokenizer.__call__`]
+[`Speech2TextFeatureExtractor`] should be used for extracting the fbank features, padding and conversion
+into a tensor of floats. See [`~Speech2TextFeatureExtractor.__call__`]
 attention_mask (`tf.Tensor` of shape `({0})`, *optional*):
 Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
@@ -798,8 +798,8 @@ class TFSpeech2TextEncoder(tf.keras.layers.Layer):
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be
 obtained by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a
 `numpy.ndarray`, *e.g.* via the soundfile library (`pip install soundfile`). To prepare the array into
-`input_features`, the [`Speech2TextTokenizer`] should be used for extracting the fbank features,
-padding and conversion into a tensor of floats. See [`~Speech2TextTokenizer.__call__`]
+`input_features`, the [`Speech2TextFeatureExtractor`] should be used for extracting the fbank features,
+padding and conversion into a tensor of floats. See [`~Speech2TextFeatureExtractor.__call__`]
 attention_mask (`tf.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
 Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
......
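The docstrings in this commit describe `input_features` as padded fbank features accompanied by an `attention_mask` with values in `[0, 1]`. As a rough illustration of what that padding step produces, here is a minimal plain-Python sketch. It is not the `Speech2TextFeatureExtractor` implementation: the `pad_features` helper, the tiny hand-written frames, and the choice of `1` for real frames and `0` for padding are all assumptions made for the example.

```python
from typing import List, Tuple

Frame = List[float]  # one fbank frame: a vector of filter-bank energies


def pad_features(
    batch: List[List[Frame]], padding_value: float = 0.0
) -> Tuple[List[List[Frame]], List[List[int]]]:
    """Pad every sequence of frames to the longest sequence in the batch.

    Hypothetical helper for illustration only. Returns the padded features
    and a mask holding 1 for real frames and 0 for padded frames.
    """
    max_len = max(len(seq) for seq in batch)
    feature_size = len(batch[0][0])
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        pad_frames = [[padding_value] * feature_size for _ in range(n_pad)]
        padded.append([list(frame) for frame in seq] + pad_frames)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


# Two made-up utterances with 3 frames and 1 frame, 2 features per frame.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
input_features, attention_mask = pad_features(batch)
print(attention_mask)  # [[1, 1, 1], [1, 0, 0]]
```

The mask is what lets a model skip the padded frames of the shorter utterance. Producing the fbank features themselves and converting them to framework tensors is the part the docstrings now attribute to the feature extractor rather than the tokenizer.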