Unverified Commit f65307e4 authored by Yih-Dar's avatar Yih-Dar Committed by GitHub
Browse files

Fix dtype of input_features in docstring (#18258)


Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
parent bd87480d
......@@ -135,7 +135,7 @@ SPEECH_ENCODER_DECODER_INPUTS_DOCSTRING = r"""
into an array of type *List[float]* or a *numpy.ndarray*, *e.g.* via the soundfile library (*pip install
soundfile*). To prepare the array into *input_values*, the [`Wav2Vec2Processor`] should be used for padding
and conversion into a tensor of type *torch.FloatTensor*. See [`Wav2Vec2Processor.__call__`] for details.
input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
......
......@@ -599,7 +599,7 @@ SPEECH_TO_TEXT_START_DOCSTRING = r"""
SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
Args:
input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`):
input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`):
Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment