Fix dtype of input_features in docstring (#18258)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Commit f65307e4, authored Jul 26, 2022 by Yih-Dar, committed by GitHub Jul 26, 2022
2 changed files
--- a/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py
+++ b/src/transformers/models/speech_encoder_decoder/modeling_speech_encoder_decoder.py
@@ -135,7 +135,7 @@ SPEECH_ENCODER_DECODER_INPUTS_DOCSTRING = r"""
            into an array of type *List[float]* or a *numpy.ndarray*, *e.g.* via the soundfile library (*pip install
            soundfile*). To prepare the array into *input_values*, the [`Wav2Vec2Processor`] should be used for padding
            and conversion into a tensor of type *torch.FloatTensor*. See [`Wav2Vec2Processor.__call__`] for details.
-        input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
+        input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
            Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
            by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
            via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the

--- a/src/transformers/models/speech_to_text/modeling_speech_to_text.py
+++ b/src/transformers/models/speech_to_text/modeling_speech_to_text.py
@@ -599,7 +599,7 @@ SPEECH_TO_TEXT_START_DOCSTRING = r"""

 SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
    Args:
-        input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`):
+        input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`):
            Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
            by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
            via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
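
A rough illustration of why the documented dtype matters, not part of the commit itself. It assumes NumPy as a stand-in for torch (`float32` playing the role of `torch.FloatTensor`, `int64` the role of `torch.LongTensor`), and the feature values and shape are made up for the example.

```python
import numpy as np

# Hypothetical batch of fbank features with shape
# (batch_size, sequence_length, feature_size); the values are random
# placeholders, not real audio features.
rng = np.random.default_rng(0)
input_features = rng.standard_normal((2, 5, 80)).astype(np.float32)

# float32 stands in for torch.FloatTensor, int64 for torch.LongTensor.
as_long = input_features.astype(np.int64)

# Casting to an integer dtype truncates the fractional part, so the
# original float features cannot be recovered from the integer copy.
lossy = not np.allclose(input_features, as_long.astype(np.float32))
print(input_features.dtype, as_long.dtype, lossy)
```

Since the docstring itself describes the input as "Float values of fbank features", the float dtype is the one consistent with that description.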