Unverified commit f65307e4 authored by Yih-Dar, committed by GitHub

Fix dtype of input_features in docstring (#18258)


Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
parent bd87480d
@@ -135,7 +135,7 @@ SPEECH_ENCODER_DECODER_INPUTS_DOCSTRING = r"""
 into an array of type *List[float]* or a *numpy.ndarray*, *e.g.* via the soundfile library (*pip install
 soundfile*). To prepare the array into *input_values*, the [`Wav2Vec2Processor`] should be used for padding
 and conversion into a tensor of type *torch.FloatTensor*. See [`Wav2Vec2Processor.__call__`] for details.
-input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
+input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`, *optional*):
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
 by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
 via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
...
@@ -599,7 +599,7 @@ SPEECH_TO_TEXT_START_DOCSTRING = r"""
 SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
 Args:
-input_features (`torch.LongTensor` of shape `(batch_size, sequence_length, feature_size)`):
+input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, feature_size)`):
 Float values of fbank features extracted from the raw speech waveform. Raw speech waveform can be obtained
 by loading a `.flac` or `.wav` audio file into an array of type `List[float]` or a `numpy.ndarray`, *e.g.*
 via the soundfile library (`pip install soundfile`). To prepare the array into `input_features`, the
...
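Why `torch.FloatTensor` is the right type here: fbank features are real-valued, so a batch of them is a float32 tensor, while `torch.LongTensor` denotes int64 and would truncate every value. The following is a minimal sketch, assuming only that PyTorch is installed; random values stand in for real fbank output, the sizes are hypothetical, and no feature extractor or processor is called.

```python
import torch

torch.manual_seed(0)  # make the random stand-in values reproducible

# Hypothetical sizes chosen for illustration; 80 filterbank bins is a common choice.
batch_size, sequence_length, feature_size = 2, 5, 80

# Stand-in for fbank features: real-valued, shape (batch_size, sequence_length, feature_size).
input_features = torch.randn(batch_size, sequence_length, feature_size)
print(input_features.type(), tuple(input_features.shape))

# Casting to int64 (the dtype behind `torch.LongTensor`) truncates toward zero,
# so the fractional part of every feature value is lost.
as_long = input_features.long()
lost = (input_features - as_long.float()).abs().max().item()
print(as_long.type(), lost)
```

The first `print` reports `torch.FloatTensor`, matching the corrected docstring, and the nonzero `lost` value shows how much information an integer dtype would discard.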