add a note to whisper docs clarifying support of long-form decoding (#19497)

Commit 504cd71a, authored Oct 13, 2022 by Akash Mahajan, committed by GitHub Oct 13, 2022

Showing with 1 addition and 0 deletions

docs/source/en/model_doc/whisper.mdx +1 -0

--- a/docs/source/en/model_doc/whisper.mdx
+++ b/docs/source/en/model_doc/whisper.mdx
@@ -25,6 +25,7 @@ Tips:
 - The model usually performs well without requiring any finetuning.
 - The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation_utils.GenerationMixin.generate`] function for inference.
+- Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.
 This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ). The Tensorflow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts).
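
The added tip says audio must be pre-segmented into segments of at most 30 seconds before inference. The commit does not say how to do that, so the following is only a minimal sketch of one naive approach: cutting a waveform at fixed 30-second boundaries. The helper name `split_into_segments` and the 16 kHz sample rate are assumptions made for illustration and are not part of the library or of this change.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000       # assumption: mono audio sampled at 16 kHz
MAX_SEGMENT_SECONDS = 30   # upper bound stated in the added documentation note


def split_into_segments(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    max_seconds: int = MAX_SEGMENT_SECONDS,
) -> List[Sequence[float]]:
    """Hypothetical helper: cut a waveform into consecutive chunks of at most
    `max_seconds` each. The last chunk may be shorter than the others."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    step = sample_rate * max_seconds  # number of samples per full segment
    return [samples[start:start + step] for start in range(0, len(samples), step)]


# 75 seconds of silence stands in for a real recording.
audio = [0.0] * (SAMPLE_RATE * 75)
segments = split_into_segments(audio)
segment_seconds = [len(segment) / SAMPLE_RATE for segment in segments]
```

Each resulting segment could then be prepared with `WhisperProcessor` and decoded separately. Fixed boundaries can cut a word in half, so splitting at silences would likely produce cleaner transcripts; that refinement is outside the scope of this sketch.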