Commit 504cd71a, authored by Akash Mahajan, committed by GitHub

add a note to whisper docs clarifying support of long-form decoding (#19497)

parent 5dcb10d8
@@ -25,6 +25,7 @@ Tips:
 - The model usually performs well without requiring any finetuning.
 - The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation_utils.GenerationMixin.generate`] function for inference.
+- Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.
 This model was contributed by [Arthur Zucker](https://huggingface.co/ArthurZ). The Tensorflow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts).
......
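
The added tip says that audio must be pre-segmented into chunks of at most 30 s before inference. The sketch below is one possible way to do that, not part of this commit: `split_into_segments` and `transcribe_segments` are hypothetical helper names, the `openai/whisper-tiny` checkpoint is an assumed example, and the fixed-window split is deliberately naive (it can cut a word in half at a segment boundary).

```python
from typing import List, Sequence

SAMPLING_RATE = 16_000  # Whisper's feature extractor works on 16 kHz audio
MAX_SECONDS = 30        # short-form limit stated in the added tip


def split_into_segments(
    samples: Sequence[float],
    sampling_rate: int = SAMPLING_RATE,
    max_seconds: float = MAX_SECONDS,
) -> List[List[float]]:
    """Cut a mono waveform into consecutive windows of at most `max_seconds`.

    Hypothetical helper: a plain fixed-size split with no overlap and no
    silence detection, so a boundary may fall in the middle of a word.
    """
    if sampling_rate <= 0 or max_seconds <= 0:
        raise ValueError("sampling_rate and max_seconds must be positive")
    window = int(sampling_rate * max_seconds)  # samples per segment
    return [
        list(samples[start:start + window])
        for start in range(0, len(samples), window)
    ]


def transcribe_segments(
    segments: List[List[float]],
    checkpoint: str = "openai/whisper-tiny",  # assumed example checkpoint
) -> List[str]:
    """Run short-form inference on each segment independently.

    Imports are local so the splitting helper above can be used without
    numpy, torch or transformers installed.
    """
    import numpy as np
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(checkpoint)
    model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

    texts = []
    for segment in segments:
        # The processor turns raw audio into log-mel input features.
        inputs = processor(
            np.asarray(segment, dtype=np.float32),
            sampling_rate=SAMPLING_RATE,
            return_tensors="pt",
        )
        # Encoder-decoder inference goes through `generate`, as the tips note.
        predicted_ids = model.generate(inputs.input_features)
        # The same processor decodes the predicted IDs back into text.
        texts.append(
            processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
        )
    return texts
```

Each segment is transcribed on its own, so the returned strings carry no timestamps and no context across segment boundaries; that is the gap the note says long-form decoding will address in a future release.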