Unverified commit 55aeec04 authored by Nicolò Lucchesi, committed by GitHub

[Bugfix] Fix Whisper tokenization (#34011)


Signed-off-by: NickLucche <nlucches@redhat.com>
parent 90607718
@@ -727,6 +727,14 @@ class WhisperMultiModalProcessor(EncDecMultiModalProcessor[WhisperProcessingInfo
**mm_kwargs,
sampling_rate=feature_extractor.sampling_rate,
)
# The HF WhisperProcessor passes **kwargs to both the tokenizer
# and the feature extractor. Text-tokenizer kwargs like
# `truncation` and `max_length` must be removed when audio data
# is present, otherwise the feature extractor interprets
# `max_length` as raw audio samples and truncates the audio.
tok_kwargs = {
    k: v for k, v in tok_kwargs.items() if k not in ("truncation", "max_length")
}
processed_outputs = super()._call_hf_processor(
prompt=prompt,
mm_data=mm_data,
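The comment in the patch describes a specific failure mode: one set of kwargs is forwarded to both the text tokenizer and the audio feature extractor, and the audio side reads `max_length` as a count of raw samples. The sketch below reproduces that behaviour in isolation. `fake_tokenizer`, `fake_feature_extractor`, `fake_processor` and `strip_text_only_kwargs` are hypothetical stand-ins written for illustration; they are not the Hugging Face or vLLM APIs. The limit of 448 is an arbitrary illustrative value, and 16 kHz is assumed as the sampling rate.

```python
from typing import Any

SAMPLING_RATE = 16_000  # assumed sampling rate (samples per second)


def fake_tokenizer(text: str, **kwargs: Any) -> list[str]:
    """Hypothetical text tokenizer: whitespace split, optional truncation to max_length tokens."""
    tokens = text.split()
    if kwargs.get("truncation") and kwargs.get("max_length") is not None:
        tokens = tokens[: kwargs["max_length"]]
    return tokens


def fake_feature_extractor(audio: list[float], **kwargs: Any) -> list[float]:
    """Hypothetical feature extractor that reads max_length as a number of raw samples."""
    max_len = kwargs.get("max_length")
    if kwargs.get("truncation", True) and max_len is not None:
        audio = audio[:max_len]  # truncates the waveform itself, not tokens
    return audio


def fake_processor(text: str, audio: list[float], **kwargs: Any) -> dict[str, Any]:
    """Hypothetical processor that forwards the same kwargs to both components."""
    return {
        "input_ids": fake_tokenizer(text, **kwargs),
        "input_features": fake_feature_extractor(audio, **kwargs),
    }


def strip_text_only_kwargs(tok_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Same key filter as the patch: drop kwargs that only make sense for text."""
    return {
        k: v for k, v in tok_kwargs.items() if k not in ("truncation", "max_length")
    }


audio = [0.0] * (5 * SAMPLING_RATE)  # 5 seconds of silence
tok_kwargs = {"truncation": True, "max_length": 448}

broken = fake_processor("hello world", audio, **tok_kwargs)
fixed = fake_processor("hello world", audio, **strip_text_only_kwargs(tok_kwargs))

print(len(broken["input_features"]) / SAMPLING_RATE)  # 0.028 (seconds of audio left)
print(len(fixed["input_features"]) / SAMPLING_RATE)  # 5.0 (full clip preserved)
```

At 16 kHz, a text-style limit of 448 leaves only 448 / 16000 = 0.028 s of audio. Filtering by key, as the patch does, removes just `truncation` and `max_length` and leaves any other kwargs untouched.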