[Bugfix] Fix Whisper tokenization (#34011)

Signed-off-by: NickLucche <nlucches@redhat.com>

Commit 55aeec04, authored Feb 07, 2026 by Nicolò Lucchesi, committed by GitHub Feb 07, 2026

Showing with 8 additions and 0 deletions

vllm/model_executor/models/whisper.py +8 -0

--- a/vllm/model_executor/models/whisper.py
+++ b/vllm/model_executor/models/whisper.py
@@ -727,6 +727,14 @@ class WhisperMultiModalProcessor(EncDecMultiModalProcessor[WhisperProcessingInfo
                **mm_kwargs,
                sampling_rate=feature_extractor.sampling_rate,
            )
+        # The HF WhisperProcessor passes **kwargs to both the tokenizer
+        # and the feature extractor. Text-tokenizer kwargs like
+        # `truncation` and `max_length` must be removed when audio data
+        # is present, otherwise the feature extractor interprets
+        # `max_length` as raw audio samples and truncates the audio.
+        tok_kwargs = {
+            k: v for k, v in tok_kwargs.items() if k not in ("truncation", "max_length")
+        }
        processed_outputs = super()._call_hf_processor(
            prompt=prompt,
            mm_data=mm_data,
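
The failure mode described in the added comment can be reproduced with a minimal sketch. Everything below is a toy stand-in written for illustration: `toy_tokenizer`, `toy_feature_extractor`, `toy_processor`, and `drop_text_only_kwargs` are hypothetical names, not the Hugging Face or vLLM implementations, and the value `448` is only an example. The sketch assumes, as the comment states, that one set of kwargs is forwarded to both components.

```python
# Hypothetical stand-ins for illustration; not the Hugging Face or vLLM code.
TEXT_ONLY_KWARGS = ("truncation", "max_length")


def toy_feature_extractor(audio, max_length=None, **_ignored):
    """Toy extractor that reads `max_length` as a count of raw audio samples."""
    return audio if max_length is None else audio[:max_length]


def toy_tokenizer(text, truncation=False, max_length=None, **_ignored):
    """Toy whitespace tokenizer that reads `max_length` as a count of tokens."""
    tokens = text.split()
    if truncation and max_length is not None:
        tokens = tokens[:max_length]
    return tokens


def toy_processor(text, audio, **kwargs):
    """Forward the same kwargs to both components (assumed behavior)."""
    return {
        "input_ids": toy_tokenizer(text, **kwargs),
        "input_features": toy_feature_extractor(audio, **kwargs),
    }


def drop_text_only_kwargs(tok_kwargs):
    """Same filtering as the dict comprehension added in this commit."""
    return {k: v for k, v in tok_kwargs.items() if k not in TEXT_ONLY_KWARGS}


audio = list(range(16000))  # one second of fake 16 kHz samples
tok_kwargs = {"truncation": True, "max_length": 448}

unfiltered = toy_processor("transcribe this clip", audio, **tok_kwargs)
filtered = toy_processor(
    "transcribe this clip", audio, **drop_text_only_kwargs(tok_kwargs)
)

print(len(unfiltered["input_features"]))  # 448: audio cut by the text max_length
print(len(filtered["input_features"]))    # 16000: audio left intact
```

In the toy version, the unfiltered call silently shortens the audio to 448 samples, while the filtered call leaves all 16000 samples in place, which is the behavior the patch is aiming for.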