OpenDAS / vllm_cscc
Commit 55aeec04 (Unverified)
Authored Feb 07, 2026 by Nicolò Lucchesi. Committed by GitHub, Feb 07, 2026.

[Bugfix] Fix Whisper tokenization (#34011)

Signed-off-by: NickLucche <nlucches@redhat.com>

parent 90607718
Showing 1 changed file with 8 additions and 0 deletions

vllm/model_executor/models/whisper.py (+8 -0)
...
@@ -727,6 +727,14 @@ class WhisperMultiModalProcessor(EncDecMultiModalProcessor[WhisperProcessingInfo
                 **mm_kwargs,
                 sampling_rate=feature_extractor.sampling_rate,
             )
+            # The HF WhisperProcessor passes **kwargs to both the tokenizer
+            # and the feature extractor. Text-tokenizer kwargs like
+            # `truncation` and `max_length` must be removed when audio data
+            # is present, otherwise the feature extractor interprets
+            # `max_length` as raw audio samples and truncates the audio.
+            tok_kwargs = {
+                k: v for k, v in tok_kwargs.items() if k not in ("truncation", "max_length")
+            }
         processed_outputs = super()._call_hf_processor(
             prompt=prompt,
             mm_data=mm_data,
...
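
The failure mode described in the added comment can be reproduced without loading a model. The sketch below is illustrative only: `toy_tokenizer`, `toy_feature_extractor`, `toy_processor`, and `drop_text_only_kwargs` are hypothetical stand-ins written for this example, not the Hugging Face or vLLM APIs, and the 16 kHz sampling rate and the limit of 448 are assumed values. It shows how a processor that forwards the same `**kwargs` to both components lets a token-count limit cut the audio, and how the filter from this commit avoids that.

```python
from typing import Any, Optional

SAMPLING_RATE = 16_000  # assumed value for this toy example


def toy_tokenizer(text: str, max_length: Optional[int] = None,
                  truncation: bool = False, **_: Any) -> list:
    """Hypothetical tokenizer: `max_length` counts tokens."""
    tokens = text.split()
    if truncation and max_length is not None:
        tokens = tokens[:max_length]
    return tokens


def toy_feature_extractor(audio: list, max_length: Optional[int] = None,
                          truncation: bool = True, **_: Any) -> list:
    """Hypothetical feature extractor: `max_length` counts raw samples."""
    if truncation and max_length is not None:
        return audio[:max_length]
    return audio


def toy_processor(text: str, audio: Optional[list] = None, **kwargs: Any) -> dict:
    """Hypothetical processor that forwards the same kwargs to both parts."""
    out = {"tokens": toy_tokenizer(text, **kwargs)}
    if audio is not None:
        out["audio"] = toy_feature_extractor(audio, **kwargs)
    return out


def drop_text_only_kwargs(tok_kwargs: dict) -> dict:
    """Same filter as in the diff: remove text-tokenizer-only kwargs."""
    return {
        k: v for k, v in tok_kwargs.items() if k not in ("truncation", "max_length")
    }


audio = [0.0] * (2 * SAMPLING_RATE)  # two seconds of silence
tok_kwargs = {"truncation": True, "max_length": 448}

# Without the filter, the token limit is applied to the audio samples.
unfiltered = toy_processor("transcribe this clip", audio=audio, **tok_kwargs)

# With the filter, the audio keeps its full length.
filtered = toy_processor(
    "transcribe this clip", audio=audio, **drop_text_only_kwargs(tok_kwargs)
)
```

In this toy setup the filter also removes the limit for the text side. Whether the real code path applies text truncation elsewhere is not visible in this diff.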