[Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)

Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com> Co-authored-by: 杨森 <yangsen.double7@bytedance.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

[Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)
Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com> Co-authored-by: 杨森 <yangsen.double7@bytedance.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
9f1c6422 · double7 · GitHub · 7be3a59d · 9f1c6422
Unverified Commit 9f1c6422 authored Aug 18, 2025 by double7 Committed by GitHub Aug 17, 2025
Hide whitespace changes
Inline Side-by-side

Showing with 5 additions and 0 deletions

vllm/model_executor/models/qwen2_5_omni_thinker.py vllm/model_executor/models/qwen2_5_omni_thinker.py +5 -0

No files found.
--- a/vllm/model_executor/models/qwen2_5_omni_thinker.py
+++ b/vllm/model_executor/models/qwen2_5_omni_thinker.py
@@ -88,6 +88,11 @@ def _qwen2_5_omni_thinker_field_config(hf_inputs: Mapping[str, torch.Tensor]):
    video_grid_thw = hf_inputs.get("video_grid_thw", torch.empty((0, 3)))
    video_grid_sizes = video_grid_thw.prod(-1)
+    # vllm use `second_per_grid_ts` to compute multimodal rotary embedding
+    video_second_per_grid = hf_inputs.get("video_second_per_grid", None)
+    if video_second_per_grid is not None:
+        hf_inputs["second_per_grid_ts"] = video_second_per_grid
    return dict(
        input_audio_features=MultiModalFieldConfig.flat_from_sizes(
            "audio", audio_feature_lengths, dim=1),