[Bugfix] Fix num video tokens calculation for Qwen2-VL (#13148)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Commit 985b4a2b (unverified), authored Feb 12, 2025 by Cyrus Leung, committed by GitHub Feb 12, 2025

Showing with 5 additions and 1 deletion

vllm/model_executor/models/qwen2_vl.py +5 -1

--- a/vllm/model_executor/models/qwen2_vl.py
+++ b/vllm/model_executor/models/qwen2_vl.py
@@ -800,7 +800,11 @@ class Qwen2VLProcessingInfo(BaseProcessingInfo):
            preprocessed_size = ImageSize(width=image_width,
                                          height=image_height)

-        grid_t = max(num_frames // temporal_patch_size, 1)
+        # NOTE: Frames are padded to be divisible by `temporal_patch_size`
+        # https://github.com/huggingface/transformers/blob/v4.48.3/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py#L294
+        padded_num_frames = num_frames + num_frames % temporal_patch_size
+
+        grid_t = max(padded_num_frames // temporal_patch_size, 1)
        grid_h = preprocessed_size.height // patch_size
        grid_w = preprocessed_size.width // patch_size
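
The effect of the change on `grid_t` can be checked with a small standalone sketch. The helper names below are hypothetical and are not part of vLLM. `padded_round_up` is an assumption about what "padded to be divisible" would mean for an arbitrary `temporal_patch_size`, not something the commit states. The commit's expression matches that round-up when `temporal_patch_size` is 2, but the two can differ for larger values.

```python
def grid_t_before(num_frames: int, temporal_patch_size: int) -> int:
    """Temporal grid size as computed before this commit (no padding)."""
    return max(num_frames // temporal_patch_size, 1)


def padded_as_in_commit(num_frames: int, temporal_patch_size: int) -> int:
    """Padding expression exactly as written in the diff above."""
    return num_frames + num_frames % temporal_patch_size


def padded_round_up(num_frames: int, temporal_patch_size: int) -> int:
    """Hypothetical alternative: round up to the next multiple.

    Assumption: this is what "divisible by temporal_patch_size" would
    require for patch sizes other than 2. Python's % returns a
    non-negative result for a positive divisor, so (-n) % k is the
    distance from n up to the next multiple of k.
    """
    return num_frames + (-num_frames) % temporal_patch_size


def grid_t_after(num_frames: int, temporal_patch_size: int) -> int:
    """Temporal grid size as computed after this commit."""
    padded = padded_as_in_commit(num_frames, temporal_patch_size)
    return max(padded // temporal_patch_size, 1)


if __name__ == "__main__":
    # With temporal_patch_size = 2, an odd frame count gains one grid step.
    for n in (1, 2, 3, 4, 5):
        print(n, grid_t_before(n, 2), grid_t_after(n, 2))
```

With `temporal_patch_size = 2`, a 3-frame video goes from `grid_t = 1` to `grid_t = 2`, which is the undercount the commit addresses. For a patch size of 3 and 4 frames, the commit's expression yields 5 padded frames while rounding up yields 6, so the expression should be treated as specific to a patch size of 2 unless verified otherwise.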