@@ -135,6 +135,33 @@ Instead of passing in a single image, you can pass in a list of images.
...
@@ -135,6 +135,33 @@ Instead of passing in a single image, you can pass in a list of images.
A code example can be found in `examples/offline_inference_vision_language_multi_image.py <https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py>`_.
A code example can be found in `examples/offline_inference_vision_language_multi_image.py <https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py>`_.
Multi-image input can be extended to perform video captioning. We show this with `Qwen2-VL <https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct>`_ as it supports videos:
.. code-block:: python
# Specify the maximum number of frames per video to be 4. This can be changed.