@@ -51,7 +51,7 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
...
2. Register input mappers
-------------------------
- For each modality type to support, decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_input_mapper <vllm.multimodal.MultiModalRegistry.register_input_mapper>`.
+ For each modality type that the model accepts as input, decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_input_mapper <vllm.multimodal.MultiModalRegistry.register_input_mapper>`.
This decorator accepts a function that maps multi-modal inputs to the keyword arguments you have previously defined in :meth:`~torch.nn.Module.forward`.
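The decorator mechanism described above can be sketched in isolation. The following is a minimal, self-contained illustration and not vLLM's implementation: ``ToyRegistry``, ``TOY_REGISTRY``, ``map_image``, ``ToyModel`` and the ``pixel_values`` keyword are all hypothetical names chosen for this example. It only shows how a registered function can turn a raw multi-modal input into keyword arguments for ``forward``.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a multi-modal registry (NOT vLLM's code).
InputMapper = Callable[[Any], Dict[str, Any]]


class ToyRegistry:
    def __init__(self) -> None:
        # Maps a model class to the function that prepares its inputs.
        self._mappers: Dict[type, InputMapper] = {}

    def register_input_mapper(self, mapper: InputMapper):
        """Return a class decorator that associates `mapper` with the class."""

        def wrapper(model_cls: type) -> type:
            self._mappers[model_cls] = mapper
            return model_cls  # the class itself is returned unchanged

        return wrapper

    def map_input(self, model_cls: type, data: Any) -> Dict[str, Any]:
        """Apply the mapper registered for `model_cls` to raw input data."""
        return self._mappers[model_cls](data)


TOY_REGISTRY = ToyRegistry()


def map_image(image: List[float]) -> Dict[str, Any]:
    # The key must match a keyword argument accepted by forward().
    return {"pixel_values": image}


@TOY_REGISTRY.register_input_mapper(map_image)
class ToyModel:
    def forward(
        self,
        input_ids: List[int],
        pixel_values: Optional[List[float]] = None,
    ) -> int:
        # Placeholder computation: count the inputs that were received.
        extra = len(pixel_values) if pixel_values is not None else 0
        return len(input_ids) + extra


# The registry produces the keyword arguments; the caller unpacks them.
kwargs = TOY_REGISTRY.map_input(ToyModel, [0.1, 0.2, 0.3])
result = ToyModel().forward(input_ids=[1, 2], **kwargs)
```

The point of the sketch is the contract between the two sides: the mapper returns a dictionary, and its keys have to line up with the keyword arguments of ``forward``.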
.. code-block:: diff
...
@@ -59,8 +59,7 @@ This decorator accepts a function that maps multi-modal inputs to the keyword ar
from vllm.model_executor.models.interfaces import SupportsVision
class YourModelForImage2Seq(nn.Module, SupportsVision):
Here are some examples:
- Image inputs (static feature size): `LLaVA-1.5 Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava.py>`__
- Image inputs (dynamic feature size): `LLaVA-NeXT Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava_next.py>`__
.. seealso::
    :ref:`input_processing_pipeline`
4. (Optional) Register dummy data
---------------------------------
During startup, dummy data is passed to the vLLM model to allocate memory. This only consists of text input by default, which may not be applicable to multi-modal models.
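To illustrate why text-only dummy data can under-represent a multi-modal model, here is a minimal sketch of a dummy data factory. It is not vLLM's API: the function name ``dummy_data_for_toy_model``, its return type, and the constants ``IMAGE_TOKEN_ID``, ``IMAGE_FEATURE_SIZE`` and ``IMAGE_SHAPE`` are assumptions made for this example. The idea is that the dummy sequence reserves room for image placeholder tokens and carries a dummy image of the expected shape.

```python
from typing import Any, Dict, List, Tuple

# All names and values below are illustrative assumptions.
IMAGE_TOKEN_ID = 32000        # hypothetical id of the image placeholder token
IMAGE_FEATURE_SIZE = 576      # assumed number of placeholder tokens per image
IMAGE_SHAPE = (3, 336, 336)   # assumed (channels, height, width)


def dummy_data_for_toy_model(seq_len: int) -> Tuple[List[int], Dict[str, Any]]:
    """Build dummy token ids and a dummy image for a sequence of `seq_len`."""
    if seq_len < IMAGE_FEATURE_SIZE:
        raise ValueError("seq_len is too short to hold one image")

    # Image placeholders first, then padding tokens up to seq_len.
    token_ids = [IMAGE_TOKEN_ID] * IMAGE_FEATURE_SIZE
    token_ids += [0] * (seq_len - IMAGE_FEATURE_SIZE)

    # An all-zero image stored as nested lists with the assumed shape.
    channels, height, width = IMAGE_SHAPE
    dummy_image = [
        [[0.0] * width for _ in range(height)] for _ in range(channels)
    ]
    return token_ids, {"image": dummy_image}


token_ids, multi_modal_data = dummy_data_for_toy_model(1024)
```

In a real model the feature size and image shape would come from the model configuration rather than from module-level constants.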
...
@@ -81,11 +106,14 @@ In such cases, you can define your own dummy data by registering a factory metho
from vllm.model_executor.models.interfaces import SupportsVision