Commit 6dcb07f6 authored by Tao Yun, committed by GitHub

support qwen3-vl handle requests with embeddings (#30037)


Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
parent 46cbbca0
@@ -443,6 +443,8 @@ For Qwen2-VL and MiniCPM-V, we accept additional parameters alongside the embedd
 print(generated_text)
 ```
+For Qwen3-VL, the `image_embeds` should contain both the base image embedding and deepstack features.
 #### Audio Embeddings
 You can pass pre-computed audio embeddings similar to image embeddings:
...
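The note added for Qwen3-VL does not say how the base embedding and the deepstack features are combined. The sketch below shows one plausible layout. It assumes, which the diff does not confirm, that every deepstack level has the same width as the base embedding and that all parts are concatenated along the feature dimension, one row per image token. `pack_image_embeds`, `unpack_image_embeds`, and the toy sizes are hypothetical and are not vLLM APIs; plain Python lists stand in for tensors.

```python
# Hypothetical helpers, not part of vLLM. Rows are image tokens, columns are features.
def pack_image_embeds(base, deepstack_levels):
    """Append each token's deepstack features to its base embedding row."""
    for level in deepstack_levels:
        if len(level) != len(base):
            raise ValueError("every deepstack level needs one row per image token")
    packed = []
    for i, row in enumerate(base):
        merged = list(row)
        for level in deepstack_levels:
            merged.extend(level[i])
        packed.append(merged)
    return packed


def unpack_image_embeds(packed, hidden_size):
    """Split packed rows back into the base part and the deepstack levels."""
    width = len(packed[0])
    if width % hidden_size != 0:
        raise ValueError("row width must be a multiple of hidden_size")
    num_levels = width // hidden_size - 1
    base = [row[:hidden_size] for row in packed]
    levels = [
        [row[hidden_size * (k + 1): hidden_size * (k + 2)] for row in packed]
        for k in range(num_levels)
    ]
    return base, levels


# Toy sizes (assumed): 2 image tokens, hidden size 3, 2 deepstack levels.
base = [[0.0, 0.1, 0.2], [1.0, 1.1, 1.2]]
deepstack = [
    [[10.0, 10.1, 10.2], [11.0, 11.1, 11.2]],
    [[20.0, 20.1, 20.2], [21.0, 21.1, 21.2]],
]
image_embeds = pack_image_embeds(base, deepstack)
```

Under this assumption each row of `image_embeds` is `hidden_size * (1 + number of deepstack levels)` wide, so a consumer that knows `hidden_size` can recover the parts with `unpack_image_embeds`.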
@@ -103,7 +103,7 @@ from .qwen2_5_vl import (
     Qwen2_5_VLVideoInputs,
     Qwen2_5_VLVideoPixelInputs,
 )
-from .qwen2_vl import Qwen2VLProcessingInfo
+from .qwen2_vl import Qwen2VLMultiModalDataParser, Qwen2VLProcessingInfo
 from .qwen3 import Qwen3ForCausalLM, Qwen3Model
 from .utils import (
     AutoWeightsLoader,
@@ -884,7 +884,10 @@ class Qwen3VLDummyInputsBuilder(BaseDummyInputsBuilder[Qwen3VLProcessingInfo]):
 class Qwen3VLMultiModalProcessor(BaseMultiModalProcessor[Qwen3VLProcessingInfo]):
     def _get_data_parser(self) -> MultiModalDataParser:
-        return MultiModalDataParser(video_needs_metadata=True)
+        return Qwen2VLMultiModalDataParser(
+            self.info.get_hf_config().vision_config.spatial_merge_size,
+            video_needs_metadata=True,
+        )
     def _call_hf_processor(
         self,
...
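The parser is now constructed with `spatial_merge_size` from the vision config. A likely reason, offered here as an assumption rather than something the diff states, is that a parser handling pre-computed embeddings must know how many embedding rows belong to each image: if the vision tower merges `spatial_merge_size * spatial_merge_size` patches into one token, an image with grid `(t, h, w)` yields `t * h * w // spatial_merge_size**2` rows. The helper names below are hypothetical and are not vLLM APIs.

```python
def embed_rows_per_image(image_grid_thw, spatial_merge_size):
    """Number of embedding rows per image after spatial patch merging."""
    merge_area = spatial_merge_size * spatial_merge_size
    return [(t * h * w) // merge_area for t, h, w in image_grid_thw]


def split_flat_embeds(flat_embeds, image_grid_thw, spatial_merge_size):
    """Slice a flat list of embedding rows into one chunk per image."""
    sizes = embed_rows_per_image(image_grid_thw, spatial_merge_size)
    if sum(sizes) != len(flat_embeds):
        raise ValueError(
            f"expected {sum(sizes)} embedding rows, got {len(flat_embeds)}"
        )
    chunks, start = [], 0
    for size in sizes:
        chunks.append(flat_embeds[start:start + size])
        start += size
    return chunks


# Two images with patch grids 1x4x4 and 1x4x8 (toy values).
grid = [(1, 4, 4), (1, 4, 8)]
rows = embed_rows_per_image(grid, spatial_merge_size=2)
flat = [[float(i)] for i in range(sum(rows))]
per_image = split_flat_embeds(flat, grid, spatial_merge_size=2)
```

With a merge size of 2, the two toy grids map to 4 and 8 rows, so a flat batch of 12 rows splits cleanly; a mismatch between the grids and the number of rows raises an error instead of silently misaligning images.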