OpenDAS / vllm_cscc

Commit d95d6507 (unverified), authored Jan 23, 2026 by Raushan Turganbay, committed by GitHub on Jan 23, 2026

[Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933)

Signed-off-by: raushan <raushan@huggingface.co>

Parent: 13d8746c
Showing 1 changed file with 9 additions and 0 deletions:

vllm/model_executor/models/transformers/multimodal.py (+9 -0)
...
@@ -376,6 +376,15 @@ class MultiModalMixin(SupportsMultiModal, SupportsMRoPE):
             pixel_values, **kwargs
         )
+        # Transformers `v5`, `self.get_image_features` returns a tuple
+        # containing the features and optionally attentions/hidden_states
+        # After v5 is settled, we can enable qwen3-vl with several outputs
+        # from `self.get_image_features`
+        if isinstance(vision_embeddings, tuple):
+            vision_embeddings = vision_embeddings[0]
+        elif isinstance(vision_embeddings, dict):
+            vision_embeddings = vision_embeddings.pooler_output
         if isinstance(vision_embeddings, torch.Tensor):
             if vision_embeddings.ndim == 2:
                 vision_embeddings = vision_embeddings.unsqueeze(0)
...
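The added lines normalize whatever `get_image_features` returns before the existing tensor handling runs: a tuple is reduced to its first element, and a dict-like output is reduced to its `pooler_output` field. The following is a minimal standalone sketch of that normalization, assuming PyTorch is installed. `normalize_vision_embeddings` and `PooledOutput` are hypothetical names used only for illustration. `PooledOutput` imitates a transformers `ModelOutput`, which subclasses `OrderedDict` and exposes its fields as attributes. That behavior is presumably why the `dict` branch can read `.pooler_output` directly.

```python
from collections import OrderedDict

import torch


class PooledOutput(OrderedDict):
    """Hypothetical stand-in for a transformers ModelOutput: a dict subclass
    whose fields can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def normalize_vision_embeddings(vision_embeddings):
    """Hypothetical helper mirroring the commit's logic: reduce the possible
    return types of `get_image_features` to a tensor with a batch dimension.
    Non-tensor results (e.g. a list of per-image tensors) pass through."""
    if isinstance(vision_embeddings, tuple):
        # Newer outputs: features first, then optional attentions/hidden_states.
        vision_embeddings = vision_embeddings[0]
    elif isinstance(vision_embeddings, dict):
        # Dict-like output with attribute access: take the pooled features.
        vision_embeddings = vision_embeddings.pooler_output
    if isinstance(vision_embeddings, torch.Tensor) and vision_embeddings.ndim == 2:
        # A 2-D (tokens, hidden) tensor gets a leading batch dimension.
        vision_embeddings = vision_embeddings.unsqueeze(0)
    return vision_embeddings


if __name__ == "__main__":
    feats = torch.zeros(4, 8)
    print(normalize_vision_embeddings(feats).shape)  # torch.Size([1, 4, 8])
    print(normalize_vision_embeddings((feats, None)).shape)  # torch.Size([1, 4, 8])
    print(normalize_vision_embeddings(PooledOutput(pooler_output=feats)).shape)  # torch.Size([1, 4, 8])
```

All three call shapes end up as the same 3-D tensor, so the code that follows the normalization only ever sees one form.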