OpenDAS / vllm_cscc
Commit 208c5625 (Unverified)
Authored Jan 26, 2026 by VihaanThat
Committed by GitHub, Jan 26, 2026
[Feature] Add LoRA support for Gemma3 vision components (#32764)
Parent: 9ac818a5
Changes: 1 changed file with 38 additions and 0 deletions
vllm/model_executor/models/gemma3_mm.py (+38, -0), viewed at 208c5625
@@ -656,3 +656,41 @@ class Gemma3ForConditionalGeneration(
            connector="multi_modal_projector",
            tower_model="vision_tower",
        )

    def get_num_mm_encoder_tokens(self, num_image_tokens: int) -> int:
        """
        Calculate the number of tokens output by the vision encoder.

        The vision encoder processes images into patch embeddings. For Gemma3,
        the relationship between prompt placeholder tokens and actual vision
        encoder output tokens depends on the patch grid size.

        Args:
            num_image_tokens: Number of image placeholder tokens in the prompt
                (typically mm_tokens_per_image per image)

        Returns:
            Number of tokens output by the vision encoder
        """
        # For Gemma3, the vision encoder outputs tokens_per_side x tokens_per_side
        # tokens per image. Since num_image_tokens represents the number of
        # connector output tokens (mm_tokens_per_image = 256), and tokens_per_side
        # is sqrt(256) = 16, we need to account for the token expansion.
        # Based on empirical testing, the multiplier of 16 works correctly.
        return num_image_tokens * 16

    def get_num_mm_connector_tokens(self, num_vision_tokens: int) -> int:
        """
        Calculate the number of tokens output by the multimodal connector.

        The connector applies projection and normalization but maintains the
        token count for Gemma3.

        Args:
            num_vision_tokens: Number of tokens from vision encoder

        Returns:
            Number of tokens after connector processing
        """
        # The Gemma3 connector maintains a 1:1 token mapping
        return num_vision_tokens
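
The commit describes the multiplier of 16 only as empirically validated. The sketch below shows one way the number could be sanity-checked. It assumes a 896x896 input image and 14x14 patches, which are not stated anywhere in this diff; only mm_tokens_per_image = 256 and the multiplier come from the code comments. The helper names `encoder_tokens_per_image` and `expansion_factor` are hypothetical and are not part of vLLM.

```python
import math

# Assumptions (NOT stated in the commit): Gemma3's vision tower takes
# 896x896 images and splits them into 14x14 patches.
ASSUMED_IMAGE_SIZE = 896
ASSUMED_PATCH_SIZE = 14

# Taken from the comment in the diff.
MM_TOKENS_PER_IMAGE = 256
MULTIPLIER_FROM_COMMIT = 16


def encoder_tokens_per_image(image_size: int, patch_size: int) -> int:
    """Hypothetical helper: patches in a square grid covering the image."""
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def expansion_factor(encoder_tokens: int, mm_tokens_per_image: int) -> int:
    """Hypothetical helper: encoder tokens per prompt placeholder token."""
    if encoder_tokens % mm_tokens_per_image != 0:
        raise ValueError("encoder tokens are not a multiple of placeholders")
    return encoder_tokens // mm_tokens_per_image


def get_num_mm_encoder_tokens(num_image_tokens: int) -> int:
    """Standalone copy of the arithmetic in the committed method."""
    return num_image_tokens * MULTIPLIER_FROM_COMMIT


tokens = encoder_tokens_per_image(ASSUMED_IMAGE_SIZE, ASSUMED_PATCH_SIZE)
factor = expansion_factor(tokens, MM_TOKENS_PER_IMAGE)
tokens_per_side = math.isqrt(MM_TOKENS_PER_IMAGE)

print(f"tokens_per_side={tokens_per_side}, encoder_tokens={tokens}, factor={factor}")
```

Under these assumptions the patch grid is 64 x 64, so the ratio of encoder tokens to placeholder tokens matches the hard-coded 16. If a checkpoint used a different image size, patch size, or mm_tokens_per_image, the ratio would change, and the constant in `get_num_mm_encoder_tokens` would need to be revisited.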