@@ -763,7 +763,7 @@ See [this page](#generative-models) for more information on how to use generativ
* `google/gemma-3-4b-it`, `google/gemma-3-27b-it`, etc.
* ✅︎
* ✅︎
* ⚠️
- * `GLM4VForCausalLM`<sup>^</sup>
* GLM-4V
* T + I
...
@@ -948,8 +948,11 @@ V1 currently uses a simplified attention pattern:
- Uses causal attention for all tokens, including image tokens
- Generates reasonable outputs but does not match the original model's attention for text + image inputs
- Will be updated in the future to support the correct behavior
- Does not support `"do_pan_and_scan": True`

This limitation exists because the model's mixed attention pattern (bidirectional for images, causal otherwise) is not yet supported by vLLM's attention backends.
For these reasons, `Gemma3ForConditionalGeneration` is supported only on V0 at the moment.
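To make the difference between the two attention patterns concrete, here is a minimal, framework-free sketch. It is not vLLM's implementation: `causal_mask` and `mixed_mask` are hypothetical helpers, and it assumes each token is tagged with the index of the image it belongs to (or `None` for text). It also ignores other details of the real model, such as sliding-window layers.

```python
from typing import List, Optional


def causal_mask(n: int) -> List[List[bool]]:
    # mask[i][j] is True when query token i may attend to key token j.
    # Causal: every token sees itself and all earlier tokens.
    return [[j <= i for j in range(n)] for i in range(n)]


def mixed_mask(image_ids: List[Optional[int]]) -> List[List[bool]]:
    # Start from the causal mask, then additionally let tokens of the
    # same image attend to each other in both directions.
    n = len(image_ids)
    mask = causal_mask(n)
    for i in range(n):
        for j in range(n):
            if image_ids[i] is not None and image_ids[i] == image_ids[j]:
                mask[i][j] = True
    return mask


# One text token, three tokens of image 0, one more text token.
image_ids = [None, 0, 0, 0, None]
full = mixed_mask(image_ids)
simple = causal_mask(len(image_ids))

# Positions where the bidirectional-image mask allows attention
# but a causal-only mask forbids it.
diff = [(i, j) for i in range(5) for j in range(5) if full[i][j] != simple[i][j]]
print(diff)  # [(1, 2), (1, 3), (2, 3)]
```

The only entries that differ are those where an image token attends to a *later* token of the same image. A causal-only backend masks these out, which is why outputs for text + image inputs can diverge from the original model.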