    Add PaliGemma (#30814) · 1360801a
    Pablo Montalvo authored
    
    
    * add new model like
    
    * add state dict slicing + new model config
    
    * update palma config and weights, passes vision activations
    
    * fix
    
    * update
    
    * reorder loading/unpacking
    
    * clean up
    
    * add debug statements
    
    * change device
    
    * fix
    
    * debugging
    
    * fix noncausal mask
    
    * fixup sdpa + causal mask
    
    * fix activation function
    
    * remove debug before changing modeling file
    
    * add variants
    
    * debug attention mask in generate
    
    * revert to non-debug sdpa
    
    * revert gemma modifications
    
    * add custom language modeling
    
    * use Processor
    
    * add language modeling file to init
    
    * try thin wrapper around generate
    
    * Update
    
    * update mask
    
    * breakpoints galore
    
    * remove conflict
    
    * switch to left-padding
    
    * add incomplete model doc
    
    * add paligemma global files
    
    * batch rename paligemma
    
    * make generation match outputs and captioning
    
    * style
    
    * style
    
    * remove copied from + doc
    
    * remove more copied from
    
    * remove copy from projector
    
    * minor fix
    
    * update config and style
    
    * add readme - dummy
    
    * CORRECT image captioning
    
    * moving to args
    
    * add siglip proper + fix merging image + text features
    
    * take update_causal_mask from upstream
    
    * remove breakpoint
    
    * leverage AutoModel
    
    * fix input_ids slicing
    
    * make siglip head conditional
    
    * remove encoder_decoder value
    
    * remove unneeded modeling file
    
    * add commented 4d attention mask
    
    * FIXED generation with 4D mask
    
    * Update src/transformers/models/siglip/modeling_siglip.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * fix left padding detection
    
    * shuffle order of verifications
    
    * fix missing labels for training
    
    * fix
    
    * vectorize merging of features, improve slicing
    
    * improve testing before conversion
    
    * handle merging in processor
    
    * image token index depends on checkpoint
    
    * add variants, save processor too
    
    * save processors, base tokenizer off spm file
    
    * expand model embeddings due to additional image token
    
    * pass image processing args
    
    * add convert rgb to siglip processor
    
    * add \n token separately
    
    * fix tokenizer and prompts
    
    * fix docstrings
    
    * change to camel
    
    * fix casing
    
    * debug pos_ids and sdpa
    
    * pass and use cache_position
    
    * add flag for newline tokenization
    
    * Update src/transformers/models/paligemma/processing_paligemma.py
    Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
    
    * simplify conversion script
    
    * add copied from
    
    * add precision to conversion script
    
    * Update src/transformers/models/paligemma/modeling_paligemma.py
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
    
    * clean up
    
    * Shift attention mask from `1:`
    
    After discussion with @molbap
    
    * add docs, fix quality
    
    * quality, tied weights inheritance, and logits/label alignment
    
    * fix more tests
    
    * pass attn_implementation to language model correctly
    
    * add SiglipVisionTransformer to no split modules
    
    * skip paligemma test for sdpa dispatch to flash
    
    * skip incompatible tests
    
    * quality
    
    * [broken archive maps]
    
    * Apply suggestions
    
    - remove archive lists
    - style
    - take shape of inputs_embeds for batch
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/utils/dummy_pt_objects.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * simplify conversion script
    
    * add suggestions
    
    * add suggestions
    
    * add copied from
    
    * fix
    
    * move labels out
    
    * revert
    
    * fix
    
    * remove placeholder labels if None
    
    * use cache_position
    
    * fix quality + docstrings
    
    * fix quality
    
    * fix paligemma 4d gemma mask incompatibility
    
    * fix config docstring
    
    * fix query and attn_mask dtype
    
    ---------
    Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
    Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
test_modeling_paligemma.py 16.7 KB