    new model: IDEFICS via HuggingFaceM4 (#24796) · 6c811a32
    Stas Bekman authored
    
    
    * rename
    
    * restore
    
    * mappings
    
    * unedited tests+docs
    
    * docs
    
    * fixes
    
    * fix auto-sync breakage
    
    * cleanup
    
    * wip
    
    * wip
    
    * add fetch_images
    
    * remove einops dependency
    
    * update
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * re-add
    
    * add batching
    
    * rework
    
    * fix
    
    * improve
    
    * add Leo as I am extending his work
    
    * cleanup
    
    * fix
    
    * cleanup
    
    * slow-test
    
    * fix
    
    * fix
    
    * fixes
    
    * deal with warning
    
    * rename modified llama classes
    
    * rework fetch_images
    
    * alternative implementation
    
    * cleanup
    
    * strict version
    
    * cleanup
    
    * [`IDEFICS`] Fix idefics ci (#25056)
    
    * Fix IDEFICS CI
    
    * fix test file
    
    * fixup
    
    * some changes to make tests pass
    
    * fix
    
    * fixup
    
    * Update src/transformers/models/idefics/configuration_idefics.py
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    ---------
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    * remove compat checks
    
    * style
    
    * explain that Idefics is not for training from scratch
    
    * require pt>=2.0
    
    * fix idefics vision config (#25092)
    
    * fix idefics vision config
    
    * fixup
    
    * clean
    
    * Update src/transformers/models/idefics/configuration_idefics.py
    
    ---------
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    * cleanup
    
    * style
    
    * cleanup
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * upcase
    
    * sequence of images
    
    * handle the case with no images
    
    * Update src/transformers/image_processing_utils.py
    Co-authored-by: Victor SANH <victorsanh@gmail.com>
    
    * support pure lm take 2
    
    * support tokenizer options
    
    * parameterize num_channels
    
    * fix upcase
    
    * s|IdeficsForCausalLM|IdeficsForVisionText2Text|g
    
    * manual to one line
    
    * addressing review
    
    * unbreak
    
    * remove clip dependency
    
    * fix test
    
    * consistency
    
    * PIL import
    
    * Idefics prefix
    
    * Idefics prefix
    
    * hack to make tests work
    
    * style
    
    * fix
    
    * fix
    
    * revert
    
    * try/finally
    
    * cleanup
    
    * clean up
    
    * move
    
    * [`IDEFICS`] Fix idefics config refactor (#25149)
    
    * refactor config
    
    * nuke init weights
    
    * more refactor
    
    * oops
    
    * remove visual question answering pipeline support
    
    * Update src/transformers/models/idefics/clip.py
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    
    * Update src/transformers/models/idefics/modeling_idefics.py
    
    * cleanup
    
    * mv clip.py vision.py
    
    * tidyup
    
    ---------
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    Co-authored-by: Stas Bekman <stas@stason.org>
    
    * fix
    
    * license
    
    * condition on pt
    
    * fix
    
    * style
    
    * fix
    
    * rm torchvision dependency, allow custom transforms
    
    * address review
    
    * rework device arg
    
    * add_eos_token
    
    * s/transforms/transform/
    
    * fix top level imports
    
    * fix return value
    
    * cleanup
    
    * cleanup
    
    * fix
    
    * style
    
    * license
    
    * license
    
    * Update src/transformers/models/idefics/image_processing_idefics.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * add a wrapper to freeze vision layers
    
    * tidyup
    
    * use the correct std/mean settings
    
    * parameterize values from config
    
    * add tests/models/idefics/test_image_processing_idefics.py
    
    * add test_processor_idefics.py
    
    * cleanup
    
    * cleanups
    
    * fix
    
    * fix
    
    * move to the right group
    
    * style
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * add perceiver config
    
    * reset
    
    * missing arg docs
    
    * Apply suggestions from code review
    Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
    
    * address review comments
    
    * inject automatic end of utterance tokens (#25218)
    
    * inject automatic end of utterance tokens
    
    * fix
    
    * fix
    
    * fix
    
    * rework to not use the config
    
    * not end_of_utterance_token at the end
    
    * Update src/transformers/models/idefics/processing_idefics.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * address review
    
    * Apply suggestions from code review
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/image_processing_utils.py
    Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
    
    * [`Idefics`] add image_embeddings option in generate-related methods (#25442)
    
    * add image_embeddings option in generate-related methods
    
    * style
    
    * rename image_embeddings and allow perceiver embeddings precomputation
    
    * compute embeddings within generate
    
    * make is_encoder_decoder= True the default in config
    
    * nested if else fix
    
    * better triple check
    
    * switch if elif order for pixel values / img embeds
    
    * update model_kwargs perceiver only at the end
    
    * use _prepare_model_inputs instead of encoder_decoder logic
    
    * fix comment typo
    
    * fix config default for is_encoder_decoder
    
    * style
    
    * add typehints
    
    * precompute in forward
    
    * doc builder
    
    * style
    
    * pop instead of get image hidden states
    
    * Trigger CI
    
    * Update src/transformers/models/idefics/modeling_idefics.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Update src/transformers/models/idefics/modeling_idefics.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * fix * + indentation + style
    
    * simplify a bit the use_resampler logic using comments
    
    * update docstrings
    
    * Trigger CI
    
    ---------
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * fix rebase changes
    
    * unbreak #25237 - to be fixed in follow up PRs
    
    * is_composition = False
    
    * no longer needed
    
    ---------
    Co-authored-by: leot13 <leo.tronchon@gmail.com>
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Victor SANH <victorsanh@gmail.com>
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>