    Add GIT (GenerativeImage2Text) (#20295) · 9c6f7485
    NielsRogge authored
    
    
    * First draft
    
    * Make model instantiation work
    
    * Fix copied from statement
    
    * More fixes
    
    * Add correct output head
    
    * Improve configuration
    
    * Add conversion script
    
    * Improve conversion script
    
    * Remove token_type_ids
    
    * Fix conversion of projection layers
    
    * Convert all weights
    
    * Use cats image
    
    * Make logits match
    
    * Generate caption on cats image
    
    * Add GITProcessor
    
    * Update conversion script
    
    * Add support for more checkpoints
    
    * Fix conversion script
    
    * Add initial tests
    
    * Remove cross-attention
    
    * More improvements
    
    * Remove is_decoder
    
    * Improve model tests
    
    * Improve tests
    
    * Improve model outputs
    
    * Fix model outputs equivalence
    
    * Fix more tests
    
    * Remove unused code
    
    * Use generate to generate text, no use of cache for now
    
    * Use generate more appropriately
    
    * Fix config tests
    
    * Fix style
    
    * Add support for use_cache
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Fix style
    
    * Fix GIT vision encoder
    
    * Update README
    
    * Fix integration test
    
    * Set bos and eos token ids
    
    * Improve docs
    
    * Improve code
    
    * Add support for provided attention_mask
    
    * Add copied from statement
    
    * Fix gradient checkpointing test
    
    * Set model_input_names
    
    * Investigate model_input_names
    
    * Remove script
    
    * Fix model inputs
    
    * Fix docstring
    
    * Rename GIT to Git
    
    * Support more models
    
    * Add support for textvqa model
    
    * Add video support
    
    * Extend conversion script for video
    
    * Add support for large variant
    
    * Add support for more models
    
    * Fix config archive map
    
    * Update integration test
    
    * Fix README
    
    * Fix CLIP mean and std
    
    * Update processor
    
    * Fix use_cache for video, thanks @gante
    
    * Remove print statements
    
    * Remove assertion
    
    * Add processor tests
    
    * Fix model_input_names
    
    * Use Auto API for processor
    
    * Fix processor tests
    
    * Fix integration test
    
    * Fix pipeline test
    
    * Make tests faster
    
    * Update conversion script
    
    * Update conversion script
    
    * Convert more checkpoints
    
    * Update conversion script
    
    * Fix typo
    
    * Update docstrings
    
    * Improve code snippets
    
    * Fix doc tests
    
    * Add more code examples
    
    * Fix doc tests
    
    * Add integration tests
    
    * Fix unused variable
    
    * revert
    
    * Add GIT to Japanese README
    Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>