    Add `LongT5` model (#16792) · a72f1c9f
    Daniel Stancl authored
    
    
    * Initial commit
    
    * Make some fixes
    
    * Make PT model full forward pass
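
    As a quick smoke test of the full forward pass, a minimal sketch assuming the final public API (`LongT5Config`/`LongT5Model`, default local encoder attention):

    ```python
    import torch
    from transformers import LongT5Config, LongT5Model

    config = LongT5Config()  # defaults to the "local" encoder attention type
    model = LongT5Model(config).eval()  # randomly initialized

    input_ids = torch.ones(1, 32, dtype=torch.long)
    decoder_input_ids = torch.ones(1, 8, dtype=torch.long)

    out = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    print(out.last_hidden_state.shape)  # torch.Size([1, 8, config.d_model])
    ```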
    
    * Drop TF & Flax implementation, fix copies etc
    
    * Add Flax model and update some corresponding stuff
    
    * Drop some TF things
    
    * Update config and flax local attn
    
    * Add encoder_attention_type to config
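
    The flag selects the encoder attention variant. A minimal sketch of its use, assuming the final public API where `encoder_attention_type` accepts "local" (the default) or "transient-global":

    ```python
    from transformers import LongT5Config, LongT5ForConditionalGeneration

    # "transient-global" enables the TGlobal encoder variant.
    config = LongT5Config(encoder_attention_type="transient-global")
    model = LongT5ForConditionalGeneration(config)  # randomly initialized
    ```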
    
    * .
    
    * Update docs
    
    * Do some cleanup
    
    * Fix some issues -> make style; add some docs
    
    * Fix position_bias + mask addition + Update tests
    
    * Fix repo consistency
    
    * Fix model consistency by removing flax operation over attn_mask
    
    * [WIP] Add PT TGlobal LongT5
    
    * .
    
    * [WIP] Add flax tglobal model
    
    * [WIP] Update flax model to use the right attention type in the encoder
    
    * Fix flax tglobal model forward pass
    
    * Make use of global_relative_attention_bias
    
    * Add test suites for TGlobal model
    
    * Fix minor bugs, clean code
    
    * Fix PT-Flax equivalence, though not convinced of correctness
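
    A minimal sketch of the kind of PT/Flax equivalence check involved, assuming a tiny arbitrary config and the standard `from_pt=True` cross-framework loader:

    ```python
    from tempfile import TemporaryDirectory

    import numpy as np
    import torch
    from transformers import FlaxLongT5Model, LongT5Config, LongT5Model

    # Tiny config so the check runs quickly; all sizes below are arbitrary.
    config = LongT5Config(
        num_layers=2, num_decoder_layers=2, num_heads=2,
        d_model=32, d_kv=8, d_ff=64, local_radius=4,
    )
    pt_model = LongT5Model(config).eval()

    # Share the randomly initialized PT weights with the Flax model.
    with TemporaryDirectory() as tmp:
        pt_model.save_pretrained(tmp)
        fx_model = FlaxLongT5Model.from_pretrained(tmp, from_pt=True)

    input_ids = np.ones((1, 16), dtype="i4")
    decoder_input_ids = np.ones((1, 4), dtype="i4")

    with torch.no_grad():
        pt_out = pt_model(
            input_ids=torch.from_numpy(input_ids).long(),
            decoder_input_ids=torch.from_numpy(decoder_input_ids).long(),
        ).last_hidden_state.numpy()
    fx_out = fx_model(input_ids=input_ids, decoder_input_ids=decoder_input_ids).last_hidden_state

    assert np.allclose(pt_out, np.asarray(fx_out), atol=1e-4)
    ```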
    
    * Fix LocalAttn implementation to match the original impl. + update READMEs
    
    * Few updates
    
    * Update: [Flax] improve large model init and loading #16148
    
    * Add ckpt conversion script according to #16853 + handle torch device placement
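
    The script itself follows #16853; as a hedged illustration of just the Flax-to-PyTorch leg plus device placement, using the generic `from_flax=True` loader (the local paths are hypothetical):

    ```python
    import torch
    from transformers import LongT5ForConditionalGeneration

    # Load Flax weights into the PT model via the generic from_flax mechanism.
    pt_model = LongT5ForConditionalGeneration.from_pretrained("./longt5-flax", from_flax=True)

    # Explicit device placement, mirroring the "handle torch device placement" note.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    pt_model.to(device)
    pt_model.save_pretrained("./longt5-pt")
    ```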
    
    * Minor updates to conversion script.
    
    * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
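
    For reference, the corrected usage (the checkpoint name is one of the released LongT5 checkpoints):

    ```python
    from transformers import FlaxAutoModelForSeq2SeqLM

    # The Flax checkpoint must be loaded through the Flax auto class,
    # not the PyTorch AutoModelForSeq2SeqLM.
    model = FlaxAutoModelForSeq2SeqLM.from_pretrained("google/long-t5-local-base")
    ```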
    
    * gpu support + dtype fix
    
    * Apply some suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * Remove (de)parallelize stuff
    * Edit shape comments
    * Update README.md
    * make fix-copies
    
    * Remove caching logic for local & tglobal attention
    
    * Apply another batch of suggestions from code review
    
    * Add missing checkpoints
    * Format converting scripts
    * Drop (de)parallelize links from longT5 mdx
    
    * Fix converting script + revert config file change
    
    * Revert "Remove caching logic for local & tglobal attention"
    
    This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
    
    * Stash caching logic in Flax model
    
    * Always use the side relative bias
    
    * Drop caching logic in PT model
    
    * Return side bias as it was
    
    * Drop all remaining model parallel logic
    
    * Remove clamp statements
    
    * Move test files to the proper place
    
    * Update docs with new version of hf-doc-builder
    
    * Fix test imports
    
    * Make some minor improvements
    
    * Add missing checkpoints to docs
    * Make TGlobal model compatible with torch.onnx.export
    * Replace some np.ndarray with jnp.ndarray
    
    * Fix TGlobal for ONNX conversion + update docs
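
    A minimal sketch of the export path this targets, restricted to the encoder (a full seq2seq export needs more plumbing); the file name, dummy shapes, and opset are arbitrary:

    ```python
    import torch
    from transformers import LongT5ForConditionalGeneration

    model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base").eval()
    encoder = model.get_encoder()
    encoder.config.return_dict = False  # tuple outputs are friendlier to tracing

    dummy_input_ids = torch.ones(1, 512, dtype=torch.long)
    torch.onnx.export(
        encoder,
        (dummy_input_ids,),
        "longt5_tglobal_encoder.onnx",
        input_names=["input_ids"],
        output_names=["last_hidden_state"],
        dynamic_axes={"input_ids": {0: "batch", 1: "sequence"}},
        opset_version=13,
    )
    ```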
    
    * fix _make_global_fixed_block_ids and masked negative value
    
    * update flax model
    
    * style and quality
    
    * fix imports
    
    * remove load_tf_weights_in_longt5 from init and fix copies
    
    * add slow test for TGlobal model
    
    * typo fix
    
    * Drop obsolete is_parallelizable and one warning
    
    * Update __init__ files to fix repo-consistency
    
    * fix pipeline test
    
    * Fix some device placements
    
    * [WIP] Update tests -- need to generate summaries to update expected_summary
    
    * Fix quality
    
    * Update LongT5 model card
    
    * Update (slow) summarization tests
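
    The slow tests exercise long-input summarization roughly as below; the checkpoint is the PubMed-finetuned LongT5 used in the docs, and the input and generation settings here are illustrative:

    ```python
    from transformers import AutoTokenizer, LongT5ForConditionalGeneration

    ckpt = "Stancld/longt5-tglobal-large-16384-pubmed-3k_steps"
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = LongT5ForConditionalGeneration.from_pretrained(ckpt)

    long_document = "studies have shown ..."  # stand-in for a full PubMed article
    inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)
    summary_ids = model.generate(**inputs, max_length=128, num_beams=2)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    ```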
    
    * make style
    
    * rename checkpoints
    
    * finish
    
    * fix flax tests
    Co-authored-by: phungvanduy <pvduy23@gmail.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: patil-suraj <surajp815@gmail.com>