1. 13 Jun, 2022 8 commits
    • Add `LongT5` model (#16792) · a72f1c9f
      Daniel Stancl authored
      
      
      * Initial commit
      
      * Make some fixes
      
      * Make PT model full forward pass
      
      * Drop TF & Flax implementation, fix copies etc
      
      * Add Flax model and update some corresponding stuff
      
      * Drop some TF things
      
      * Update config and flax local attn
      
      * Add encoder_attention_type to config
      
      * .
      
      * Update docs
      
      * Do some cleansing
      
      * Fix some issues -> make style; add some docs
      
      * Fix position_bias + mask addition + Update tests
      
      * Fix repo consistency
      
      * Fix model consistency by removing flax operation over attn_mask
      
      * [WIP] Add PT TGlobal LongT5
      
      * .
      
      * [WIP] Add flax tglobal model
      
      * [WIP] Update flax model to use the right attention type in the encoder
      
      * Fix flax tglobal model forward pass
      
      * Make use of global_relative_attention_bias
      
      * Add test suites for TGlobal model
      
      * Fix minor bugs, clean code
      
      * Fix PT-Flax equivalence, though not convinced of the correctness
      
      * Fix LocalAttn implementation to match the original impl. + update READMEs
      
      * Few updates
      
      * Update: [Flax] improve large model init and loading #16148
      
      * Add ckpt conversion script according to #16853 + handle torch device placement
      
      * Minor updates to conversion script.
      
      * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
      
      * gpu support + dtype fix
      
      * Apply some suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Remove (de)parallelize stuff
      * Edit shape comments
      * Update README.md
      * make fix-copies
      
      * Remove caching logic for local & tglobal attention
      
      * Apply another batch of suggestions from code review
      
      * Add missing checkpoints
      * Format converting scripts
      * Drop (de)parallelize links from longT5 mdx
      
      * Fix converting script + revert config file change
      
      * Revert "Remove caching logic for local & tglobal attention"
      
      This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
      
      * Stash caching logic in Flax model
      
      * Make side relative bias always used
      
      * Drop caching logic in PT model
      
      * Return side bias as it was
      
      * Drop all remaining model parallel logic
      
      * Remove clamp statements
      
      * Move test files to the proper place
      
      * Update docs with new version of hf-doc-builder
      
      * Fix test imports
      
      * Make some minor improvements
      
      * Add missing checkpoints to docs
      * Make TGlobal model compatible with torch.onnx.export
      * Replace some np.ndarray with jnp.ndarray
      
      * Fix TGlobal for ONNX conversion + update docs
      
      * fix _make_global_fixed_block_ids and masked neg value
      
      * update flax model
      
      * style and quality
      
      * fix imports
      
      * remove load_tf_weights_in_longt5 from init and fix copies
      
      * add slow test for TGlobal model
      
      * typo fix
      
      * Drop obsolete is_parallelizable and one warning
      
      * Update __init__ files to fix repo-consistency
      
      * fix pipeline test
      
      * Fix some device placements
      
      * [wip]: Update tests -- need to generate summaries to update expected_summary
      
      * Fix quality
      
      * Update LongT5 model card
      
      * Update (slow) summarization tests
      
      * make style
      
      * rename checkpoints
      
      * finish
      
      * fix flax tests
      Co-authored-by: phungvanduy <pvduy23@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patil-suraj <surajp815@gmail.com>
      a72f1c9f
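
      For reference, a minimal usage sketch of the model this PR adds. The checkpoint name and generation settings below are illustrative assumptions, not taken from the PR itself:

          from transformers import AutoTokenizer, LongT5ForConditionalGeneration

          # Assumption: a public TGlobal checkpoint; adjust to the checkpoints the PR references.
          tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
          model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

          # LongT5's local / transient-global encoder attention targets long inputs.
          inputs = tokenizer("summarize: " + "some very long document text", return_tensors="pt")
          summary_ids = model.generate(**inputs, max_length=64)
          print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
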
    • Add FP16 Support for SageMaker Model Parallel (#17386) · 1690094b
      haohanchen-yagao authored
      * Add FP16 support for SageMaker model parallel
      
      * minor fix
      
      * fix indentation
      
      * handle mixed precision exception for SMMP
      
      * minor fix
      
      * remove amp implementation on SMMP
      
      * remove redundant stuff
      
      * reformat trainer
      
      * restyling
      
      * reformat
      1690094b
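
      A hedged sketch of what this enables on the user side: requesting half precision through the standard TrainingArguments flag in a script run under SageMaker Model Parallel. The smdistributed estimator configuration that launches the script is assumed and not shown:

          from transformers import TrainingArguments

          # fp16=True is the standard Trainer flag; per this PR, under SageMaker Model
          # Parallel it is now handled by smp rather than by native AMP.
          args = TrainingArguments(output_dir="out", fp16=True)
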
    • Enable CPU distributed training using mpirun (#17570) · 4aabf9b5
      Wang, Yi authored
      
      
      * Enable CPU distributed training using mpirun

        Example command:
            mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
        MASTER_ADDR and MASTER_PORT should be set as environment variables:
            export MASTER_ADDR=127.0.0.1
            export MASTER_PORT=29500
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * fix according to the review comment
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * use accelerate logic for CPU distributed training to set the "RANK", "LOCAL_RANK", and "WORLD_SIZE" environment variables
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      4aabf9b5
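
      A hedged sketch of the launch-side idea: each mpirun-spawned process derives its distributed settings from MPI environment variables and initializes torch.distributed on CPU. The Open MPI variable names are assumptions, and gloo is used here as the portable CPU backend, since the ccl backend additionally requires Intel's oneCCL bindings for PyTorch:

          import os
          import torch.distributed as dist

          # Assumption: Open MPI exports these; other MPI implementations use other names.
          rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
          world_size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))
          os.environ.setdefault("RANK", str(rank))
          os.environ.setdefault("WORLD_SIZE", str(world_size))

          # MASTER_ADDR / MASTER_PORT must be exported beforehand, as the commit notes.
          dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
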
    • Add Ray's scope to training arguments (#17629) · 457d4a32
      Bram Vanroy authored
      
      
      * allow scope from trainer arg
      
      * add ray_scope to training args
      
      * escape double quotes
      
      * make style && quality
      
      * attempt to solve doc style issues
      
      * splitting up URLs for style
      
      * make fixup
      
      * Update src/transformers/training_args.py
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      
      * make style
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      457d4a32
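
      A hedged sketch of the new argument in use with Ray Tune hyperparameter search; the model_init and trial count mentioned in the comments are illustrative:

          from transformers import TrainingArguments

          # "last" is the scope Ray Tune uses when picking the best trial; see the
          # Ray documentation for the other accepted values.
          args = TrainingArguments(output_dir="out", ray_scope="last")

          # Typical use (sketch): trainer = Trainer(model_init=model_init, args=args, ...)
          # best_run = trainer.hyperparameter_search(backend="ray", n_trials=4)
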
    • Update modeling_gpt_neox.py (#17575) · 54833886
      Will Frey authored
      I'm guessing that the intention was for the `_no_split_modules` class attribute of `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how it's set as `["GPTJBlock"]` for `GPTJPreTrainedModel`.
      
      If this is incorrect, please feel free to just close the PR.
      
      Thanks!
      54833886
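
      The one-line change being proposed, per the message above (a sketch; the rest of the class body is omitted):

          from transformers import PreTrainedModel

          class GPTNeoXPreTrainedModel(PreTrainedModel):
              # Mirror GPT-J's ["GPTJBlock"]: never split a GPTNeoXLayer across
              # devices when loading with a device map.
              _no_split_modules = ["GPTNeoXLayer"]
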
    • Fix dtype getter (#17668) · a1344dbf
      Sylvain Gugger authored
      * Fix dtype getters
      
      * Proper fix for dtype getter
      
      * Style and comment
      
      * Always use last for consistency
      
      * Quality
      a1344dbf
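
      A hedged sketch of what a dtype getter along these lines looks like: prefer the first floating-point parameter dtype and, failing that, fall back to the last dtype seen (per "Always use last for consistency"). Not the verbatim implementation:

          import torch

          def get_parameter_dtype(module: torch.nn.Module) -> torch.dtype:
              last_dtype = None
              for param in module.parameters():
                  last_dtype = param.dtype
                  if param.is_floating_point():
                      # A floating dtype is what callers care about (e.g. for casting).
                      return param.dtype
              # No floating-point parameters: consistently use the last dtype seen.
              return last_dtype
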
    • Saint
    • Add Visual Question Answering (VQA) pipeline (#17286) · 66336dc1
      Sijun He authored
      
      
      * wip
      
      * rebase
      
      * all tests pass
      
      * rebase
      
      * ready for PR
      
      * address comments
      
      * fix styles
      
      * add require_torch to pipeline test
      
      * remove remote image to improve CI consistency
      
      * address comments; fix tf/flax tests
      
      * address comments; fix tf/flax tests
      
      * fix tests; add alias
      
      * repo consistency tests
      
      * Update src/transformers/pipelines/visual_question_answering.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * address comments
      
      * Update src/transformers/pipelines/visual_question_answering.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * merge
      
      * Update src/transformers/models/auto/modeling_auto.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * merge
      Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      66336dc1
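
      A minimal usage sketch of the new pipeline. The task name follows the PR title; the commit above mentions an alias is also added (assumption: "vqa"), and the image path is a placeholder:

          from transformers import pipeline

          vqa = pipeline("visual-question-answering")
          preds = vqa(image="path/to/picture.jpg", question="What is shown in the image?")
          print(preds)  # a list of {"answer": ..., "score": ...} candidates
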
  2. 10 Jun, 2022 8 commits
  3. 09 Jun, 2022 10 commits
  4. 08 Jun, 2022 8 commits
  5. 07 Jun, 2022 5 commits
    • fix (#17589) · c6cea5a7
      Yih-Dar authored
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      c6cea5a7
    • M-CTC-T Model (#16402) · 119e3c0f
      Chan Woo Kim authored
      
      
      * added cbs to notebooks; fixed a copy-paste error in generation_utils
      
      * initial push for mctc model
      
      * mctc feature extractor done
      
      * added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
      
      * added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
      
      * passing attention, now struggling to figure out how attention masks make sense here
      
      * works when excluding attention masks; ask later how one would integrate attention masks here
      
      * bizarre configuration error (model prefix comes first in config dict json and messes up the order)
      
      * all passing, but bizarre config dict ordering issue in to_dict
      
      * passing all major tests
      
      * feature extraction, processor, tokenizer added & tests passing
      
      * style & consistency & other logistical fixes
      
      * copy paste fix
      
      * model after feature extraction working
      
      * committing final feature extraction results; need to fix normalization
      
      * feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
      
      * delete print; format code a bit
      
      * fixing tests
      
      * passing major tests
      
      * fixing styles
      
      * completed tokenization test with real example; not sure if these values are entirely correct.
      
      * last test fixes from local
      
      * reverting accidentally included custom setup configs
      
      * remove load tf weights; fix config error
      
      * testing: couldn't import feature extractor
      
      * fix docs
      
      * fix docs
      
      * resolving comments
      
      * style fixes
      
      * style fixes
      
      * Update to MCTCConv1dSubSampler
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * relposemb fixes
      
      * conv1d name issue; expecting config fail with parentheses
      
      * fix config issue
      
      * fix config issue
      
      * fix config issue
      
      * change everything to MCTCT
      
      * fixing naming change errors
      
      * archive list
      
      * copyrights and docs
      
      * copyrights and docs
      
      * copyrights and docs
      
      * merge resolution
      
      * move tests; fix for changed optional dependency structure
      
      * test directories changed
      
      * fixing tests
      
      * how to avoid tf tests?
      
      * how to avoid tf tests?
      
      * tests passing locally
      
      * allow MCTCTProcessor to be imported in any env
      
      * allow MCTCTProcessor to be imported in any env
      
      * fixed second round of feedback, need to fix docs
      
      * doc changes not being applied
      
      * all fixed
      
      * style fix
      
      * feedback fixes
      
      * fix copies and feature extraction style fix
      
      * Update tests/models/visual_bert/test_modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * copy paste huggingface:main visual bert
      
      * added eof newline to visual bert; all tests are passing otherwise
      
      * fix slow tests by adding attention mask
      
      * change model id to speechbrain
      
      * make fix-copies
      
      * fix readme unwanted deletes
      
      * fixing readmes, make fix-copies
      
      * consistent M-CTC-T naming
      
      * Update src/transformers/models/mctct/__init__.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * all fixed but variable naming
      
      * adjust double quotes
      
      * fixed variable names
      
      * copyright and mr quilter
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * correct slow tests
      
      * make fix-copies
      
      * Update src/transformers/models/mctct/configuration_mctct.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/mctct/configuration_mctct.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * m-ctc-t not mctct
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      119e3c0f
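
      A hedged usage sketch for the new model. The checkpoint name follows the "change model id to speechbrain" note above but is an assumption, and the silent dummy waveform is a placeholder:

          import torch
          from transformers import MCTCTForCTC, MCTCTProcessor

          processor = MCTCTProcessor.from_pretrained("speechbrain/m-ctc-t-large")
          model = MCTCTForCTC.from_pretrained("speechbrain/m-ctc-t-large")

          audio = [0.0] * 16_000  # placeholder: one second of 16 kHz mono audio
          inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
          with torch.no_grad():
              logits = model(**inputs).logits
          ids = torch.argmax(logits, dim=-1)
          print(processor.batch_decode(ids))
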
    • Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539) · 5c8f6010
      Michael Benayoun authored
      * Support for deberta and deberta-v2
      
      * Support for LXMert
      
      * Support for Hubert
      
      * Fix for pt1.11
      
      * Trigger CI
      5c8f6010
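
      A hedged sketch of what this enables: tracing one of the newly supported architectures with the transformers torch.fx helper (the checkpoint name is illustrative):

          from transformers import AutoModel
          from transformers.utils.fx import symbolic_trace

          model = AutoModel.from_pretrained("microsoft/deberta-v2-xlarge")
          # Produces a torch.fx GraphModule whose graph can be inspected or transformed.
          traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])
          print(traced.graph)
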
    • Add examples telemetry (#17552) · 3cab9027
      Sylvain Gugger authored
      * Add examples telemetry
      
      * Alternative approach
      
      * Add to all other examples
      
      * Add to templates as well
      
      * Put framework separately
      
      * Same for TensorFlow
      3cab9027
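
      A hedged sketch of the call the example scripts now make near startup; the helper name and arguments are inferred from this PR's description ("Put framework separately") and should be treated as assumptions:

          from transformers.utils import send_example_telemetry

          # Reports which example and framework are in use (nothing beyond that).
          send_example_telemetry("run_glue", framework="pytorch")
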
    • Fix circular import in onnx.utils (#17577) · b6a65ae5
      Sylvain Gugger authored
      * Fix circular import in onnx.utils
      
      * Add comment for test fetcher
      
      * Here too
      
      * Style
      b6a65ae5
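
      For context, the standard shape of this kind of fix, as a hedged sketch (the function below is illustrative, not the verbatim change): defer the offending import into the function body so the cycle is only resolved at call time.

          def get_preprocessor(model_name: str):
              # Deferred import: importing this at module level would pull the
              # top-level package back in and close the circular-import loop.
              from transformers import AutoTokenizer

              return AutoTokenizer.from_pretrained(model_name)
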
  6. 06 Jun, 2022 1 commit