1. 20 Jun, 2022 3 commits
  2. 18 Jun, 2022 2 commits
    • Attempt to change Push CI to workflow_run (#17753) · 6589e510
      Yih-Dar authored
      
      
      * Use workflow_run event for push CI
      
      * change to workflow_run
      
      * Add comments
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      6589e510
    • Added translation of index.mdx to Portuguese Issue #16824 (#17565) · 0d92798b
      Rafael Zimmer authored
      
      
      * Added translation of installation.mdx to Portuguese, as well
      as default templates of _toctree.yml and _config.py
      
      * [ build_documentation.yml ] - Updated doc_builder to build
      documentation in Portuguese.
      [ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
      
      * [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
      
      [ pipeline_tutorial.mdx ] - Grammar changes.
      
      * [ accelerate.mdx ] - Translated the accelerate tutorial to Portuguese.
      
      * [ multilingual.mdx ] - Added Portuguese translation for the multilingual tutorial.
      
      [ training.mdx ] - Added Portuguese translation for the training tutorial.
      
      * [ preprocessing.mdx ] - WIP
      
      * Update _toctree.yml
      
      * Adding Pré-processamento to _toctree.yml
      
      * Update accelerate.mdx
      
      * Nits and eliminate preprocessing file until it is ready
      
      * [ index.mdx ] - Translated the index presentation page to Portuguese.
      
      * [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
      
      * Fix build_pr_documentation.yml
      
      * Fix index nits
      
      * nits in _toctree
      Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
      0d92798b
  3. 17 Jun, 2022 6 commits
  4. 16 Jun, 2022 5 commits
  5. 15 Jun, 2022 8 commits
  6. 14 Jun, 2022 10 commits
  7. 13 Jun, 2022 6 commits
    • Add `LongT5` model (#16792) · a72f1c9f
      Daniel Stancl authored
      
      
      * Initial commit
      
      * Make some fixes
      
      * Make PT model full forward pass
      
      * Drop TF & Flax implementation, fix copies etc
      
      * Add Flax model and update some corresponding stuff
      
      * Drop some TF things
      
      * Update config and flax local attn
      
      * Add encoder_attention_type to config
      
      * .
      
      * Update docs
      
      * Do some cleansing
      
      * Fix some issues -> make style; add some docs
      
      * Fix position_bias + mask addition + Update tests
      
      * Fix repo consistency
      
      * Fix model consistency by removing flax operation over attn_mask
      
      * [WIP] Add PT TGlobal LongT5
      
      * .
      
      * [WIP] Add flax tglobal model
      
      * [WIP] Update flax model to use the right attention type in the encoder
      
      * Fix flax tglobal model forward pass
      
      * Make use of global_relative_attention_bias
      
      * Add test suites for TGlobal model
      
      * Fix minor bugs, clean code
      
      * Fix pt-flax equivalence, though not fully convinced of correctness
      
      * Fix LocalAttn implementation to match the original impl. + update READMEs
      
      * Few updates
      
      * Update: [Flax] improve large model init and loading #16148
      
      * Add ckpt conversion script according to #16853 + handle torch device placement
      
      * Minor updates to conversion script.
      
      * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
      
      * gpu support + dtype fix
      
      * Apply some suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * * Remove (de)parallelize stuff
      * Edit shape comments
      * Update README.md
      * make fix-copies
      
      * Remove caching logic for local & tglobal attention
      
      * Apply another batch of suggestions from code review
      
      * Add missing checkpoints
      * Format converting scripts
      * Drop (de)parallelize links from longT5 mdx
      
      * Fix converting script + revert config file change
      
      * Revert "Remove caching logic for local & tglobal attention"
      
      This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
      
      * Stash caching logic in Flax model
      
      * Make side relative bias always used
      
      * Drop caching logic in PT model
      
      * Return side bias as it was
      
      * Drop all remaining model parallel logic
      
      * Remove clamp statements
      
      * Move test files to the proper place
      
      * Update docs with new version of hf-doc-builder
      
      * Fix test imports
      
      * Make some minor improvements
      
      * Add missing checkpoints to docs
      * Make TGlobal model compatible with torch.onnx.export
      * Replace some np.ndarray with jnp.ndarray
      
      * Fix TGlobal for ONNX conversion + update docs
      
      * fix _make_global_fixed_block_ids and masked neg value
      
      * update flax model
      
      * style and quality
      
      * fix imports
      
      * remove load_tf_weights_in_longt5 from init and fix copies
      
      * add slow test for TGlobal model
      
      * typo fix
      
      * Drop obsolete is_parallelizable and one warning
      
      * Update __init__ files to fix repo-consistency
      
      * fix pipeline test
      
      * Fix some device placements
      
      * [wip]: Update tests -- need to generate summaries to update expected_summary
      
      * Fix quality
      
      * Update LongT5 model card
      
      * Update (slow) summarization tests
      
      * make style
      
      * rename checkpoints
      
      * finish
      
      * fix flax tests
      Co-authored-by: phungvanduy <pvduy23@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patil-suraj <surajp815@gmail.com>
      a72f1c9f
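      A minimal usage sketch for the `LongT5` model added in the commit above (the checkpoint name `google/long-t5-tglobal-base`, the role of `encoder_attention_type`, and the input text are assumptions for illustration, not taken from the commit):
      
          from transformers import AutoTokenizer, LongT5ForConditionalGeneration
          
          # Hedged sketch: load a transient-global (TGlobal) LongT5 checkpoint and
          # summarize a long input; the config's encoder_attention_type is assumed to
          # select between local and transient-global encoder self-attention.
          tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
          model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")
          
          inputs = tokenizer("summarize: " + "a very long document ... " * 200, return_tensors="pt")
          summary_ids = model.generate(**inputs, max_new_tokens=64)
          print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))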
    • Add FP16 Support for SageMaker Model Parallel (#17386) · 1690094b
      haohanchen-yagao authored
      * Add FP16 support for SageMaker model parallel
      
      * minor fix
      
      * fix indentation
      
      * handle mixed precision exception for SMMP
      
      * minor fix
      
      * remove amp implementation on SMMP
      
      * remove redundant stuff
      
      * reformat trainer
      
      * restyling
      
      * reformat
      1690094b
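      For context on the FP16 / SageMaker Model Parallel commit above, the user-facing switch is the standard `fp16` training argument; the sketch below is an assumed typical usage, not code from the commit (the SageMaker model-parallel job itself is configured through the SageMaker launcher and is not shown):
      
          from transformers import TrainingArguments
          
          # Hedged sketch: with this change, fp16 mixed precision can be requested even
          # when running under SageMaker Model Parallel (SMMP); the Trainer then defers
          # mixed-precision handling to SMMP instead of using torch.cuda.amp directly.
          args = TrainingArguments(
              output_dir="out",
              fp16=True,
              per_device_train_batch_size=8,
          )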
    • Enable CPU distributed training using mpirun (#17570) · 4aabf9b5
      Wang, Yi authored
      
      
      * Enable CPU distributed training using mpirun
      
      Example command:
          mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
      
      MASTER_ADDR and MASTER_PORT should be set as environment variables, e.g.:
          export MASTER_ADDR=127.0.0.1
          export MASTER_PORT=29500
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * fix according to the review comment
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      
      * Use Accelerate logic for CPU distributed training to set the "RANK", "LOCAL_RANK", and "WORLD_SIZE" environment variables
      Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
      4aabf9b5
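      To make the environment handling in the mpirun commit above concrete, here is a small sketch of the idea; the helper name and the Open MPI variable names (`OMPI_COMM_WORLD_*`) are assumptions for illustration, not code from the commit:
      
          import os
          
          def mpi_env_to_torch_env() -> None:
              # Hypothetical helper: when a script is launched via Open MPI's mpirun,
              # map its OMPI_COMM_WORLD_* variables onto the names torch.distributed expects.
              if "OMPI_COMM_WORLD_SIZE" in os.environ:
                  os.environ.setdefault("WORLD_SIZE", os.environ["OMPI_COMM_WORLD_SIZE"])
                  os.environ.setdefault("RANK", os.environ["OMPI_COMM_WORLD_RANK"])
                  os.environ.setdefault("LOCAL_RANK", os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])
              # MASTER_ADDR / MASTER_PORT still have to be exported by the user, e.g.
              # export MASTER_ADDR=127.0.0.1 and export MASTER_PORT=29500 as in the commit message.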
    • Add Ray's scope to training arguments (#17629) · 457d4a32
      Bram Vanroy authored
      
      
      * allow scope from trainer arg
      
      * add ray_scope to training args
      
      * escape double quotes
      
      * make style && quality
      
      * attempt to solve doc style issues
      
      * splitting up URLs for style
      
      * make fixup
      
      * Update src/transformers/training_args.py
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      
      * make style
      Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
      457d4a32
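      A hedged usage sketch for the `ray_scope` argument mentioned above; the value "last" and the surrounding setup are assumptions for illustration:
      
          from transformers import TrainingArguments
          
          # Hedged sketch: ray_scope controls which trial result Ray Tune's
          # get_best_trial() considers when ranking trials during hyperparameter search.
          args = TrainingArguments(
              output_dir="out",
              ray_scope="last",
          )
          # Typically consumed via trainer.hyperparameter_search(backend="ray", ...).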
    • Update modeling_gpt_neox.py (#17575) · 54833886
      Will Frey authored
      I'm guessing that the intention was for the `_no_split_modules` class attribute of `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how it's set to `["GPTJBlock"]` for `GPTJPreTrainedModel`.
      
      If this is incorrect, please feel free to just close the PR.
      
      Thanks!
      54833886
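      To illustrate the attribute the PR description above refers to, here is a minimal sketch; the `Sketch` suffix marks it as an illustration rather than the library's actual class:
      
          from transformers import GPTNeoXConfig, PreTrainedModel
          
          class GPTNeoXPreTrainedModelSketch(PreTrainedModel):
              # Sketch only: _no_split_modules tells accelerate's device_map="auto"
              # which submodules must never be split across devices, mirroring
              # ["GPTJBlock"] on GPTJPreTrainedModel.
              config_class = GPTNeoXConfig
              base_model_prefix = "gpt_neox"
              _no_split_modules = ["GPTNeoXLayer"]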
    • Fix dtype getter (#17668) · a1344dbf
      Sylvain Gugger authored
      * Fix dtype getters
      
      * Proper fix for dtype getter
      
      * Style and comment
      
      * Always use last for consistency
      
      * Quality
      a1344dbf
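      As a rough illustration of what a dtype getter does (and of the "always use last for consistency" note above), a minimal sketch follows; this is an assumption about the idea, not the library's actual implementation:
      
          import torch
          from torch import nn
          
          def get_parameter_dtype(module: nn.Module) -> torch.dtype:
              # Sketch: return the dtype of the first floating-point parameter; if no
              # parameter is floating point, fall back to the last parameter seen so the
              # answer is deterministic ("always use last for consistency").
              last_dtype = None
              for param in module.parameters():
                  last_dtype = param.dtype
                  if param.is_floating_point():
                      return param.dtype
              return last_dtype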