  1. 22 Jun, 2022 5 commits
      Offload fixes (#17810) · df8e6804
      Sylvain Gugger authored
      * Offload fixes
      
      * Add a test
      CLI: use hub's `create_commit` (#17755) · 0d0c392c
      Joao Gante authored
      * use create_commit
      
      * better commit message and description
      
      * touch setup.py to trigger cache update
      
      * add hub version gating
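The change above moves the CLI to the hub's `create_commit`, which applies a batch of file operations in a single commit with a proper message and description, instead of pushing files one at a time. A stdlib-only sketch of that batching idea (the classes and fields below are illustrative mocks, not the huggingface_hub API):

```python
# Stdlib-only mock of the batched-commit idea behind `create_commit`.
# `AddOperation`, `Repo`, and their fields are hypothetical, for illustration.
from dataclasses import dataclass, field

@dataclass
class AddOperation:
    path: str
    content: bytes

@dataclass
class Repo:
    files: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def create_commit(self, operations, commit_message, commit_description=""):
        # All operations land together in one commit, instead of one push per file,
        # so readers of the repo never observe a partial upload.
        for op in operations:
            self.files[op.path] = op.content
        self.history.append((commit_message, commit_description))
        return f"commit-{len(self.history)}"

repo = Repo()
ops = [
    AddOperation("config.json", b"{}"),
    AddOperation("pytorch_model.bin", b"\x00"),
]
sha = repo.create_commit(ops, "Upload model", "Adds config and weights in one commit")
```

Batching also makes the "better commit message and description" bullet possible: there is exactly one commit to describe, rather than one per file.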
      initial commit (#17818) · 56b83cf0
      Arthur authored
    • Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict` (#17805) · 13570381
      Eran Hirsch authored
      
      * Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict`
      
      * Add all generate parameters to `Seq2SeqTrainer`, and also to `QuestionAnsweringSeq2SeqTrainer` which overrides it
      
      * Remove `self._num_beams` from trainer classes
      
      * - Run fixup
      - Fix "Constraint" not exposed
      - Fix synced_gpus to actually read from param
      
      * Use kwargs
      
      * Copy kwargs before making changes to it
      
* Fix style issues and unused imports
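The kwargs-related bullets above describe a pattern worth noting: generation options are forwarded as a dict and copied before being modified, so defaults can be filled in without mutating the caller's dict. A minimal sketch under that assumption (the class and method below are illustrative, not the actual `Seq2SeqTrainer` API):

```python
# Illustrative sketch of "Use kwargs" + "Copy kwargs before making changes to it";
# names are hypothetical and the real Seq2SeqTrainer API may differ.
class Seq2SeqEvaluatorSketch:
    def __init__(self, default_max_length=128):
        self.default_max_length = default_max_length

    def evaluate(self, gen_kwargs=None):
        gen_kwargs = dict(gen_kwargs or {})  # copy before modifying
        gen_kwargs.setdefault("max_length", self.default_max_length)
        gen_kwargs.setdefault("synced_gpus", False)  # actually read from the param
        self._gen_kwargs = gen_kwargs  # stash for the prediction step
        return gen_kwargs

caller_opts = {"num_beams": 4}
trainer = Seq2SeqEvaluatorSketch()
resolved = trainer.evaluate(caller_opts)
```

Because `evaluate` copies first, `caller_opts` still contains only `num_beams` afterwards, while the resolved dict carries the filled-in defaults.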
      Flax sharded (#17760) · 16c6eb7c
      Arthur authored
  2. 21 Jun, 2022 13 commits
  3. 20 Jun, 2022 5 commits
  4. 17 Jun, 2022 3 commits
      Save huggingface checkpoint as artifact in mlflow callback (#17686) · 522a9ece
      Swetha Mandava authored
      
      
      * Fix eval to compute rouge correctly for rouge_score
      
      * styling
      
      * moving sentence tokenization to utils from run_eval
      
      * saving ckpt in mlflow
      
      * use existing format of args
      
      * fix documentation
      Co-authored-by: Swetha Mandava <smandava@nvidia.com>
      Migrate HFDeepSpeedConfig from trfrs to accelerate (#17623) · 21a77242
      Sourab Mangrulkar authored
      
      
      * Migrate HFDeepSpeedConfig from trfrs to accelerate
      
      * add `accelerate` to testing dep
      
      * addressing comments
      
      * addressing comments
      
      Using `_shared_state` and avoiding object creation. This is necessary as `notebook_launcher` in `launchers.py` checks `len(AcceleratorState._shared_state) > 0` to throw an error.
      
      * resolving comments
      
      1. Use simple API from accelerate to manage the deepspeed config integration
      2. Update the related documentation
      
      * reverting changes and addressing comments
      
      * docstring correction
      
      * addressing nits
      
      * addressing nits
      
      * addressing nits 3
      
      * bumping up the accelerate version to 0.10.0
      
      * resolving import
      
      * update setup.py to include deepspeed dependencies
      
      * Update dependency_versions_table.py
      
      * fixing imports
      
      * reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests
      
      These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.
      
      * removing `accelerate` as hard dependency
      
      Resolves issues related to CI Tests
      
      * adding `accelerate` as dependency for building docs
      
      resolves failure in Build PR Documentation test
      
      * adding `accelerate` as dependency in "dev" to resolve doc build issue
      
      * resolving comments
      
      1. adding `accelerate` to extras["all"]
      2. Including check for accelerate too before import HFDeepSpeedConfig from there
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * resolving comments
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
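The `_shared_state` bullet above relies on the shared-state (sometimes called "Borg") pattern: every instance aliases one class-level dict, so code can check `len(_shared_state) > 0` to detect prior configuration without constructing a new object. A hedged, stdlib-only sketch (the class name mirrors `AcceleratorState` for illustration only; this is not accelerate's actual code):

```python
# Minimal shared-state ("Borg") pattern sketch; illustrative, not accelerate's code.
class AcceleratorStateSketch:
    _shared_state = {}  # one dict shared by every instance

    def __init__(self, **settings):
        self.__dict__ = self._shared_state  # alias instance state to the shared dict
        self.__dict__.update(settings)

# Before any configuration, the shared dict is empty -- this is the cheap check
# a launcher can perform without instantiating anything.
configured_before = len(AcceleratorStateSketch._shared_state) > 0
a = AcceleratorStateSketch(deepspeed_plugin="zero3-config")
b = AcceleratorStateSketch()  # sees the state set through `a`
```

Because the check reads the class attribute directly, it avoids creating an object whose `__init__` might itself populate the shared state, which is the failure mode the commit message alludes to.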
    • greg2451 · 2d7c1bb1
  5. 16 Jun, 2022 4 commits
  6. 15 Jun, 2022 3 commits
  7. 14 Jun, 2022 5 commits
  8. 13 Jun, 2022 2 commits
      Add `LongT5` model (#16792) · a72f1c9f
      Daniel Stancl authored
      
      
      * Initial commit
      
      * Make some fixes
      
      * Make PT model full forward pass
      
      * Drop TF & Flax implementation, fix copies etc
      
      * Add Flax model and update some corresponding stuff
      
      * Drop some TF things
      
      * Update config and flax local attn
      
      * Add encoder_attention_type to config
      
      * .
      
      * Update docs
      
      * Do some cleansing
      
      * Fix some issues -> make style; add some docs
      
      * Fix position_bias + mask addition + Update tests
      
      * Fix repo consistency
      
      * Fix model consistency by removing flax operation over attn_mask
      
      * [WIP] Add PT TGlobal LongT5
      
      * .
      
      * [WIP] Add flax tglobal model
      
      * [WIP] Update flax model to use the right attention type in the encoder
      
      * Fix flax tglobal model forward pass
      
      * Make the use of global_relative_attention_bias
      
      * Add test suites for TGlobal model
      
      * Fix minor bugs, clean code
      
      * Fix pt-flax equivalence though not convinced with correctness
      
      * Fix LocalAttn implementation to match the original impl. + update READMEs
      
      * Few updates
      
      * Update: [Flax] improve large model init and loading #16148
      
      * Add ckpt conversion script according to #16853 + handle torch device placement
      
      * Minor updates to conversion script.
      
      * Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
      
      * gpu support + dtype fix
      
      * Apply some suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * * Remove (de)parallelize stuff
      * Edit shape comments
      * Update README.md
      * make fix-copies
      
      * Remove caching logic for local & tglobal attention
      
      * Apply another batch of suggestions from code review
      
      * Add missing checkpoints
      * Format converting scripts
      * Drop (de)parallelize links from longT5 mdx
      
      * Fix converting script + revert config file change
      
      * Revert "Remove caching logic for local & tglobal attention"
      
      This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
      
      * Stash caching logic in Flax model
      
      * Make side relative bias used always
      
      * Drop caching logic in PT model
      
      * Return side bias as it was
      
      * Drop all remaining model parallel logic
      
      * Remove clamp statements
      
      * Move test files to the proper place
      
      * Update docs with new version of hf-doc-builder
      
      * Fix test imports
      
      * Make some minor improvements
      
      * Add missing checkpoints to docs
      * Make TGlobal model compatible with torch.onnx.export
      * Replace some np.ndarray with jnp.ndarray
      
      * Fix TGlobal for ONNX conversion + update docs
      
      * fix _make_global_fixed_block_ids and masked neg value
      
      * update flax model
      
      * style and quality
      
      * fix imports
      
      * remove load_tf_weights_in_longt5 from init and fix copies
      
      * add slow test for TGlobal model
      
      * typo fix
      
      * Drop obsolete is_parallelizable and one warning
      
      * Update __init__ files to fix repo-consistency
      
      * fix pipeline test
      
      * Fix some device placements
      
      * [wip]: Update tests -- need to generate summaries to update expected_summary
      
      * Fix quality
      
      * Update LongT5 model card
      
      * Update (slow) summarization tests
      
      * make style
      
      * rename checkpoints
      
      * finish
      
      * fix flax tests
      Co-authored-by: phungvanduy <pvduy23@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patil-suraj <surajp815@gmail.com>
      Add FP16 Support for SageMaker Model Parallel (#17386) · 1690094b
      haohanchen-yagao authored
      * Add FP16 support for SageMaker model parallel
      
      * minor fix
      
      * fix indentation
      
      * handle mixed precision exception for SMMP
      
      * minor fix
      
      * remove amp implementation on SMMP
      
      * remove redundant stuff
      
      * reformat trainer
      
      * restyling
      
      * reformat