1. 29 Mar, 2021 7 commits
    • pcuenca's avatar
      Allow use of pre-computed lengths when grouping by length. (#10953) · ae6b6963
      pcuenca authored
      A new argument `length_column_name` has been added to
      `TrainingArguments`, with default value `"length"`. If this column
      exists and `group_by_length` is `True`, the train sampler will use
      it for grouping rather than computing it before training starts.
      
      This is an optimization that allows the user to prepare data for fast
      processing, preventing sequential access to the dataset as described in
      issue #10909.
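The commit message above can be illustrated with a small sketch. It precomputes a `length` column once and then groups example indices by that column, so no pass over the token data is needed at sampling time. The helper `group_indices_by_length`, its `mega_factor` parameter, and the toy data are hypothetical and written only for illustration; this is not the `Trainer`'s actual sampler. The only names taken from the commit are `length_column_name`, `group_by_length`, and the default column name `"length"`.

```python
import random

# Toy dataset: each example is a dict, similar to a row in a tokenized dataset.
examples = [{"input_ids": list(range(n))} for n in (5, 120, 7, 118, 64, 6, 60, 122)]

# Precompute the length once and store it under the default column name "length".
for ex in examples:
    ex["length"] = len(ex["input_ids"])


def group_indices_by_length(lengths, batch_size, mega_factor=2, seed=0):
    """Hypothetical, simplified length grouping (not the library's implementation).

    Shuffle the indices, cut them into "megabatches" of batch_size * mega_factor,
    sort each megabatch by length (longest first), then split into batches.
    """
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    mega = batch_size * mega_factor
    batches = []
    for start in range(0, len(indices), mega):
        chunk = sorted(indices[start:start + mega], key=lambda i: lengths[i], reverse=True)
        batches.extend(chunk[i:i + batch_size] for i in range(0, len(chunk), batch_size))
    return batches


# Read the precomputed column instead of re-measuring every example.
lengths = [ex["length"] for ex in examples]
batches = group_indices_by_length(lengths, batch_size=2)

# With transformers installed, the precomputed column would be selected like this
# (output_dir is a placeholder value):
# args = TrainingArguments(output_dir="out", group_by_length=True, length_column_name="length")
```

Because the lengths are read from a stored column, the grouping step touches only a list of integers, which is the kind of fast path the commit describes.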
    • Sylvain Gugger's avatar
      Remove duplicate code · 4002f95e
      Sylvain Gugger authored
    • Daniel Stancl's avatar
      Add `examples/run_ner_no_trainer.py` (#10902) · d7b50ce4
      Daniel Stancl authored
      * Add NER example with accelerate library
      
      * This commit contains the first (still quite unfinished)
      version of a script showing how to train a HuggingFace model
      with their new accelerate library.
      
      * Fix metric calculation
      
      * make style quality
      
      * mv ner_no_trainer to token-classification dir
      
      * Delete --debug flag from running script
      
      * hf_datasets -> raw_datasets
      
      * Make a few slight adjustments
      
      * Add an informative comment + rewrite a help comment
      
      * Change header
      
      * Fix a few things
      
      * Enforce to use fast tokenizers only
      
      * DataCollatorWithPadding -> DataCollatorForTokenClassification
      
      * Change bash script: python3 -> accelerate launch
      
      * make style
      
      * Add a few missing things (see below)
      
      * Add max-length padding to predictions and labels to
      enable accelerate gather functionality
      
      * Add PyTorch no trainer example to the example README.md
      
      * Remove --do-train from args as being redundant for now
      
      * DataCollatorWithPadding -> DataCollatorForTokenClassification
      
      * Remove some obsolete args.do_train conditions from the script
      
      * Delete --do_train from bash running script
      
      * Delete use_slow_tokenizer from args
      
      * Add unintentionally removed flag --label_all_tokens
      
      * Delete --debug flag from running script
    • Sylvain Gugger's avatar
      Instantiate model only once in pipeline (#10888) · 06a6fea7
      Sylvain Gugger authored
      
      
      * Instantiate model only once in pipeline
      
      * Remove documentation of deprecated method
      
      * Add FutureWarning
      
      * Update src/transformers/pipelines/base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    • Masatoshi Suzuki's avatar
      cc2366bb
    • WybeKoper's avatar
    • Guillaume Filion's avatar
      b3544e4c
  2. 28 Mar, 2021 1 commit
  3. 26 Mar, 2021 3 commits
  4. 25 Mar, 2021 7 commits
    • lexhuismans's avatar
      Fix comment (#10886) · 86c6f8a8
      lexhuismans authored
    • Sylvain Gugger's avatar
      Reorder init imports · 9856c921
      Sylvain Gugger authored
    • Sylvain Gugger's avatar
      Fix typo · e70068a7
      Sylvain Gugger authored
    • Sylvain Gugger's avatar
      Sort init imports · f183a7a3
      Sylvain Gugger authored
    • Amir Tahmasbi's avatar
      Layout lm tf 2 (#10636) · 4684bfc7
      Amir Tahmasbi authored
      
      
      * Added embeddings layer
      
      * Added layoutlm layers, main model, maskedlm and token classification classes
      
      * Added model classes to tf auto models
      
      * Added model to PT to TF conversion script
      
      * Added model to doc README
      
      * Added tests
      
      * Removed unused imports
      
      * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
      
      * Made tests pass!
      
      * Fixed typos in imports and docs
      
      * Fixed a typo in embeddings layer
      
      * Removed imports
      
      * Fixed formatting issues, imports, tests
      
      * Added layoutlm layers, main model, maskedlm and token classification classes
      
      * Added model classes to tf auto models
      
      * Added model to PT to TF conversion script
      
      * Removed unused imports
      
      * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
      
      * Made tests pass!
      
      * Fixed typos in imports and docs
      
      * Removed imports
      
      * Fixed small formatting issues
      
      * Removed duplicates import from main __init__.py
      
      * Changed default arg to true for adding pooling layer to tf layoutlm
      
      * Fixed formatting issues
      
      * Style
      
      * Added copied from to classes copied from bert
      
      * Fixed doc strings examples to work with layoutlm inputs
      
      * Removed PyTorch reference in doc strings example
      
      * Added integration tests
      
      * Cleaned up initialization file
      
      * Updated model checkpoint identifiers
      
      * Fixed imports
      Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
    • Philipp Schmid's avatar
    • Jethro Kuan's avatar
      run_glue_no_trainer: datasets -> raw_datasets (#10898) · 5f1491d3
      Jethro Kuan authored
      Use the correct variable (raw_datasets) instead of the module (datasets)
      where appropriate.
  5. 24 Mar, 2021 6 commits
  6. 23 Mar, 2021 12 commits
  7. 22 Mar, 2021 4 commits