  1. 03 Nov, 2021 1 commit
  2. 31 Aug, 2021 1 commit
    • TF/Numpy variants for all DataCollator classes (#13105) · 854260ca
      Matt authored
      
      
      * Adding a TF variant of the DataCollatorForTokenClassification to get feedback
      
      * Added a Numpy variant and a post_init check to fail early if a missing import is found
      
      * Fixed call to Numpy variant
      
      * Added a couple more of the collators
      
      * Update src/transformers/data/data_collator.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fixes, style pass, finished DataCollatorForSeqToSeq
      
      * Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling
      
      * Adding DataCollatorForPermutationLanguageModeling
      
      * Style pass
      
      * Add missing `__call__` for PLM
      
      * Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks
      
      * Remove unused imports
      
      * First attempt at some TF tests
      
      * A second attempt to make any of those tests actually work
      
      * TF tests, round three
      
      * TF tests, round four
      
      * TF tests, round five
      
      * TF tests, all enabled!
      
      * Style pass
      
      * Merging tests into `test_data_collator.py`
      
      * Merging tests into `test_data_collator.py`
      
      * Fixing up test imports
      
      * Fixing up test imports
      
      * Trying shuffling the conditionals around
      
      * Commenting out non-functional old tests
      
      * Completed all tests for all three frameworks
      
      * Style pass
      
      * Fixed test typo
      
      * Style pass
      
      * Move standard `__call__` method to mixin
      
      * Rearranged imports for `test_data_collator`
      
      * Fix data collator typo "torch" -> "pt"
      
      * Fixed the most embarrassingly obvious bug
      
      * Update src/transformers/data/data_collator.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Renaming mixin
      
      * Updating docs
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Dalton Walker <dalton_walker@icloud.com>
      Co-authored-by: Andrew Romans <andrew.romans@hotmail.com>
  3. 08 Apr, 2021 1 commit
  4. 07 Dec, 2020 1 commit
  5. 04 Nov, 2020 1 commit
  6. 03 Nov, 2020 1 commit
  7. 26 Oct, 2020 1 commit
  8. 22 Sep, 2020 1 commit
  9. 10 Sep, 2020 1 commit
    • Albert pretrain datasets/ datacollator (#6168) · 762cba3b
      Yu Liu authored
      
      
      * add dataset for albert pretrain
      
      * datacollator for albert pretrain
      
      * naming, comprehension, file reading change
      
      * data cleaning is not needed after this modification
      
      * delete prints
      
      * fix a bug
      
      * file structure change
      
      * add tests for albert datacollator
      
      * remove random seed
      
      * add back len and get item function
      
      * sample file for testing and test code added
      
      * format change for black
      
      * more format change
      
      * Style
      
      * var assignment issue resolved
      
      * add back wrongly deleted DataCollatorWithPadding in init file
      
      * Style
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  10. 31 Aug, 2020 1 commit
  11. 20 Aug, 2020 1 commit
    • Add tests to Trainer (#6605) · 573bdb0a
      Sylvain Gugger authored
      * Add tests to Trainer
      
      * Test if removing long breaks everything
      
      * Remove ugly hack
      
      * Fix distributed test
      
      * Use float for number of epochs
  12. 20 Jul, 2020 1 commit
    • Trainer support for iterabledataset (#5834) · 290b6e18
      Pradhy729 authored
      * Don't pass sampler for iterable dataset
      
      * Added check for test and eval dataloaders.
      
      * Formatting
      
      * Don't pass sampler for iterable dataset
      
      * Added check for test and eval dataloaders.
      
      * Formatting
      
      * Cleaner if nesting.
      
      * Added test for trainer and iterable dataset
      
      * Formatting for test
      
      * Fixed import so it only happens when torch is available.
      
      * Added require torch decorator to helper class
      
      * Moved dataset class inside unittest
      
      * Removed nested if and changed model in test
      
      * Checking torch availability for IterableDataset
  13. 07 Jul, 2020 1 commit
    • Added data collator for permutation (XLNet) language modeling and related calls (#5522) · 3dcb748e
      Shashank Gupta authored
      * Added data collator for XLNet language modeling and related calls
      
      Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
      to generate necessary inputs for language modeling training with
      XLNetLMHeadModel. Also added related arguments, logic and calls in
      examples/language-modeling/run_language_modeling.py.
      
      Resolves: #4739, #2008 (partially)
      
      * Changed name to `DataCollatorForPermutationLanguageModeling`
      
      Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModeling`.
      Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
      CTRL uses a CLM loss just like GPT and GPT-2, so it should work out of the box with this script (provided `past` is handled
      similarly to `mems` for XLNet).
      Changed calls and imports appropriately.
      
      * Added detailed comments, changed variable names
      
      Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain how it works. Also cleaned up variable names and made them more informative.
      
      * Added tests for new data collator
      
      Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.
      
      * Fixed styling issues
  14. 01 Jul, 2020 2 commits
  15. 18 Jun, 2020 1 commit
  16. 17 Jun, 2020 1 commit
  17. 15 Jun, 2020 1 commit
  18. 05 Jun, 2020 1 commit
  19. 21 May, 2020 1 commit
  20. 13 May, 2020 1 commit
  21. 07 May, 2020 1 commit
    • BIG Reorganize examples (#4213) · 0ae96ff8
      Julien Chaumond authored
      * Created using Colaboratory
      
      * [examples] reorganize files
      
      * remove run_tpu_glue.py as superseded by TPU support in Trainer
      
      * Bugfix: int, not tuple
      
      * move files around
  22. 22 Apr, 2020 1 commit
    • Trainer (#3800) · dd9d483d
      Julien Chaumond authored
      * doc
      
      * [tests] Add sample files for a regression task
      
      * [HUGE] Trainer
      
      * Feedback from @sshleifer
      
      * Feedback from @thomwolf + logging tweak
      
      * [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
      
      * [glue] Use default max_seq_length of 128 like before
      
      * [glue] move DataTrainingArguments around
      
      * [ner] Change interface of InputExample, and align run_{tf,pl}
      
      * Re-align the pl scripts a little bit
      
      * ner
      
      * [ner] Add integration test
      
      * Fix language_modeling with API tweak
      
      * [ci] Tweak loss target
      
      * Don't break console output
      
      * amp.initialize: model must be on the right device beforehand
      
      * [multiple-choice] update for Trainer
      
      * Re-align to 827d6d6e
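Several bullets in #13105 point to a single dispatch pattern. Every collator gets TF and NumPy variants, the standard `__call__` moves into a mixin, and the framework keys are "pt", "tf" and "np" (which is why "torch" had to be corrected to "pt"). Below is a minimal sketch of that pattern. It is not the transformers implementation. The names FrameworkDispatchMixin and ToyPadCollator and the method names torch_call, tf_call and numpy_call are assumptions made for illustration. The toy collator returns padded Python lists tagged with the framework key instead of real tensors, so it runs with only the standard library.

```python
class FrameworkDispatchMixin:
    """Hypothetical sketch: route __call__ to a framework-specific method.

    Subclasses set self.return_tensors and implement torch_call, tf_call
    and numpy_call; these names are assumptions for this illustration.
    """

    def __call__(self, features, return_tensors=None):
        # Fall back to the collator's default framework when none is given.
        if return_tensors is None:
            return_tensors = self.return_tensors
        if return_tensors == "pt":
            return self.torch_call(features)
        if return_tensors == "tf":
            return self.tf_call(features)
        if return_tensors == "np":
            return self.numpy_call(features)
        # Any other key, including "torch", is rejected: the keys are "pt", "tf", "np".
        raise ValueError(f"Framework {return_tensors!r} not recognized")


class ToyPadCollator(FrameworkDispatchMixin):
    """Hypothetical toy collator that right-pads lists of token ids."""

    def __init__(self, pad_id=0, return_tensors="pt"):
        self.pad_id = pad_id
        self.return_tensors = return_tensors

    def _pad(self, features):
        # Pad every sequence to the length of the longest one in the batch.
        width = max(len(f) for f in features)
        return [list(f) + [self.pad_id] * (width - len(f)) for f in features]

    # A real collator would build torch / TensorFlow / NumPy tensors here;
    # the toy only tags the padded batch with the framework key.
    def torch_call(self, features):
        return "pt", self._pad(features)

    def tf_call(self, features):
        return "tf", self._pad(features)

    def numpy_call(self, features):
        return "np", self._pad(features)


collator = ToyPadCollator(pad_id=0)
print(collator([[5, 6, 7], [8]]))                    # ('pt', [[5, 6, 7], [8, 0, 0]])
print(collator([[1, 2], [3]], return_tensors="np"))  # ('np', [[1, 2], [3, 0]])
```

Putting the branching in one mixin means each collator only implements the per-framework bodies. An unrecognized key such as "torch" then fails loudly instead of silently falling back to some backend.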
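#5522 describes DataCollatorForPermutationLanguageModeling as generating the inputs XLNetLMHeadModel needs, with its behavior controlled by a separate `--plm_probability` flag. Below is a simplified sketch of only the span-selection step. It assumes a scheme with three steps: draw a span length from [1, max_span_length], reserve a context window of span_length / plm_probability tokens, and place the span at a random offset inside that window. The function name plm_span_mask and the max_span_length default are hypothetical. The sketch leaves out the permutation mask, the target mapping, and the exclusion of padding and special tokens, all of which a real collator would also handle.

```python
import random


def plm_span_mask(seq_len, plm_probability=1 / 6, max_span_length=5, rng=random):
    """Hypothetical sketch of span selection for permutation language modeling.

    Returns one boolean per position; True marks a token to be predicted.
    """
    # Assumption: even lengths are required, mirroring the odd-length
    # check that #5522 mentions testing.
    if seq_len % 2 != 0:
        raise ValueError("sequence length must be even")
    masked = [False] * seq_len
    cur_len = 0
    while cur_len < seq_len:
        # Draw a span length uniformly from [1, max_span_length].
        span_length = rng.randint(1, max_span_length)
        # Reserve a window so that about plm_probability of its tokens are masked.
        context_length = int(span_length / plm_probability)
        # Place the span at a random offset that keeps it inside the window.
        start = cur_len + rng.randint(0, context_length - span_length)
        for i in range(start, min(start + span_length, seq_len)):
            masked[i] = True
        cur_len += context_length
    return masked


print(plm_span_mask(24, rng=random.Random(0)))
```

With plm_probability=1.0, every context window is exactly as long as its span, so every position is masked. Smaller values leave proportionally more unmasked context between spans.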