1. 26 Oct, 2020 1 commit
  2. 22 Sep, 2020 1 commit
  3. 10 Sep, 2020 1 commit
    • Yu Liu's avatar
      Albert pretrain datasets/ datacollator (#6168) · 762cba3b
      Yu Liu authored
      
      
      * add dataset for albert pretrain
      
      * datacollator for albert pretrain
      
      * naming, comprehension, file reading change
      
      * data cleaning is no needed after this modification
      
      * delete prints
      
      * fix a bug
      
      * file structure change
      
      * add tests for albert datacollator
      
      * remove random seed
      
      * add back len and get item function
      
      * sample file for testing and test code added
      
      * format change for black
      
      * more format change
      
      * Style
      
      * var assignment issue resolve
      
      * add back wrongly deleted DataCollatorWithPadding in init file
      
      * Style
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      Co-authored-by: default avatarLysandre <lysandre.debut@reseau.eseo.fr>
      762cba3b
  4. 31 Aug, 2020 1 commit
  5. 20 Aug, 2020 1 commit
    • Sylvain Gugger's avatar
      Add tests to Trainer (#6605) · 573bdb0a
      Sylvain Gugger authored
      * Add tests to Trainer
      
      * Test if removing long breaks everything
      
      * Remove ugly hack
      
      * Fix distributed test
      
      * Use float for number of epochs
      573bdb0a
  6. 20 Jul, 2020 1 commit
    • Pradhy729's avatar
      Trainer support for iterabledataset (#5834) · 290b6e18
      Pradhy729 authored
      * Don't pass sampler for iterable dataset
      
      * Added check for test and eval dataloaders.
      
      * Formatting
      
      * Don't pass sampler for iterable dataset
      
      * Added check for test and eval dataloaders.
      
      * Formatting
      
      * Cleaner if nesting.
      
      * Added test for trainer and iterable dataset
      
      * Formatting for test
      
      * Fixed import when torch is available only.
      
      * Added require torch decorator to helper class
      
      * Moved dataset class inside unittest
      
      * Removed nested if and changed model in test
      
      * Checking torch availability for IterableDataset
      290b6e18
  7. 07 Jul, 2020 1 commit
    • Shashank Gupta's avatar
      Added data collator for permutation (XLNet) language modeling and related calls (#5522) · 3dcb748e
      Shashank Gupta authored
      * Added data collator for XLNet language modeling and related calls
      
      Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
      to generate necessary inputs for language modeling training with
      XLNetLMHeadModel. Also added related arguments, logic and calls in
      examples/language-modeling/run_language_modeling.py.
      
      Resolves: #4739, #2008 (partially)
      
      * Changed name to `DataCollatorForPermutationLanguageModeling`
      
      Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModelling`.
      Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
      CTRL uses a CLM loss just like GPT and GPT-2, so should work out of the box with this script (provided `past` is taken care of
      similar to `mems` for XLNet).
      Changed calls and imports appropriately.
      
      * Added detailed comments, changed variable names
      
      Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain working. Also cleaned up variable names and made them more informative.
      
      * Added tests for new data collator
      
      Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.
      
      * Fixed styling issues
      3dcb748e
  8. 01 Jul, 2020 2 commits
  9. 18 Jun, 2020 1 commit
  10. 17 Jun, 2020 1 commit
  11. 15 Jun, 2020 1 commit
  12. 05 Jun, 2020 1 commit
  13. 21 May, 2020 1 commit
  14. 13 May, 2020 1 commit
  15. 07 May, 2020 1 commit
    • Julien Chaumond's avatar
      BIG Reorganize examples (#4213) · 0ae96ff8
      Julien Chaumond authored
      * Created using Colaboratory
      
      * [examples] reorganize files
      
      * remove run_tpu_glue.py as superseded by TPU support in Trainer
      
      * Bugfix: int, not tuple
      
      * move files around
      0ae96ff8
  16. 22 Apr, 2020 1 commit
    • Julien Chaumond's avatar
      Trainer (#3800) · dd9d483d
      Julien Chaumond authored
      * doc
      
      * [tests] Add sample files for a regression task
      
      * [HUGE] Trainer
      
      * Feedback from @sshleifer
      
      * Feedback from @thomwolf + logging tweak
      
      * [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
      
      * [glue] Use default max_seq_length of 128 like before
      
      * [glue] move DataTrainingArguments around
      
      * [ner] Change interface of InputExample, and align run_{tf,pl}
      
      * Re-align the pl scripts a little bit
      
      * ner
      
      * [ner] Add integration test
      
      * Fix language_modeling with API tweak
      
      * [ci] Tweak loss target
      
      * Don't break console output
      
      * amp.initialize: model must be on right device before
      
      * [multiple-choice] update for Trainer
      
      * Re-align to 827d6d6e
      dd9d483d