- 05 Oct, 2020 1 commit
Sylvain Gugger authored
- 01 Oct, 2020 2 commits
Sylvain Gugger authored
* Fix seq2seq example test * Fix bad copy-paste * Also save the state
Sylvain Gugger authored
* Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Add test of resumed training * Fixes * Non multiGPU test * Clean Trainer state * Add more to the state * Documentation * One last test * Make resume training test more complete * Unwanted changes
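The resumed-training behaviour this entry adds tests for can be exercised with a minimal sketch like the one below; the toy model/dataset and the `model_path` keyword for resuming are assumptions for illustration (the keyword of this era, later renamed), not the PR's own test code.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification, Trainer, TrainingArguments

class ToyDataset(torch.utils.data.Dataset):
    """Random classification data, only here to make the sketch self-contained."""
    def __len__(self):
        return 64
    def __getitem__(self, i):
        return {"input_ids": torch.randint(0, 100, (16,)),
                "attention_mask": torch.ones(16, dtype=torch.long),
                "labels": torch.tensor(i % 2)}

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
args = TrainingArguments(output_dir="toy_out", num_train_epochs=2,
                         per_device_train_batch_size=8, save_steps=10)
trainer = Trainer(model=BertForSequenceClassification(config), args=args,
                  train_dataset=ToyDataset())

trainer.train()                                    # writes toy_out/checkpoint-10, ...
trainer.train(model_path="toy_out/checkpoint-10")  # resumes: step/epoch come from the saved state, not from mutated TrainingArguments
```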
- 30 Sep, 2020 1 commit
Sylvain Gugger authored
* Remove config assumption in Trainer * Initialize for eval
- 29 Sep, 2020 2 commits
Teven authored
* GPT2 gradient checkpointing * find_unused_parameters removed if checkpointing * find_unused_parameters removed if checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * Added a test for generation with checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
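A minimal sketch of the flag this change adds to GPT2Config: with checkpointing on, each block is re-run during the backward pass, trading compute for memory, which is also why the entry above drops DDP's find_unused_parameters when checkpointing is enabled. The tiny config values are illustrative only.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# gradient_checkpointing=True makes the transformer blocks run under
# torch.utils.checkpoint during training, recomputing activations on backward.
config = GPT2Config(n_layer=4, n_head=4, n_embd=128, gradient_checkpointing=True)
model = GPT2LMHeadModel(config)
```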
Sylvain Gugger authored
* Add automatic best model loading to Trainer * Some small fixes * Formatting
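A hedged sketch of the TrainingArguments side of automatic best-model loading; the metric_for_best_model / greater_is_better option names are my reading of what this change introduced.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",       # "best" is ranked on evaluation results
    load_best_model_at_end=True,       # reload the best checkpoint once training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
)
```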
- 28 Sep, 2020 2 commits
Sylvain Gugger authored
Marcin Zabłocki authored
- 24 Sep, 2020 1 commit
Teven authored
* remote debugging * remote debugging * moved _store_flos call * moved _store_flos call * moved _store_flos call * removed debugging artefacts
- 23 Sep, 2020 1 commit
Wissam Antoun authored
* Fixed evaluation_strategy on epoch end bug: move the evaluation script outside the iteration loop * black formatting
- 22 Sep, 2020 3 commits
Chady Kamar authored
* Add dataloader_num_workers to TrainingArguments. This argument sets the number of workers for the PyTorch DataLoader. * Pass num_workers argument on DataLoader init
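A minimal sketch of the new knob, read directly from the description above: the value is handed to torch.utils.data.DataLoader(num_workers=...) when the Trainer builds its loaders.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,  # 0 (the default) keeps data loading in the main process
)
```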
Sylvain Gugger authored
* Add possibility to evaluate every epoch * Remove multitype arg * Remove needless import * Use a proper enum * Apply suggestions from @LysandreJik Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * One else and formatting Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
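A sketch of the per-epoch evaluation option backed by the enum mentioned above ("Use a proper enum"); the accepted string values are assumed to be "no", "steps", and "epoch".

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",  # evaluate at the end of every epoch instead of every eval_steps
)
```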
Sylvain Gugger authored
- 17 Sep, 2020 2 commits
Sohee Yang authored
* Move 'from transformers' statements to relative imports in some files * Add python prompt symbols in front of the example codes * Reformat the code * Add one missing space Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Sylvain Gugger authored
* Trainer accepts multiple labels * Missing import * Fix docstrings
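For context on what "multiple labels" means here, a small illustrative forward pass (not the Trainer code itself): an extractive-QA model takes two label tensors, start_positions and end_positions, rather than a single labels argument, and the Trainer has to collect the loss in that case too. The tiny config is an assumption for illustration.

```python
import torch
from transformers import BertConfig, BertForQuestionAnswering

model = BertForQuestionAnswering(BertConfig(vocab_size=100, hidden_size=32,
                                            num_hidden_layers=1, num_attention_heads=2,
                                            intermediate_size=64))
batch = {
    "input_ids": torch.randint(0, 100, (2, 16)),
    "attention_mask": torch.ones(2, 16, dtype=torch.long),
    "start_positions": torch.tensor([3, 5]),  # two label tensors instead of one "labels"
    "end_positions": torch.tensor([7, 9]),
}
outputs = model(**batch)  # the loss is computed from both label arguments
```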
- 15 Sep, 2020 2 commits
Yih-Dar authored
* fix ZeroDivisionError and epoch counting * Add test for num_train_epochs calculation in trainer.py * Remove @require_non_multigpu for test_num_train_epochs_in_training
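A hedged re-statement of the arithmetic this fix guards (not the actual trainer.py code): when the dataloader has fewer batches than gradient_accumulation_steps, the floor division gives 0 update steps per epoch, so clamping to at least 1 avoids the ZeroDivisionError and keeps epoch counting well defined.

```python
def training_counts(num_batches, gradient_accumulation_steps, num_train_epochs):
    # Number of optimizer updates per epoch, never allowed to drop to zero.
    num_update_steps_per_epoch = max(num_batches // gradient_accumulation_steps, 1)
    max_steps = num_update_steps_per_epoch * num_train_epochs
    return num_update_steps_per_epoch, max_steps

# 3 batches with accumulation over 8 would previously yield 0 steps per epoch.
print(training_counts(num_batches=3, gradient_accumulation_steps=8, num_train_epochs=2))  # (1, 2)
```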
Sylvain Gugger authored
* Allow multiple outputs * Formatting * Move the unwrapping before metrics * Fix typo * Add test for non-supported config options
- 11 Sep, 2020 1 commit
Sylvain Gugger authored
- 10 Sep, 2020 1 commit
Sylvain Gugger authored
* nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last
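The rename in practice, as a minimal sketch: the `nlp` package became `datasets`, so the install target and import change while call sites stay the same.

```python
# pip install datasets              (previously: pip install nlp)
from datasets import load_dataset   # previously: from nlp import load_dataset

train_set = load_dataset("glue", "mrpc", split="train")
print(train_set.column_names)
```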
- 08 Sep, 2020 3 commits
Lysandre Debut authored
* Should check if `torch` is available * fixed samples_count error, distributed_concat arguments * style * Import torch at beginning of file Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
Teven authored
* neFLOs calculation, logging, and reloading (#1) * testing distributed consecutive batches * fixed AttributeError from DataParallel * removed verbosity * rotate with use_mtime=True * removed print * fixed interaction with gradient accumulation * indent formatting * distributed neflo counting * fixed typo * fixed typo * mean distributed losses * exporting log history * moved a few functions * floating_point_ops clarification for transformers with parameter-reuse * code quality * double import * made flo estimation more task-agnostic * only logging flos if computed * code quality * unused import * Update src/transformers/trainer.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Sylvain review * Update src/transformers/modeling_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * black Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
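A hedged sketch of the estimate behind the neFLOs bookkeeping described above: roughly 6 * tokens * non-embedding parameters per forward+backward pass, which the Trainer accumulates, logs, and reloads with its state. The method and argument names (floating_point_ops, exclude_embeddings) reflect my reading of the change and may differ slightly from the shipped API.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))
inputs = {"input_ids": torch.randint(0, 100, (4, 32))}  # batch of 4 sequences, 32 tokens each

flos = model.floating_point_ops(inputs)  # ~6 * tokens * (non-embedding) parameters
manual = 6 * 4 * 32 * model.num_parameters(exclude_embeddings=True)
print(flos, manual)  # the two estimates should agree
```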
Stuart Mesham authored
* fixed trainer tr_loss memory leak * detached returned training loss from computation graph in the Trainer class' training_step() method * Revert "fixed trainer tr_loss memory leak" This reverts commit 47226e4e
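A generic sketch of the leak and the fix (not the actual training_step() code): accumulating the raw loss keeps every step's autograd graph alive, so the returned value should be detached once backward has run.

```python
import torch

def training_step(model, batch):
    loss = model(batch).mean()
    loss.backward()
    return loss.detach()  # drop the graph; only the scalar value is accumulated

model = torch.nn.Linear(4, 1)
tr_loss = torch.tensor(0.0)
for _ in range(3):
    tr_loss += training_step(model, torch.randn(8, 4))  # safe: no graphs retained across steps
print(tr_loss.item() / 3)
```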
- 03 Sep, 2020 1 commit
krfricke authored
* move wandb/comet logger init to train() to allow parallel logging * Setup wandb/comet loggers on first call to log()
- 31 Aug, 2020 4 commits
Sylvain Gugger authored
* Split the run_hp_search by backend * Unused import
krfricke authored
* Introduce HPO checkpointing for PBT * Moved checkpoint saving * Fixed checkpoint subdir pass * Fixed style * Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT * Adjust number of GPUs to number of jobs * Avoid mode pickling in ray * Move hp search to integrations
Jin Young (Daniel) Sohn authored
* Only access loss tensor every logging_steps * tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU communication at each step. On RoBERTa MLM for example, it reduces step time by 30%, should be larger for smaller step time models/tasks. * Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag * Avg reduce loss across eval shards * Fix style (#6803) * t5 model should make decoder_attention_mask (#6800) * [s2s] Test hub configs in self-scheduled CI (#6809) * [s2s] round runtime in run_eval (#6798) * Pegasus finetune script: add --adafactor (#6811) * [bart] rename self-attention -> attention (#6708) * [tests] fix typos in inputs (#6818) * Fixed open in colab link (#6825) * Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) * BR_BERTo model card (#6793) * clearly indicate shuffle=False (#6312) * Clarify shuffle * clarify shuffle Co-authored-by:
Kevin Canwen Xu <canwenxu@126.com> * [s2s README] Add more dataset download instructions (#6737) * Style * Patch logging issue * Set default logging level to `WARNING` instead of `INFO` * TF Flaubert w/ pre-norm (#6841) * Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644) * add datacollator and dataset for next sentence prediction task * bug fix (numbers of special tokens & truncate sequences) * bug fix (+ dict inputs support for data collator) * add padding for nsp data collator; renamed cached files to avoid conflict. * add test for nsp data collator * Style Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr> * Fix in Adafactor docstrings (#6845) * Fix resuming training for Windows (#6847) * Only access loss tensor every logging_steps * tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU communication at each step. On RoBERTa MLM for example, it reduces step time by 30%, should be larger for smaller step time models/tasks. * Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag * Avg reduce loss across eval shards * comments Co-authored-by:
Sam Shleifer <sshleifer@gmail.com> Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com> Co-authored-by:
Zane Lim <zyuanlim@gmail.com> Co-authored-by:
Rodolfo De Nadai <rdenadai@gmail.com> Co-authored-by:
xujiaze13 <37360975+xujiaze13@users.noreply.github.com> Co-authored-by:
Kevin Canwen Xu <canwenxu@126.com> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Huang Lianzhe <hlz@pku.edu.cn> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
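A standalone sketch of the logging pattern the first item in the commit message above describes: keep the running loss as a tensor and only call .item() once per logging window, so XLA/TPU runs are not forced into a device-to-host sync on every step.

```python
import torch

logging_steps = 10
tr_loss = torch.tensor(0.0)   # stays a tensor (on device in a real run)
logging_loss = 0.0

for step in range(1, 101):
    loss = torch.rand(())      # stand-in for the detached per-step loss
    tr_loss += loss            # tensor += tensor: no host synchronization
    if step % logging_steps == 0:
        current = tr_loss.item()  # the only .item() call, once per logging window
        print(f"step {step}: avg loss {(current - logging_loss) / logging_steps:.4f}")
        logging_loss = current
```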
Sylvain Gugger authored
- 26 Aug, 2020 1 commit
Lysandre Debut authored
* Logging * Style * hf_logging > utils.logging * Address @thomwolf's comments * Update test * Update src/transformers/benchmark/benchmark_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert bad change Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 25 Aug, 2020 4 commits
Sylvain Gugger authored
Sylvain Gugger authored
* More tests to Trainer * Add warning in the doc
Sylvain Gugger authored
Sylvain Gugger authored
- 24 Aug, 2020 5 commits
Sylvain Gugger authored
Sylvain Gugger authored
Sylvain Gugger authored
* Add optuna hyperparameter search to Trainer * @julien-c suggestions Co-authored-by:
Julien Chaumond <chaumond@gmail.com> * Make compute_objective an arg function * Formatting * Rework to make it easier to add ray * Formatting * Initial support for Ray * Formatting * Polish and finalize * Add trial id to checkpoint with Ray * Smaller default * Use GPU in ray if available * Formatting * Fix test * Update install instruction Co-authored-by:
Richard Liaw <rliaw@berkeley.edu> * Address review comments * Formatting post-merge Co-authored-by:
Julien Chaumond <chaumond@gmail.com> Co-authored-by:
Richard Liaw <rliaw@berkeley.edu>
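A hedged sketch of the search entry point this series introduces; the model_init hook, the hyperparameter_search signature, and the string backend name are assumptions based on the description (optuna backend, with ray added alongside), and optuna must be installed for it to run.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification, Trainer, TrainingArguments

class ToyDataset(torch.utils.data.Dataset):
    """Random classification data so the sketch is self-contained."""
    def __len__(self):
        return 32
    def __getitem__(self, i):
        return {"input_ids": torch.randint(0, 100, (16,)),
                "attention_mask": torch.ones(16, dtype=torch.long),
                "labels": torch.tensor(i % 2)}

def model_init():
    # Rebuilt for every trial so each run starts from fresh weights.
    return BertForSequenceClassification(
        BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                   num_attention_heads=2, intermediate_size=64, num_labels=2))

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_out", evaluation_strategy="epoch"),
    train_dataset=ToyDataset(),
    eval_dataset=ToyDataset(),
)
best = trainer.hyperparameter_search(backend="optuna", n_trials=5, direction="minimize")
print(best.hyperparameters)
```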
sgugger authored
Sylvain Gugger authored
* Don't reset the type of the dataset * Formatting * Update trainer.py Co-authored-by: Teven <teven.lescao@gmail.com>
- 20 Aug, 2020 3 commits
Sylvain Gugger authored
* Add a classmethod to easily build a Trainer from nlp dataset and metric * Fix docstrings * Split train/eval * Formatting * Log dropped columns + docs * Authorize callable activations * Poc for auto activation * Be framework-agnostic * Formatting * Remove class method * Remove unnecessary code
Sylvain Gugger authored
* Add tests to Trainer * Test if removing long breaks everything * Remove ugly hack * Fix distributed test * Use float for number of epochs
sgugger authored