- 05 Oct, 2020 1 commit
Sylvain Gugger authored
- 01 Oct, 2020 2 commits
Sylvain Gugger authored
* Fix seq2seq example test * Fix bad copy-paste * Also save the state
Sylvain Gugger authored
* Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Add test of resumed training * Fixes * Non multiGPU test * Clean Trainer state * Add more to the state * Documentation * One last test * Make resume training test more complete * Unwanted changes
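The resumed-training behaviour this entry adds tests for can be exercised with a minimal sketch like the one below; the toy model/dataset and the `model_path` keyword for resuming are assumptions for illustration (the keyword of this era, later renamed), not the PR's own test code.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification, Trainer, TrainingArguments

class ToyDataset(torch.utils.data.Dataset):
    """Random classification data, only here to make the sketch self-contained."""
    def __len__(self):
        return 64
    def __getitem__(self, i):
        return {"input_ids": torch.randint(0, 100, (16,)),
                "attention_mask": torch.ones(16, dtype=torch.long),
                "labels": torch.tensor(i % 2)}

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
args = TrainingArguments(output_dir="toy_out", num_train_epochs=2,
                         per_device_train_batch_size=8, save_steps=10)
trainer = Trainer(model=BertForSequenceClassification(config), args=args,
                  train_dataset=ToyDataset())

trainer.train()                                    # writes toy_out/checkpoint-10, ...
trainer.train(model_path="toy_out/checkpoint-10")  # resumes: step/epoch come from the saved state, not from mutated TrainingArguments
```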
- 30 Sep, 2020 1 commit
Sylvain Gugger authored
* Remove config assumption in Trainer * Initialize for eval
- 29 Sep, 2020 2 commits
Teven authored
* GPT2 gradient checkpointing * find_unused_parameters removed if checkpointing * find_unused_parameters removed if checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * Added a test for generation with checkpointing * Update src/transformers/configuration_gpt2.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
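A minimal sketch of the flag this change adds to GPT2Config: with checkpointing on, each block is re-run during the backward pass, trading compute for memory, which is also why the entry above drops DDP's find_unused_parameters when checkpointing is enabled. The tiny config values are illustrative only.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# gradient_checkpointing=True makes the transformer blocks run under
# torch.utils.checkpoint during training, recomputing activations on backward.
config = GPT2Config(n_layer=4, n_head=4, n_embd=128, gradient_checkpointing=True)
model = GPT2LMHeadModel(config)
```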
Sylvain Gugger authored
* Add automatic best model loading to Trainer * Some small fixes * Formatting
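A hedged sketch of the TrainingArguments side of automatic best-model loading; the metric_for_best_model / greater_is_better option names are my reading of what this change introduced.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",       # "best" is ranked on evaluation results
    load_best_model_at_end=True,       # reload the best checkpoint once training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
)
```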
- 28 Sep, 2020 2 commits
Sylvain Gugger authored
Marcin Zabłocki authored
- 24 Sep, 2020 1 commit
Teven authored
* remote debugging * remote debugging * moved _store_flos call * moved _store_flos call * moved _store_flos call * removed debugging artefacts
- 23 Sep, 2020 1 commit
Wissam Antoun authored
* Fixed evaluation_strategy on epoch end bug: move the evaluation script outside the iteration loop * black formatting
- 22 Sep, 2020 3 commits
Chady Kamar authored
* Add dataloader_num_workers to TrainingArguments. This argument sets the number of workers for the PyTorch DataLoader. * Pass num_workers argument on DataLoader init
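A minimal sketch of the new knob, read directly from the description above: the value is handed to torch.utils.data.DataLoader(num_workers=...) when the Trainer builds its loaders.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,  # 0 (the default) keeps data loading in the main process
)
```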
Sylvain Gugger authored
* Add possibility to evaluate every epoch * Remove multitype arg * Remove needless import * Use a proper enum * Apply suggestions from @LysandreJik Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * One else and formatting Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
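A sketch of the per-epoch evaluation option backed by the enum mentioned above ("Use a proper enum"); the accepted string values are assumed to be "no", "steps", and "epoch".

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",  # evaluate at the end of every epoch instead of every eval_steps
)
```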
Sylvain Gugger authored
- 17 Sep, 2020 2 commits
Sohee Yang authored
* Move 'from transformers' statements to relative imports in some files * Add python prompt symbols in front of the example codes * Reformat the code * Add one missing space Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Sylvain Gugger authored
* Trainer accepts multiple labels * Missing import * Fix docstrings
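For context on what "multiple labels" means here, a small illustrative forward pass (not the Trainer code itself): an extractive-QA model takes two label tensors, start_positions and end_positions, rather than a single labels argument, and the Trainer has to collect the loss in that case too. The tiny config is an assumption for illustration.

```python
import torch
from transformers import BertConfig, BertForQuestionAnswering

model = BertForQuestionAnswering(BertConfig(vocab_size=100, hidden_size=32,
                                            num_hidden_layers=1, num_attention_heads=2,
                                            intermediate_size=64))
batch = {
    "input_ids": torch.randint(0, 100, (2, 16)),
    "attention_mask": torch.ones(2, 16, dtype=torch.long),
    "start_positions": torch.tensor([3, 5]),  # two label tensors instead of one "labels"
    "end_positions": torch.tensor([7, 9]),
}
outputs = model(**batch)  # the loss is computed from both label arguments
```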
- 15 Sep, 2020 2 commits
Yih-Dar authored
* fix ZeroDivisionError and epoch counting * Add test for num_train_epochs calculation in trainer.py * Remove @require_non_multigpu for test_num_train_epochs_in_training
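A hedged re-statement of the arithmetic this fix guards (not the actual trainer.py code): when the dataloader has fewer batches than gradient_accumulation_steps, the floor division gives 0 update steps per epoch, so clamping to at least 1 avoids the ZeroDivisionError and keeps epoch counting well defined.

```python
def training_counts(num_batches, gradient_accumulation_steps, num_train_epochs):
    # Number of optimizer updates per epoch, never allowed to drop to zero.
    num_update_steps_per_epoch = max(num_batches // gradient_accumulation_steps, 1)
    max_steps = num_update_steps_per_epoch * num_train_epochs
    return num_update_steps_per_epoch, max_steps

# 3 batches with accumulation over 8 would previously yield 0 steps per epoch.
print(training_counts(num_batches=3, gradient_accumulation_steps=8, num_train_epochs=2))  # (1, 2)
```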
Sylvain Gugger authored
* Allow multiple outputs * Formatting * Move the unwrapping before metrics * Fix typo * Add test for non-supported config options
- 11 Sep, 2020 1 commit
Sylvain Gugger authored
- 10 Sep, 2020 1 commit
Sylvain Gugger authored
* nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last
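The rename in practice, as a minimal sketch: the `nlp` package became `datasets`, so the install target and import change while call sites stay the same.

```python
# pip install datasets              (previously: pip install nlp)
from datasets import load_dataset   # previously: from nlp import load_dataset

train_set = load_dataset("glue", "mrpc", split="train")
print(train_set.column_names)
```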
- 08 Sep, 2020 3 commits
Lysandre Debut authored
* Should check if `torch` is available * fixed samples_count error, distributed_concat arguments * style * Import torch at beginning of file Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
Teven authored
* neFLOs calculation, logging, and reloading (#1) * testing distributed consecutive batches * fixed AttributeError from DataParallel * removed verbosity * rotate with use_mtime=True * removed print * fixed interaction with gradient accumulation * indent formatting * distributed neflo counting * fixed typo * fixed typo * mean distributed losses * exporting log history * moved a few functions * floating_point_ops clarification for transformers with parameter-reuse * code quality * double import * made flo estimation more task-agnostic * only logging flos if computed * code quality * unused import * Update src/transformers/trainer.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Sylvain review * Update src/transformers/modeling_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * black Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
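A hedged sketch of the estimate behind the neFLOs bookkeeping described above: roughly 6 * tokens * non-embedding parameters per forward+backward pass, which the Trainer accumulates, logs, and reloads with its state. The method and argument names (floating_point_ops, exclude_embeddings) reflect my reading of the change and may differ slightly from the shipped API.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=64))
inputs = {"input_ids": torch.randint(0, 100, (4, 32))}  # batch of 4 sequences, 32 tokens each

flos = model.floating_point_ops(inputs)  # ~6 * tokens * (non-embedding) parameters
manual = 6 * 4 * 32 * model.num_parameters(exclude_embeddings=True)
print(flos, manual)  # the two estimates should agree
```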
Stuart Mesham authored
* fixed trainer tr_loss memory leak * detached returned training loss from computation graph in the Trainer class' training_step() method * Revert "fixed trainer tr_loss memory leak" This reverts commit 47226e4e
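A generic sketch of the leak and the fix (not the actual training_step() code): accumulating the raw loss keeps every step's autograd graph alive, so the returned value should be detached once backward has run.

```python
import torch

def training_step(model, batch):
    loss = model(batch).mean()
    loss.backward()
    return loss.detach()  # drop the graph; only the scalar value is accumulated

model = torch.nn.Linear(4, 1)
tr_loss = torch.tensor(0.0)
for _ in range(3):
    tr_loss += training_step(model, torch.randn(8, 4))  # safe: no graphs retained across steps
print(tr_loss.item() / 3)
```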
- 03 Sep, 2020 1 commit
krfricke authored
* move wandb/comet logger init to train() to allow parallel logging * Setup wandb/comet loggers on first call to log()
- 31 Aug, 2020 4 commits
Sylvain Gugger authored
* Split the run_hp_search by backend * Unused import
krfricke authored
* Introduce HPO checkpointing for PBT * Moved checkpoint saving * Fixed checkpoint subdir pass * Fixed style * Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT * Adjust number of GPUs to number of jobs * Avoid mode pickling in ray * Move hp search to integrations
Jin Young (Daniel) Sohn authored
* Only access loss tensor every logging_steps * tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU communication at each step. On RoBERTa MLM for example, it reduces step time by 30%, should be larger for smaller step time models/tasks. * Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag * Avg reduce loss across eval shards * Fix style (#6803) * t5 model should make decoder_attention_mask (#6800) * [s2s] Test hub configs in self-scheduled CI (#6809) * [s2s] round runtime in run_eval (#6798) * Pegasus finetune script: add --adafactor (#6811) * [bart] rename self-attention -> attention (#6708) * [tests] fix typos in inputs (#6818) * Fixed open in colab link (#6825) * Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) * BR_BERTo model card (#6793) * clearly indicate shuffle=False (#6312) * Clarify shuffle * clarify shuffle Co-authored-by:
Kevin Canwen Xu <canwenxu@126.com> * [s2s README] Add more dataset download instructions (#6737) * Style * Patch logging issue * Set default logging level to `WARNING` instead of `INFO` * TF Flaubert w/ pre-norm (#6841) * Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644) * add datacollator and dataset for next sentence prediction task * bug fix (numbers of special tokens & truncate sequences) * bug fix (+ dict inputs support for data collator) * add padding for nsp data collator; renamed cached files to avoid conflict. * add test for nsp data collator * Style Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr> * Fix in Adafactor docstrings (#6845) * Fix resuming training for Windows (#6847) * Only access loss tensor every logging_steps * tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance causing TPU<>CPU communication at each step. On RoBERTa MLM for example, it reduces step time by 30%, should be larger for smaller step time models/tasks. * Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag * Avg reduce loss across eval shards * comments Co-authored-by:
Sam Shleifer <sshleifer@gmail.com> Co-authored-by:
Stas Bekman <stas00@users.noreply.github.com> Co-authored-by:
Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com> Co-authored-by:
Zane Lim <zyuanlim@gmail.com> Co-authored-by:
Rodolfo De Nadai <rdenadai@gmail.com> Co-authored-by:
xujiaze13 <37360975+xujiaze13@users.noreply.github.com> Co-authored-by:
Kevin Canwen Xu <canwenxu@126.com> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Huang Lianzhe <hlz@pku.edu.cn> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
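A standalone sketch of the logging pattern the first item in the commit message above describes: keep the running loss as a tensor and only call .item() once per logging window, so XLA/TPU runs are not forced into a device-to-host sync on every step.

```python
import torch

logging_steps = 10
tr_loss = torch.tensor(0.0)   # stays a tensor (on device in a real run)
logging_loss = 0.0

for step in range(1, 101):
    loss = torch.rand(())      # stand-in for the detached per-step loss
    tr_loss += loss            # tensor += tensor: no host synchronization
    if step % logging_steps == 0:
        current = tr_loss.item()  # the only .item() call, once per logging window
        print(f"step {step}: avg loss {(current - logging_loss) / logging_steps:.4f}")
        logging_loss = current
```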
Sylvain Gugger authored
- 26 Aug, 2020 1 commit
Lysandre Debut authored
* Logging * Style * hf_logging > utils.logging * Address @thomwolf's comments * Update test * Update src/transformers/benchmark/benchmark_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert bad change Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 25 Aug, 2020 4 commits
Sylvain Gugger authored
Sylvain Gugger authored
* More tests to Trainer * Add warning in the doc
Sylvain Gugger authored
Sylvain Gugger authored
- 24 Aug, 2020 5 commits
Sylvain Gugger authored
Sylvain Gugger authored
Sylvain Gugger authored
* Add optuna hyperparameter search to Trainer * @julien-c suggestions Co-authored-by:
Julien Chaumond <chaumond@gmail.com> * Make compute_objective an arg function * Formatting * Rework to make it easier to add ray * Formatting * Initial support for Ray * Formatting * Polish and finalize * Add trial id to checkpoint with Ray * Smaller default * Use GPU in ray if available * Formatting * Fix test * Update install instruction Co-authored-by:
Richard Liaw <rliaw@berkeley.edu> * Address review comments * Formatting post-merge Co-authored-by:
Julien Chaumond <chaumond@gmail.com> Co-authored-by:
Richard Liaw <rliaw@berkeley.edu>
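A hedged sketch of the search entry point this series introduces; the model_init hook, the hyperparameter_search signature, and the string backend name are assumptions based on the description (optuna backend, with ray added alongside), and optuna must be installed for it to run.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification, Trainer, TrainingArguments

class ToyDataset(torch.utils.data.Dataset):
    """Random classification data so the sketch is self-contained."""
    def __len__(self):
        return 32
    def __getitem__(self, i):
        return {"input_ids": torch.randint(0, 100, (16,)),
                "attention_mask": torch.ones(16, dtype=torch.long),
                "labels": torch.tensor(i % 2)}

def model_init():
    # Rebuilt for every trial so each run starts from fresh weights.
    return BertForSequenceClassification(
        BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                   num_attention_heads=2, intermediate_size=64, num_labels=2))

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="hp_out", evaluation_strategy="epoch"),
    train_dataset=ToyDataset(),
    eval_dataset=ToyDataset(),
)
best = trainer.hyperparameter_search(backend="optuna", n_trials=5, direction="minimize")
print(best.hyperparameters)
```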
sgugger authored
Sylvain Gugger authored
* Don't reset the type of the dataset * Formatting * Update trainer.py Co-authored-by: Teven <teven.lescao@gmail.com>
- 20 Aug, 2020 3 commits
Sylvain Gugger authored
* Add a classmethod to easily build a Trainer from nlp dataset and metric * Fix docstrings * Split train/eval * Formatting * Log dropped columns + docs * Authorize callable activations * Poc for auto activation * Be framework-agnostic * Formatting * Remove class method * Remove unnecessary code
Sylvain Gugger authored
* Add tests to Trainer * Test if removing long breaks everything * Remove ugly hack * Fix distributed test * Use float for number of epochs
sgugger authored