- 09 May, 2022 1 commit
Zachary Mueller authored
- Adds auto_batch_size finder
- Moves training loop to an inner training loop

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
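A minimal sketch of how the finder is enabled, assuming the flag landed in `TrainingArguments` as `auto_find_batch_size` (it relies on the `accelerate` package; `model` and `train_dataset` are placeholders):

```python
from transformers import Trainer, TrainingArguments

# Sketch: on a CUDA out-of-memory error, the new inner training loop is
# re-entered with the batch size halved until training fits in memory.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=64,  # starting point, halved on each OOM
    auto_find_batch_size=True,
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```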
-
- 03 May, 2022 2 commits
Sylvain Gugger authored
* Fix RNG reload in resume training from epoch checkpoint
* Fix test
-
Sylvain Gugger authored
* Make Trainer compatible with sharded checkpoints
* Add doc
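A hedged sketch of what this enables: a model saved in shards (via the pre-existing `max_shard_size` option of `save_pretrained`) can now be picked up again by the Trainer; the paths below are placeholders:

```python
# Save a large model as multiple shard files plus an index file.
model.save_pretrained("out/checkpoint-500", max_shard_size="10GB")

# The Trainer can now resume from such a sharded checkpoint directory.
trainer.train(resume_from_checkpoint="out/checkpoint-500")
```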
-
- 19 Apr, 2022 2 commits
Manuel R. Ciosici authored
* Add initial BNB integration
* fixup! Add initial BNB integration
* Add bnb test decorator
* Update Adamw8bit option name
* Use the full bnb package name
* Override bnb for all embedding layers
* Fix package name
* Formatting
* Remove unnecessary import
* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename AdamwBNB optimizer option
* Add training test checking that bnb memory utilization is lower
* fix merge
* fix merge; fix + extend new test
* cleanup
* expand bnb
* move all require_* candidates to testing_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
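A minimal sketch of opting into the integration, assuming the renamed optimizer option merged as `"adamw_bnb_8bit"` and that the `bitsandbytes` package is installed:

```python
from transformers import TrainingArguments

# 8-bit AdamW stores optimizer state in 8 bits, cutting optimizer memory
# roughly 4x versus fp32 AdamW, which the new training test checks.
args = TrainingArguments(
    output_dir="out",
    optim="adamw_bnb_8bit",
)
```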
-
code-review-doctor authored
* Fix issue avoid-misusing-assert-true found at https://codereview.doctor
* fix tests
* fix tf

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
- 29 Mar, 2022 1 commit
Sander Land authored
* Avoid accessing .dataset of a dataloader
* style
* fix
* cleaning up, reverting some misunderstandings
* black
* add train_dataset argument to get_train_dataloader, and fix other instances of length checks
* flake8
* address comments
* fix bug
* cleanup
* add test
* Update tests/trainer/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* under torch
* merge
* stylistic suggestion

Co-authored-by: Sander Land <sander@chatdesk.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 23 Mar, 2022 1 commit
Sylvain Gugger authored
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
-
- 08 Mar, 2022 1 commit
David Hall authored
* Seed get_train_sampler's generator with arg seed to improve reproducibility and make the world_size<=1 code path more similar to the others
* move test file into trainer test explicitly
* dumb typo
* make style lint happy
* per discussion, switch to data_seed
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
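A minimal sketch of the resulting knob, assuming it merged under the name `data_seed`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    seed=42,         # model init, dropout masks, etc.
    data_seed=1234,  # seeds get_train_sampler's generator, fixing the data order
)
```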
-
- 23 Feb, 2022 1 commit
Lysandre Debut authored
* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
-
- 09 Feb, 2022 1 commit
Sylvain Gugger authored
* Expose hub test problem
* Fix tests
-
- 03 Feb, 2022 1 commit
davidleonfdez authored
* Add preprocess_logits_for_metrics Trainer param
* Compute accuracy in LM examples
* Improve comments
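A minimal sketch of the new hook: it runs on each batch's logits before they are accumulated for `compute_metrics`, so a language model can hand over token ids instead of full-vocabulary logits (the accuracy below is a toy that ignores label shifting and padding; `model` and `eval_dataset` are placeholders):

```python
from transformers import Trainer, TrainingArguments

def preprocess_logits_for_metrics(logits, labels):
    # Keep only the predicted token ids; avoids gathering huge logit tensors.
    return logits.argmax(dim=-1)

def compute_metrics(eval_pred):
    preds, labels = eval_pred  # preds are the ids returned above
    return {"accuracy": float((preds == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
```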
-
- 02 Feb, 2022 1 commit
Ayush Chaurasia authored
# Add support for W&B hyperparameter sweep

This PR:
* allows using wandb for running hyperparameter search.
* The runs are visualized on the W&B sweeps dashboard.
* This supports running sweeps on parallel devices, all reporting to the same central dashboard.

### Usage

**To run a new hyperparameter search:**
```
trainer.hyperparameter_search(
    backend="wandb",
    project="transformers_sweep",  # name of the project
    n_trials=5,
    metric="eval/loss",  # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found.
)
```
This outputs a sweep id, e.g. `my_project/sweep_id`.

**To run sweeps on parallel devices:** just pass the sweep id you want to run in parallel:
```
trainer.hyperparameter_search(
    backend="wandb",
    sweep_id="my_project/sweep_id",
)
```
-
- 13 Jan, 2022 1 commit
Manuel R. Ciosici authored
* Add AdamW deprecation warning
* Add --optim to Trainer
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
* fix style
* fix
* Regroup adamws together
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Change --adafactor to --optim adafactor
* Use Enum for optimizer values
* fixup! Change --adafactor to --optim adafactor
* fixup! Change --adafactor to --optim adafactor
* fixup! Change --adafactor to --optim adafactor
* fixup! Use Enum for optimizer values
* Improved documentation for --adafactor
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add mention of no_deprecation_warning
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename OptimizerOptions to OptimizerNames
* Use choices for --optim
* Move optimizer selection code to a function and add a unit test
* Change optimizer names
* Rename method
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename method
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Remove TODO comment
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename variable
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename variable
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename function
* Rename variable
* Parameterize the tests for supported optimizers
* Refactor
* Attempt to make tests pass on CircleCI
* Add a test with apex
* rework to add apex to parameterized; add actual train test
* fix import when torch is not available
* fix optim_test_params when torch is not available
* fix optim_test_params when torch is not available
* re-org
* small re-org
* fix test_fused_adam_no_apex
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove .value from OptimizerNames
* Rename optimizer strings s|--adam_|--adamw_|
* Also rename Enum options
* small fix
* Fix instantiation of OptimizerNames. Remove redundant test
* Use ExplicitEnum instead of Enum
* Add unit test with string optimizer
* Change optimizer default to string value

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
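A minimal sketch of the final API, assuming the string values of the merged `OptimizerNames` enum (e.g. `"adamw_hf"`, `"adamw_torch"`, `"adamw_apex_fused"`, `"adafactor"`):

```python
from transformers import TrainingArguments

# Replaces the deprecated standalone --adafactor flag:
# pass --optim adafactor instead of --adafactor.
args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch",
)
```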
-
- 11 Jan, 2022 1 commit
Sylvain Gugger authored
* Add test
* Add tests for the reported train loss
-
- 23 Dec, 2021 1 commit
Sylvain Gugger authored
* Fix failing GPU trainer tests
* Remove print statements
-
- 16 Dec, 2021 1 commit
Lysandre Debut authored
-
- 03 Dec, 2021 1 commit
Stas Bekman authored
* [trainer] add --tf32 support
* it's pt>=.17
* it's pt>=.17
* flip the default to True
* add experimental note
* simplify logic
* style
* switch to 3-state logic
* doc
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* re-style code

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
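A minimal sketch of the resulting 3-state flag: `None` leaves PyTorch's default alone, while `True`/`False` set it explicitly (TF32 needs an NVIDIA Ampere-or-newer GPU and a recent PyTorch):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    tf32=True,  # run fp32 matmuls in the faster TF32 format
)
```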
-
- 01 Dec, 2021 1 commit
Jamie DeAntonis authored
* started bf16 integration
* minor changes
* code now runs
* style
* lay foundation for bf16 testing
* lay foundation for bf16 testing
* start the tests
* better bf16 check
* style
* 2 separate checkers - one for bf16 support, another for bf16+autocast
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* a couple of comment resolutions
* more comment resolutions
* resolved a small bug
* just some print statements
* added todo marking
* added a todo
* adjust for API change s/fast_dtype/dtype/
* fix style
* merge 2 bf16 util functions
* bf16 now does scaling too
* Add support for bfloat16
* Revert T5 layernorm to float32. This is based on the comment at https://github.com/huggingface/transformers/pull/14448/files#r752660929 and the PyTorch PR https://github.com/pytorch/pytorch/pull/66920.
* Add comment about conversion to float32 before returning the numpy data
* Add comment about AMP-bfloat16 incompatibility
* Fix formatting
* typo
* reformer / bf16
* cleanup
* require at least pt-1.10
* fix
* will deal with deepspeed separately
* cleanup
* revert
* cleanup
* fp16_full_eval and bf16_full_eval are separate modes
* proper deprecation
* cleanup
* test and fixes
* spelling
* cleanup
* add a note that this API is experimental

Co-authored-by: jamie <jamie@cortx.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: suriya <suriya@cortx.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
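A minimal sketch of the flags this adds; `bf16_full_eval` is the separate evaluation-only mode called out above (bf16 needs PyTorch >= 1.10 and Ampere-or-newer hardware):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,            # mixed-precision training in bfloat16; keeps fp32's
                          # dynamic range, so no loss scaling is needed
    bf16_full_eval=True,  # run evaluation entirely in bfloat16
)
```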
-
- 18 Nov, 2021 1 commit
Sylvain Gugger authored
-
- 16 Nov, 2021 1 commit
Valentin authored
* stop training when a finite IterableDataset is exhausted
  When using an iterable dataset, num_epochs is set to sys.maxsize to make sure all data is consumed; likewise we want to set max_steps high enough, but still stop when all data is consumed.
  (cherry picked from commit 6f0e1d6363153da9051e93acffe1cbab3a3f3b12)
* fix typo flase -> false
* add test for stopping training on exhausted finite iterable dataset
* remove redundant gradient_accumulation_steps
* run make style; reformat training_args docstring
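A minimal sketch of the scenario this fixes, under the assumption that the data source is finite (`model` and `examples` are placeholders):

```python
import torch
from transformers import Trainer, TrainingArguments

class FiniteStream(torch.utils.data.IterableDataset):
    def __iter__(self):
        # A finite stream: the Trainer cannot know its length up front.
        yield from examples

# max_steps is set generously; training now stops when the stream is
# exhausted instead of spinning forever.
args = TrainingArguments(output_dir="out", max_steps=10_000)
trainer = Trainer(model=model, args=args, train_dataset=FiniteStream())
```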
-
- 02 Nov, 2021 1 commit
Sylvain Gugger authored
* Update Transformers to huggingface_hub >= 0.1.0
* Forgot to save...
* Style
* Fix test
-
- 29 Oct, 2021 1 commit
Thomas Wang authored
* Remove n_ctx from configs
* Fix GPTJ and OpenAIGPT; both are acceptable breaking changes as there are no configs such that it breaks
* Remove unnecessary n_positions from TFOpenAIGPT
-
- 23 Sep, 2021 1 commit
kding1 authored
* add sigopt hpo to transformers.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* extend sigopt changes to test code and others.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* fix style for sigopt integration.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Add necessary information to run unittests on SigOpt.

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
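A minimal sketch of running the new backend, assuming a SigOpt API token is configured in the environment and the `sigopt` package is installed:

```python
best_run = trainer.hyperparameter_search(
    backend="sigopt",
    n_trials=10,
    direction="minimize",  # drive eval loss down
)
print(best_run.hyperparameters)
```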
-
- 17 Sep, 2021 1 commit
Patrick von Platen authored
* finish
* add test
* push
* remove unnecessary code
* up
* correct test
* Update src/transformers/training_args.py
-
- 14 Sep, 2021 1 commit
Sylvain Gugger authored
* Push to hub when saving checkpoints
* Add model card
* Revert partial model card
* Small fix for checkpoint
* Add tests
* Add documentation
* Fix tests
* Bump huggingface_hub
* Fix test
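A minimal sketch of the behavior this wires up, assuming the existing `push_to_hub` flag (exact Hub repo naming may differ by version):

```python
from transformers import TrainingArguments

# With push_to_hub=True, each checkpoint save also pushes to the Hub, and a
# model card is generated at the end of training.
args = TrainingArguments(
    output_dir="my-finetuned-model",
    push_to_hub=True,
    save_strategy="epoch",
)
```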
-
- 09 Sep, 2021 1 commit
Sylvain Gugger authored
-
- 03 Aug, 2021 1 commit
Philip May authored
* fix #12970
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove unnecessary issue link
* fix test formatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 19 Jul, 2021 1 commit
Sylvain Gugger authored
* Enforce eval and save strategies are compatible when --load_best_model_at_end
* Update doc
* Fix typos
* Fix tests
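A minimal sketch of a configuration the new check accepts (with `"steps"` strategies, the save interval must also be a round multiple of the eval interval, so every evaluated checkpoint is saved):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",  # must match evaluation_strategy
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
```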
-
- 23 Jun, 2021 1 commit
Sylvain Gugger authored
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
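A minimal sketch of the cleaned-up API surface (the repo name is a placeholder):

```python
# Models, tokenizers, and configs share one push_to_hub entry point.
model.push_to_hub("my-finetuned-model")
tokenizer.push_to_hub("my-finetuned-model")
```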
-
- 22 Jun, 2021 3 commits
Stas Bekman authored
* bug fixes and a rename
* add extended DDP test
-
Stas Bekman authored
* [tests] multiple improvements
* cleanup
* style
* todo to investigate
* fix
-
Stas Bekman authored
* set log level from CLI
* add log_level_replica + test + extended docs
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename datasets objects to allow datasets module
* improve the doc
* style
* doc improve

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
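A minimal sketch of the two new knobs, assuming standard logging level names:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    log_level="info",            # main process
    log_level_replica="warning", # quieter replicas in distributed runs
)
```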
-
- 21 Jun, 2021 1 commit
Stas Bekman authored
-
- 15 Jun, 2021 1 commit
Amog Kamsetty authored
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
-
- 14 Jun, 2021 2 commits
Stas Bekman authored
* consistent nn. and nn.functional: p3 templates
* restore
-
Stas Bekman authored
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
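A minimal sketch of the documented pairing (`model` is a placeholder): Adafactor computes its learning rate internally, and `AdafactorSchedule` is the proxy scheduler added here so the Trainer can still fetch and log an LR:

```python
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
    model.parameters(),
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
    lr=None,  # let Adafactor derive the rate itself
)
lr_scheduler = AdafactorSchedule(optimizer)
# Hand both to the Trainer via its optimizers=(optimizer, lr_scheduler) argument.
```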
-
- 09 Jun, 2021 1 commit
Stas Bekman authored
* support more than 2 gpus
* style
-
- 01 Jun, 2021 1 commit
Stas Bekman authored
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
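A minimal sketch of reading the new report, assuming the metric keys are `train_loss` and `total_flos`:

```python
train_result = trainer.train()
metrics = train_result.metrics
print(metrics["train_loss"], metrics["total_flos"])
```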
-
- 25 May, 2021 1 commit
Lysandre Debut authored
-
- 24 May, 2021 1 commit
Sylvain Gugger authored
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
-