- 15 Sep, 2022 1 commit
-
-
Yih-Dar authored
* Enable torchdynamo tests
* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 12 Aug, 2022 1 commit
-
-
Younes Belkada authored
-
- 13 Jul, 2022 1 commit
-
-
Wei authored
* enable fx2trt
* Update perf_train_gpu_one.mdx
* Update perf_train_gpu_one.mdx
* add lib check
* update
* format
* update
* fix import check
* fix isort
* improve doc
* refactor ctx manager
* fix isort
* black format
* isort fix
* fix format
* update args
* update black
* cleanups
* Update perf_train_gpu_one.mdx
* code refactor
* code refactor to init
* remove redundancy
* isort
* replace self.args with args

Co-authored-by: Stas Bekman <stas@stason.org>
-
- 12 Jul, 2022 1 commit
-
-
jianan-gu authored
* enhance ipex import
* refine codes
* refine style
* add link
* style

Co-authored-by: Stas Bekman <stas@stason.org>
-
- 01 Jul, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 30 Jun, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 28 Jun, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 21 Jun, 2022 1 commit
-
-
Lysandre Debut authored
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
  This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.
* Test rc3
* Test latest rc
* Unpin to the RC

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
-
- 20 Jun, 2022 1 commit
-
-
Stas Bekman authored
* deprecate is_torch_bf16_available * address suggestions
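A minimal sketch of a replacement check, assuming the deprecation splits the helper into per-device functions exposed from transformers.utils (the exact names here are an assumption, not taken from this commit):

```python
# Sketch only: assumes is_torch_bf16_gpu_available / is_torch_bf16_cpu_available
# replace the deprecated is_torch_bf16_available; adjust names to your version.
from transformers.utils import is_torch_bf16_cpu_available, is_torch_bf16_gpu_available

if is_torch_bf16_gpu_available():
    print("bf16 autocast is usable on the current GPU")
elif is_torch_bf16_cpu_available():
    print("bf16 is only usable on CPU here")
else:
    print("no bf16 support detected")
```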
-
- 14 Jun, 2022 1 commit
-
-
jianan-gu authored
* add jit mode option and model wrap
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refine code
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add ut and refine code
* code refine
* refine code
* add inference doc
* Update src/transformers/trainer.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/trainer.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add cpu inference performance doc
* Update perf_infer_cpu.mdx
* Update perf_infer_cpu.mdx
* Update performance.mdx
* Update _toctree.yml
* refine jit func naming
* Update _toctree.yml
* Delete perf_infer_gpu_one.mdx
* Update perf_infer_cpu.mdx
* Update docs/source/en/perf_infer_cpu.mdx
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add none check before jit
* Update docs/source/en/perf_infer_cpu.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/perf_infer_cpu.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
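A rough usage sketch of the CPU inference path described above, assuming the option landed as the jit_mode_eval flag on TrainingArguments (model and eval_dataset are placeholders):

```python
# Sketch, assuming TrainingArguments(jit_mode_eval=True) traces the model with
# TorchScript before evaluation/prediction.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    jit_mode_eval=True,  # enable torch.jit tracing for evaluate()/predict()
    no_cuda=True,        # run the CPU inference path
)
# trainer = Trainer(model=model, args=args, eval_dataset=eval_dataset)  # model/eval_dataset assumed
# metrics = trainer.evaluate()
```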
-
- 08 Jun, 2022 1 commit
-
-
jianan-gu authored
Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)

* init PR
* fix import ipex
* minor fix on bf16
* refine optimizer
* refine args notes
* refine code
* refine ipex optimize args
* refine half_precision_backend
* black format
* isort format
* isort format files
* flake8 format
* doc builder format
* refine codes
* remove jit and optim bits
* black preview format
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refine code
* refine notes
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* code refine
* add ipex ut
* add performance cpu doc
* link to the cpu doc from main perf doc
* install ipex into CI's docker
* Update perf_train_cpu.mdx
* Update docs/source/en/perf_train_cpu.mdx
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update perf_train_cpu.mdx
* Update perf_train_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
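A usage sketch for the IPEX/CPU-AMP path, assuming the flags are use_ipex and bf16 with no_cuda (not copied from the PR; intel_extension_for_pytorch and bf16-capable hardware are required):

```python
# Sketch: CPU training with IPEX optimizations and bfloat16 autocast.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    use_ipex=True,  # let the Trainer run ipex.optimize() on model/optimizer
    bf16=True,      # CPU automatic mixed precision in bfloat16
    no_cuda=True,   # force the CPU code path
)
```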
-
- 25 May, 2022 1 commit
-
-
Animesh Jain authored
* Support compilation via Torchdynamo, AOT Autograd, NVFuser
* Address comments
* Lint
* Stas comments - missing quality test
* Lintere
* Quality test
* Doc lint
* Reset CUDA peak mem
* Add CustomTrainer
* require a single gpu

Co-authored-by: Stas Bekman <stas@stason.org>
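A usage sketch, assuming the PR exposes this as a torchdynamo training argument with backends such as "eager" and "nvfuser" (torchdynamo/functorch must be installed):

```python
# Sketch: capture graphs with TorchDynamo and compile them through AOT Autograd
# + nvFuser; "eager" would run the captured graphs without further compilation.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    torchdynamo="nvfuser",  # assumed backend name; "eager" is the lighter option
)
```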
-
- 18 May, 2022 1 commit
-
-
Stas Bekman authored
* [tests] fix copy-n-paste error * fix
-
- 16 May, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 11 May, 2022 1 commit
-
-
Antoni Baum authored
* Remove unneeded columns for IterableDataset
* Add test
* Update trainer tests
* Edit docstring
* Lint
* Apply feedback
* Apply feedback
-
- 09 May, 2022 1 commit
-
-
Zachary Mueller authored
- Adds auto_batch_size finder
- Moves training loop to an inner training loop

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
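A rough sketch of how the finder is enabled, assuming the flag is auto_find_batch_size and that it relies on the accelerate package:

```python
# Sketch: on CUDA OOM the inner training loop restarts with a halved
# per_device_train_batch_size until training fits in memory.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=64,  # starting point, halved on OOM
    auto_find_batch_size=True,       # requires `accelerate` to be installed
)
```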
-
- 03 May, 2022 2 commits
-
-
Sylvain Gugger authored
* Fix RNG reload in resume training from epoch checkpoint * Fix test
-
Sylvain Gugger authored
* Make Trainer compatible with sharded checkpoints * Add doc
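For context, a minimal sketch of what a sharded checkpoint is; the model and 50MB shard size are illustrative, and the PR's point is that the Trainer can now load and resume from checkpoints stored this way:

```python
# Sketch: save_pretrained splits the weights into several shard files plus an
# index once they exceed max_shard_size; from_pretrained stitches them back.
from transformers import BertConfig, BertModel

model = BertModel(BertConfig())  # randomly initialized, no download needed
model.save_pretrained("sharded", max_shard_size="50MB")
reloaded = BertModel.from_pretrained("sharded")
```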
-
- 19 Apr, 2022 2 commits
-
-
Manuel R. Ciosici authored
* Add initial BNB integration
* fixup! Add initial BNB integration
* Add bnb test decorator
* Update Adamw8bit option name
* Use the full bnb package name
* Overide bnb for all embedding layers
* Fix package name
* Formatting
* Remove unnecessary import
* Update src/transformers/trainer.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename AdamwBNB optimizer option
* Add training test checking that bnb memory utilization is lower
* fix merge
* fix merge; fix + extend new test
* cleanup
* expand bnb
* move all require_* candidates to testing_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
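A usage sketch, assuming the option ends up as the "adamw_bnb_8bit" value of --optim after the renames discussed above (the bitsandbytes package is required):

```python
# Sketch: select the 8-bit AdamW from bitsandbytes to cut optimizer-state memory.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_bnb_8bit",  # assumed final option name; needs bitsandbytes installed
)
```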
-
code-review-doctor authored
* Fix issue avoid-misusing-assert-true found at https://codereview.doctor
* fix tests
* fix tf

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
- 29 Mar, 2022 1 commit
-
-
Sander Land authored
* Avoid accessing .dataset of a dataloader
* style
* fix
* cleaning up, reverting some misunderstandings
* black
* add train_dataset argument to get_train_dataloader, and fix other instances of length checks
* flake8
* address comments
* fix bug
* cleanup
* add test
* Update tests/trainer/test_trainer.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* under torch
* merge
* stylistic suggestion

Co-authored-by: Sander Land <sander@chatdesk.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 23 Mar, 2022 1 commit
-
-
Sylvain Gugger authored
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wront move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
-
- 08 Mar, 2022 1 commit
-
-
David Hall authored
* Seed get_train_sampler's generator with arg seed to improve reproducibility and make the world_size<=1 code path more similar to the others
* move test file into trainer test explicitly
* dumb typo
* make style lint happy
* per discussion, switch to data_seed
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
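A usage sketch of the resulting split between the two seeds (the values are illustrative):

```python
# seed keeps governing model init / dropout / general RNG, while data_seed
# (introduced here) seeds only the training sampler, so the data order can be
# varied independently of the model seed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    seed=42,        # model init and general RNG
    data_seed=123,  # get_train_sampler's generator only
)
```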
-
- 23 Feb, 2022 1 commit
-
-
Lysandre Debut authored
* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
-
- 09 Feb, 2022 1 commit
-
-
Sylvain Gugger authored
* Expose hub test problem * Fix tests
-
- 03 Feb, 2022 1 commit
-
-
davidleonfdez authored
* Add preprocess_logits_for_metrics Trainer param * Compute accuracy in LM examples * Improve comments
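A minimal sketch of the new hook: a callable receiving (logits, labels) at each eval step, whose (smaller) return value is what gets accumulated and handed to compute_metrics:

```python
import torch

def preprocess_logits_for_metrics(logits, labels):
    # keep only the predicted token ids instead of full vocabulary logits,
    # so LM evaluation does not have to store huge logit tensors
    return logits.argmax(dim=-1)

# trainer = Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)
```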
-
- 02 Feb, 2022 1 commit
-
-
Ayush Chaurasia authored
# Add support for W&B hyperparameter sweep

This PR:
* allows using wandb for running hyperparameter search.
* The runs are visualized on the W&B sweeps dashboard
* This supports running sweeps on parallel devices, all reporting to the same central dashboard.

### Usage

**To run a new hyperparameter search:**
```
trainer.hyperparameter_search(
    backend="wandb",
    project="transformers_sweep",  # name of the project
    n_trials=5,
    metric="eval/loss",  # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found
)
```
This outputs a sweep id, e.g. `my_project/sweep_id`.

**To run sweeps on parallel devices:** just pass the sweep id which you want to run in parallel
```
trainer.hyperparameter_search(
    backend="wandb",
    sweep_id="my_project/sweep_id"
)
```
-
- 13 Jan, 2022 1 commit
-
-
Manuel R. Ciosici authored
* Add AdamW deprecation warning
* Add --optim to Trainer
* Update src/transformers/optimization.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/optimization.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
* fix style
* fix
* Regroup adamws together
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Change --adafactor to --optim adafactor
* Use Enum for optimizer values
* fixup! Change --adafactor to --optim adafactor
* fixup! Change --adafactor to --optim adafactor
* fixup! Change --adafactor to --optim adafactor
* fixup! Use Enum for optimizer values
* Improved documentation for --adafactor
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add mention of no_deprecation_warning
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename OptimizerOptions to OptimizerNames
* Use choices for --optim
* Move optimizer selection code to a function and add a unit test
* Change optimizer names
* Rename method
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename method
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Remove TODO comment
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename variable
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename variable
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename function
* Rename variable
* Parameterize the tests for supported optimizers
* Refactor
* Attempt to make tests pass on CircleCI
* Add a test with apex
* rework to add apex to parameterized; add actual train test
* fix import when torch is not available
* fix optim_test_params when torch is not available
* fix optim_test_params when torch is not available
* re-org
* small re-org
* fix test_fused_adam_no_apex
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove .value from OptimizerNames
* Rename optimizer strings s|--adam_|--adamw_|
* Also rename Enum options
* small fix
* Fix instantiation of OptimizerNames. Remove redundant test
* Use ExplicitEnum instead of Enum
* Add unit test with string optimizer
* Change optimizer default to string value

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
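A usage sketch of the consolidated selector; the string values are assumed from the discussion above (e.g. "adamw_hf", "adamw_torch", "adafactor", "adamw_apex_fused"):

```python
# Sketch: --optim replaces the old --adafactor flag with a single string option.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adafactor",  # was: adafactor=True before this change
)
```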
-
- 11 Jan, 2022 1 commit
-
-
Sylvain Gugger authored
* Add test * Add tests for the reported train loss
-
- 23 Dec, 2021 1 commit
-
-
Sylvain Gugger authored
* Fix failing GPU trainer tests * Remove print statements
-
- 16 Dec, 2021 1 commit
-
-
Lysandre Debut authored
-
- 03 Dec, 2021 1 commit
-
-
Stas Bekman authored
* [trainer] add --tf32 support
* it's pt>=.17
* it's pt>=.17
* flip the default to True
* add experimental note
* simplify logic
* style
* switch to 3-state logic
* doc
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* re-style code

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
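A usage sketch, assuming --tf32 maps to a tf32 flag on TrainingArguments (Ampere-or-newer GPU and a sufficiently recent PyTorch required):

```python
# Sketch: tf32=True roughly corresponds to setting
# torch.backends.cuda.matmul.allow_tf32 = True under the hood.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    tf32=True,
)
```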
-
- 01 Dec, 2021 1 commit
-
-
Jamie DeAntonis authored
* started bf16 integration
* minor changes
* code now runs
* style
* lay foundation for bf16 testing
* lay foundation for bf16 testing
* start the tests
* better bf16 check
* style
* 2 separate checkers - one for bf16 support, another for bf16+autocast
* Update src/transformers/training_args.py
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* a couple of comment resolutions
* more comment resolutions
* resolved a small bug
* just some print statemtns
* added todo marking
* added a todo
* adjust for API change s/fast_dtype/dtype/
* fix style
* merge 2 bf16 util functions
* bf16 now does scaling too
* Add support for bfloat16
* Revert T5 layernorm to float32
  This is based on the comment at https://github.com/huggingface/transformers/pull/14448/files#r752660929 and the PyTorch PR https://github.com/pytorch/pytorch/pull/66920 .
* Add comment about conversion to float32 before returning the numpy data
* Add comment about AMP-bfloat16 incompatibility
* Fix formatting
* typo
* reformer / bf16
* cleanup
* require at least pt-1.10
* fix
* will deal with deepspeed separately
* cleanup
* revert
* cleanup
* fp16_full_eval and bf16_full_eval are separate modes
* proper deprecation
* cleanup
* test and fixes
* spelling
* cleanup
* add a note that this API is experimental

Co-authored-by: jamie <jamie@cortx.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: suriya <suriya@cortx.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
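A usage sketch of the flags this PR describes, assuming they mirror the existing fp16 ones (the PR itself calls the API experimental):

```python
# Sketch: bf16 mixed-precision training plus full-bf16 evaluation;
# needs torch >= 1.10 and bf16-capable hardware (e.g. Ampere GPUs).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    bf16=True,            # bfloat16 autocast during training
    bf16_full_eval=True,  # cast the whole model to bf16 for evaluation
)
```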
-
- 18 Nov, 2021 1 commit
-
-
Sylvain Gugger authored
-
- 16 Nov, 2021 1 commit
-
-
Valentin authored
* stop training when a finite IterableDataset is exhausted
  when using an iterable dataset num_epochs is set to sys.maxsize to make sure all data is consumed
  likewise we want to set max_steps high enough but still stop when all data is consumed
  (cherry picked from commit 6f0e1d6363153da9051e93acffe1cbab3a3f3b12)
* fix typo flase -> false
* add test for stopping training on exhausted finite iterable dataset
* remove redundant gradient_accumulation_steps
* run make style
  reformat training_args docstring
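For illustration, a sketch of the setup this fixes (the dataset class and step count are made up): with an IterableDataset the Trainer cannot infer an epoch length, so max_steps acts as an upper bound, and after this change training also stops once the finite stream runs dry.

```python
import torch
from transformers import TrainingArguments

class TinyStream(torch.utils.data.IterableDataset):
    def __iter__(self):
        for i in range(100):  # finite stream of 100 examples
            yield {"input_ids": torch.tensor([i]), "labels": torch.tensor([i])}

args = TrainingArguments(
    output_dir="out",
    max_steps=10_000,  # upper bound; training now ends early when TinyStream is exhausted
)
```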
-
- 02 Nov, 2021 1 commit
-
-
Sylvain Gugger authored
* Update Transformers to huggingface_hub >= 0.1.0 * Forgot to save... * Style * Fix test
-
- 29 Oct, 2021 1 commit
-
-
Thomas Wang authored
* Remove n_ctx from configs * Fix GPTJ and OpenAIGPT, both are acceptable breaking changes as there are no configs such that it breaks * Remove unecessary n_positions from TFOpenAIGPT
-
- 23 Sep, 2021 1 commit
-
-
kding1 authored
* add sigopt hpo to transformers.
  Signed-off-by: Ding, Ke <ke.ding@intel.com>
* extend sigopt changes to test code and others..
  Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* fix style for sigopt integration.
  Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Add necessary information to run unittests on SigOpt.

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
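A usage sketch, assuming "sigopt" becomes one of the hyperparameter_search backends; the checkpoint name and trial count are illustrative:

```python
# Sketch: hyperparameter search needs a model_init so each trial starts fresh.
from transformers import AutoModelForSequenceClassification

def model_init():
    return AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# trainer = Trainer(model_init=model_init, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# best_run = trainer.hyperparameter_search(backend="sigopt", n_trials=10, direction="minimize")
```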
-
- 17 Sep, 2021 1 commit
-
-
Patrick von Platen authored
* finish * add test * push * remove unnecessary code * up * correct test * Update src/transformers/training_args.py
-
- 14 Sep, 2021 1 commit
-
-
Sylvain Gugger authored
* Push to hub when saving checkpoints
* Add model card
* Revert partial model card
* Small fix for checkpoint
* Add tests
* Add documentation
* Fix tests
* Bump huggingface_hub
* Fix test
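A usage sketch, assuming the checkpoint pushes are driven by push_to_hub together with a hub_strategy value such as "checkpoint"; the repository id is a placeholder and the exact strategy names may differ from this PR:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="my-model",
    push_to_hub=True,                  # upload the model and its card to the Hub
    hub_strategy="checkpoint",         # assumed value: also push the latest checkpoint for resuming
    hub_model_id="username/my-model",  # placeholder repository id
)
```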
-