- 07 Dec, 2021 1 commit
Stas Bekman authored
* [deepspeed] fix load_best_model_at_end
* try with pull_request_target
* revert: try with pull_request_target
* style
* add test
* cleanup

- 06 Dec, 2021 1 commit
Sylvain Gugger authored

- 01 Dec, 2021 2 commits
Stas Bekman authored

Jamie DeAntonis authored
* started bf16 integration
* minor changes
* code now runs
* style
* lay foundation for bf16 testing
* lay foundation for bf16 testing
* start the tests
* better bf16 check
* style
* 2 separate checkers - one for bf16 support, another for bf16+autocast
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* a couple of comment resolutions
* more comment resolutions
* resolved a small bug
* just some print statements
* added todo marking
* added a todo
* adjust for API change s/fast_dtype/dtype/
* fix style
* merge 2 bf16 util functions
* bf16 now does scaling too
* Add support for bfloat16
* Revert T5 layernorm to float32
This is based on the comment at https://github.com/huggingface/transformers/pull/14448/files#r752660929 and the PyTorch PR https://github.com/pytorch/pytorch/pull/66920 .
* Add comment about conversion to float32 before returning the numpy data
* Add comment about AMP-bfloat16 incompatibility
* Fix formatting
* typo
* reformer / bf16
* cleanup
* require at least pt-1.10
* fix
* will deal with deepspeed separately
* cleanup
* revert
* cleanup
* fp16_full_eval and bf16_full_eval are separate modes
* proper deprecation
* cleanup
* test and fixes
* spelling
* cleanup
* add a note that this API is experimental
Co-authored-by: jamie <jamie@cortx.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: suriya <suriya@cortx.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>

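The flag logic this commit introduces (fp16 and bf16 as mutually exclusive modes) can be sketched in plain Python; the function name below is invented for illustration and is not the actual transformers code:

```python
def resolve_mixed_precision_dtype(fp16: bool, bf16: bool) -> str:
    """Sketch of the mutual-exclusion check between the fp16 and bf16 flags."""
    if fp16 and bf16:
        raise ValueError("fp16 and bf16 are mutually exclusive; pick one")
    if bf16:
        # bf16 has the same dynamic range as fp32, but needs recent hardware
        # and, per this commit, at least PyTorch 1.10
        return "bfloat16"
    if fp16:
        return "float16"
    return "float32"
```

The separate `fp16_full_eval` / `bf16_full_eval` switches mentioned in the log control full (non-autocast) half-precision evaluation and follow the same either-or pattern.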
- 23 Nov, 2021 1 commit
Stas Bekman authored
* [deepspeed] zero inference
* only z3 makes sense for inference
* fix and style
* docs
* rework
* fix test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* responding to suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

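"only z3 makes sense for inference" because ZeRO stage 3 is the only stage that partitions (and can offload) the model parameters themselves; stages 1-2 only shard optimizer state and gradients, which do not exist at inference time. A hand-written sketch of such a config (not taken from the repo's docs) might look like:

```python
# Hand-written sketch of a ZeRO stage-3 DeepSpeed config for inference.
ds_inference_config = {
    "zero_optimization": {
        "stage": 3,  # only stage 3 partitions the parameters themselves
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    # batch size used by the engine; no optimizer/scheduler sections needed
    "train_micro_batch_size_per_gpu": 1,
}
```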
- 16 Nov, 2021 1 commit
Valentin authored
* stop training when a finite IterableDataset is exhausted
when using an iterable dataset num_epochs is set to sys.maxsize to make sure all data is consumed; likewise we want to set max_steps high enough but still stop when all data is consumed
(cherry picked from commit 6f0e1d6363153da9051e93acffe1cbab3a3f3b12)
* fix typo flase -> false
* add test for stopping training on exhausted finite iterable dataset
* remove redundant gradient_accumulation_steps
* run make style reformat training_args docstring

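The stopping rule described above can be sketched in plain Python (invented names, not the Trainer's actual loop): with an iterable dataset the epoch count is effectively unbounded, so the loop must also stop when the iterator itself runs dry.

```python
def train(batches, max_steps: int) -> int:
    """Consume up to max_steps batches, stopping early if the data runs out."""
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        try:
            next(it)
        except StopIteration:
            break  # finite iterable exhausted: stop instead of spinning forever
        steps += 1  # stand-in for one optimization step
    return steps
```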
- 05 Nov, 2021 1 commit
Sylvain Gugger authored

- 01 Nov, 2021 1 commit
mathor authored

- 20 Oct, 2021 2 commits
Kwanghee Choi authored
Co-authored-by: jonas <jonas@hpcnt.com>

Robert Stone authored

- 11 Oct, 2021 2 commits
Sylvain Gugger authored

Patrick von Platen authored
[Gradient checkpointing] Correct disabling `find_unused_parameters` in Trainer when gradient checkpointing is enabled (#13961)
* up
* correct test

- 07 Oct, 2021 1 commit
Alex Hedges authored

- 06 Oct, 2021 3 commits
Anton Lozhkov authored

Sylvain Gugger authored

Yanming Wang authored
* Fix logging_nan_inf_filter in torch_xla mode
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix format
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

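The idea behind `logging_nan_inf_filter` can be sketched in plain Python (the helper name is invented for illustration): non-finite per-step losses are dropped from the running log average so that a single overflow step does not poison the reported loss.

```python
import math

def accumulate_loss_for_logging(step_loss: float, running_sum: float,
                                filter_nan_inf: bool = True) -> float:
    """Add step_loss to the logging accumulator, skipping nan/inf if filtering."""
    if filter_nan_inf and not math.isfinite(step_loss):
        return running_sum  # skip nan/inf instead of accumulating it
    return running_sum + step_loss
```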
- 05 Oct, 2021 1 commit
Zhaofeng Wu authored
* Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler
* Fix

- 27 Sep, 2021 1 commit
Sylvain Gugger authored
Co-authored-by: quantitative-technologies <james.hirschorn@quantitative-technologies.com>

- 26 Sep, 2021 1 commit
Patrick von Platen authored
[Trainer] Make sure shown loss in distributed training is correctly averaged over all workers (#13681)
* push
* improve tr loss gather

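The fix above boils down to this (a plain-Python sketch, with a list standing in for a `torch.distributed` all-gather of each rank's accumulated loss): the displayed loss should be the mean over all workers, not rank 0's local value.

```python
def averaged_shown_loss(per_worker_losses: list[float]) -> float:
    """Mean of the locally accumulated losses gathered from every worker."""
    return sum(per_worker_losses) / len(per_worker_losses)
```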
- 23 Sep, 2021 2 commits
kding1 authored
* update trainer with cpu distributed fine-tuning support.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* refinement on cpu dist training check.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* style.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Test over private field not public one.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

kding1 authored
* add sigopt hpo to transformers.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* extend sigopt changes to test code and others.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* fix style for sigopt integration.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Add necessary information to run unittests on SigOpt.
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>

- 22 Sep, 2021 1 commit
Sylvain Gugger authored
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

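The shape of this change can be sketched with stub classes (not the real transformers code): gradient checkpointing moves from a model-config field to a training argument, with the trainer calling the model's `gradient_checkpointing_enable` method at setup time.

```python
class StubModel:
    """Minimal stand-in for a transformers model, for illustration only."""
    def __init__(self):
        self.is_gradient_checkpointing = False

    def gradient_checkpointing_enable(self):
        self.is_gradient_checkpointing = True

def setup_model(model, gradient_checkpointing: bool):
    # the training argument, not the model config, now drives the decision
    if gradient_checkpointing:
        model.gradient_checkpointing_enable()
    return model
```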
- 17 Sep, 2021 1 commit
Patrick von Platen authored
* finish
* add test
* push
* remove unnecessary code
* up
* correct test
* Update src/transformers/training_args.py

- 14 Sep, 2021 2 commits
elishowk authored
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Sylvain Gugger authored
* Push to hub when saving checkpoints
* Add model card
* Revert partial model card
* Small fix for checkpoint
* Add tests
* Add documentation
* Fix tests
* Bump huggingface_hub
* Fix test

- 09 Sep, 2021 1 commit
Sylvain Gugger authored

- 31 Aug, 2021 1 commit
Sylvain Gugger authored

- 30 Aug, 2021 3 commits
Olatunji Ruwase authored
* Use DS callable API to allow hf_scheduler + ds_optimizer
* Preserve backward-compatibility
* Restore backward compatibility
* Tweak arg positioning
* Tweak arg positioning
* bump the required version
* Undo indent
* Update src/transformers/trainer.py
* style
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Maxwell Forbes authored

Li-Huai (Allan) Lin authored
* Check None before going through iteration
* Format

- 23 Aug, 2021 1 commit
Philipp Schmid authored
* Barrier -> barrier
* added logger for metrics
* removed stream handler in trainer
* moved handler
* removed streamhandler from trainer
* updated test image and instance type; added datasets version to test
* Update tests/sagemaker/scripts/pytorch/requirements.txt
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

- 19 Aug, 2021 1 commit
Allan Lin authored
* Update torch.utils.data namespaces to the latest.
* Format
* Update Dataloader.
* Style

- 06 Aug, 2021 2 commits
Sylvain Gugger authored
* Fix tied weights on TPU
* Manually tie weights in no trainer examples
* Fix for test
* One last missing
* Getting owned by my scripts
* Address review comments
* Fix test
* Fix tests
* Fix reformer tests

Sylvain Gugger authored
* Initial work
* All auto models
* All tf auto models
* All flax auto models
* Tokenizers
* Add feature extractors
* Fix typos
* Fix other typo
* Use the right config
* Remove old mapping names and update logic in AutoTokenizer
* Update check_table
* Fix copies and check_repo script
* Fix last test
* Add back name
* clean up
* Update template
* Update template
* Forgot a )
* Use alternative to fixup
* Fix TF model template
* Address review comments
* Address review comments
* Style

- 03 Aug, 2021 1 commit
Philip May authored
* fix #12970
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove unnecessary issue link
* fix test formatting
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- 30 Jul, 2021 1 commit
wulu473 authored
Co-authored-by: Lukas Wutschitz <lukas.wutschitz@microsoft.com>

- 26 Jul, 2021 1 commit
Sylvain Gugger authored

- 21 Jul, 2021 3 commits
Sylvain Gugger authored

Stas Bekman authored

Masatoshi TSUCHIYA authored
* Refer to warmup_ratio when setting warmup_num_steps.
* Add a method to get the number of warmup steps to the TrainingArguments class.
* Fix.
* Fix.

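The warmup-steps helper described above can be sketched as a free function (the real code is a method on TrainingArguments; the standalone form here is for illustration): an explicit `warmup_steps` wins, otherwise `warmup_ratio` is scaled by the total number of training steps.

```python
import math

def get_warmup_steps(warmup_steps: int, warmup_ratio: float,
                     num_training_steps: int) -> int:
    """Return the number of warmup steps from either the count or the ratio."""
    if warmup_steps > 0:
        return warmup_steps  # explicit step count takes precedence
    return math.ceil(num_training_steps * warmup_ratio)
```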