1. 04 Feb, 2021 2 commits
  2. 03 Feb, 2021 1 commit
  3. 02 Feb, 2021 1 commit
  4. 29 Jan, 2021 1 commit
  5. 28 Jan, 2021 4 commits
  6. 27 Jan, 2021 2 commits
  7. 26 Jan, 2021 1 commit
    • Smdistributed trainer (#9798) · 0d0efd3a
      Sylvain Gugger authored
      * Add a debug print
      
      * Adapt Trainer to use smdistributed if available
      
      * Forgotten parenthesis
      
      * Real check for sagemaker
      
* Don't forget to define device...
      
* Woopsie, local_rank is defined differently
      
      * Update since local_rank has the proper value
      
      * Remove debug statement
      
      * More robust check for smdistributed
      
      * Quality
      
      * Deal with key not present error
      0d0efd3a
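For context, here is a minimal sketch of the kind of availability check this commit describes. The helper name and the exact import path are illustrative assumptions, not the code from the PR:

```
import importlib.util

def is_sagemaker_distributed_available():
    # smdistributed only exists inside SageMaker's distributed training
    # containers, so an import check decides whether the Trainer can use it.
    return importlib.util.find_spec("smdistributed") is not None

if is_sagemaker_distributed_available():
    # SageMaker's data-parallel backend exposes a torch.distributed-compatible module.
    import smdistributed.dataparallel.torch.distributed as sm_dist  # noqa: F401
```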
  8. 25 Jan, 2021 2 commits
  9. 22 Jan, 2021 1 commit
  10. 21 Jan, 2021 2 commits
  11. 20 Jan, 2021 1 commit
  12. 15 Jan, 2021 1 commit
  13. 14 Jan, 2021 1 commit
  14. 13 Jan, 2021 1 commit
    • [trainer] deepspeed integration (#9211) · 2df34f4a
      Stas Bekman authored
      
      
      * deepspeed integration
      
      * style
      
      * add test
      
      * ds wants to do its own backward
      
      * fp16 assert
      
      * Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * style
      
      * for clarity extract what args are being passed to deepspeed
      
      * introduce the concept of self.wrapped_model
      
      * s/self.wrapped_model/self.model_wrapped/
      
      * complete transition to self.wrapped_model / self.model
      
      * fix
      
      * doc
      
      * give ds its own init
      
      * add custom overrides, handle bs correctly
      
      * fix test
      
      * clean up model_init logic, fix small bug
      
      * complete fix
      
      * collapse --deepspeed_config into --deepspeed
      
      * style
      
      * start adding doc notes
      
      * style
      
      * implement hf2ds optimizer and scheduler configuration remapping
      
      * oops
      
* call get_num_training_steps only when absolutely needed
      
      * workaround broken auto-formatter
      
      * deepspeed_config arg is no longer needed - fixed in deepspeed master
      
      * use hf's fp16 args in config
      
      * clean
      
      * start on the docs
      
      * rebase cleanup
      
      * finish up --fp16
      
      * clarify the supported stages
      
      * big refactor thanks to discovering deepspeed.init_distributed
      
      * cleanup
      
      * revert fp16 part
      
      * add checkpoint-support
      
* move ds init into integrations
      
      * extend docs
      
      * cleanup
      
      * unfix docs
      
      * clean up old code
      
      * imports
      
      * move docs
      
      * fix logic
      
      * make it clear which file it's referring to
      
      * document nodes/gpus
      
      * style
      
      * wrong format
      
      * style
      
      * deepspeed handles gradient clipping
      
      * easier to read
      
      * major doc rewrite
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * docs
      
      * switch to AdamW optimizer
      
      * style
      
      * Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      2df34f4a
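As a rough illustration of what the integration above enables (the --deepspeed flag and HF's own --fp16 handling), here is a hedged sketch of turning it on from TrainingArguments. The file name ds_config.json and the commented launch line are assumptions for the example, not the exact usage documented in the PR:

```
from transformers import TrainingArguments

# ds_config.json is a DeepSpeed configuration file (illustrative name).
training_args = TrainingArguments(
    output_dir="output",
    fp16=True,                   # HF's fp16 args are mapped into the DeepSpeed config
    deepspeed="ds_config.json",  # --deepspeed_config was collapsed into --deepspeed
)
# Training is then launched with the deepspeed launcher, e.g.:
#   deepspeed your_script.py --deepspeed ds_config.json --fp16
```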
  15. 11 Jan, 2021 2 commits
  16. 06 Jan, 2021 2 commits
  17. 05 Jan, 2021 2 commits
    • [trainer] --model_parallel hasn't been implemented for most models (#9347) · 748006c0
      Stas Bekman authored
      * --model_parallel hasn't been implemented for most models
      
      * make the help clear as well
      
      * implement is_parallelizable; use it
      
      * oops
      
      * remove property
      748006c0
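A hedged sketch of the is_parallelizable guard mentioned above; the attribute name comes from the commit, while the helper function and error message are illustrative:

```
def check_model_parallel_support(model, model_parallel: bool):
    # Only a handful of architectures implement model parallelism, so refuse
    # --model_parallel for models that don't advertise support.
    if model_parallel and not getattr(model, "is_parallelizable", False):
        raise ValueError(
            f"{model.__class__.__name__} does not support --model_parallel yet."
        )
```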
    • feat(wandb): save model as artifact (#8119) · 30fa0b78
      Boris Dayma authored
      * feat(wandb): log artifacts
      
      * fix: typo
      
      * feat(wandb): ensure name is allowed
      
      * feat(wandb): log artifact
      
      * feat(wandb): saving logic
      
      * style: improve formatting
      
      * fix: unrelated typo
      
* feat: use a fake trainer
      
* fix: simplify
      
      * feat(wandb): log model files as artifact
      
      * style: fix style
      
      * docs(wandb): correct description
      
* feat: unpack model + allow env truthy values
      
      * feat: TrainerCallback can access tokenizer
      
* style: fix style
      
      * feat(wandb): log more interesting metadata
      
      * feat: unpack tokenizer
      
      * feat(wandb): metadata with load_best_model_at_end
      
      * feat(wandb): more robust metadata
      
      * style(wandb): fix formatting
      30fa0b78
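A minimal sketch of logging saved model files as a W&B artifact, the core of the commit above. wandb.Artifact, add_dir, and log_artifact are standard wandb APIs; the helper name and artifact naming scheme are assumptions:

```
import wandb

def log_model_as_artifact(run, model_dir, metadata=None):
    # Bundle everything the Trainer saved (config, weights, tokenizer files)
    # into a single versioned artifact attached to the current run.
    artifact = wandb.Artifact(name=f"model-{run.id}", type="model", metadata=metadata or {})
    artifact.add_dir(model_dir)
    run.log_artifact(artifact)
```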
  18. 04 Jan, 2021 1 commit
  19. 22 Dec, 2020 1 commit
  20. 21 Dec, 2020 1 commit
  21. 18 Dec, 2020 2 commits
  22. 17 Dec, 2020 1 commit
  23. 16 Dec, 2020 1 commit
  24. 15 Dec, 2020 2 commits
  25. 09 Dec, 2020 1 commit
  26. 02 Dec, 2020 1 commit
    • [trainer] improve code readability (#8903) · 7e1cb00c
      Stas Bekman authored
      * [trainer] improve code
      
      This PR:
- removes redundant code, since
      ```
      self.model = model if model is not None else None
      ```
      and
      ```
      self.model = model
      ```
      are the same.
      
* separate attribute assignment from code logic, which simplifies things further.
      
      * whitespace
      7e1cb00c
  27. 01 Dec, 2020 1 commit
  28. 30 Nov, 2020 1 commit
    • Use model.from_pretrained for DataParallel also (#8795) · 77384941
      Shai Erera authored
      * Use model.from_pretrained for DataParallel also
      
When training on multiple GPUs, the code wraps the model with torch.nn.DataParallel. However, if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.
      
      This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.
      
If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?
      
      * Fix silly bug
      
      * Address review comments
      
Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check if `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
      77384941
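A hedged sketch of the idea described above, using the function name suggested in the commit message; the actual Trainer code differs, and PreTrainedModel plus the checkpoint file name are assumptions for the example:

```
import torch
from transformers import PreTrainedModel

def load_best_model_checkpoint(model, best_model_checkpoint):
    # Unwrap DataParallel so any custom from_pretrained logic actually runs.
    underlying = model.module if isinstance(model, torch.nn.DataParallel) else model
    if isinstance(underlying, PreTrainedModel):
        underlying = underlying.from_pretrained(best_model_checkpoint)
    else:
        state_dict = torch.load(f"{best_model_checkpoint}/pytorch_model.bin", map_location="cpu")
        underlying.load_state_dict(state_dict)
    # Re-wrap so training keeps using all available GPUs.
    return torch.nn.DataParallel(underlying) if isinstance(model, torch.nn.DataParallel) else underlying
```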