- 01 Dec, 2020 9 commits
-
-
Sylvain Gugger authored
* Add a `distributed_env` property to TrainingArguments * Change name * Address comment
-
Sylvain Gugger authored
-
Stas Bekman authored
* restore skip * Revert "Remove deprecated `evalutate_during_training` (#8852)" This reverts commit 55302990. * check that pipeline.git.base_revision is defined before proceeding * Revert "Revert "Remove deprecated `evalutate_during_training` (#8852)"" This reverts commit dfec84db3fdce1079f01f1bc8dfaf21db2ccaba1. * check that pipeline.git.base_revision is defined before proceeding * doc only * doc + code * restore * restore * typo
-
Lysandre Debut authored
-
Adam Pocock authored
Prevent BatchEncoding from blindly passing casts down to the tensors it contains. Fixes #6582. (#8860)
* Update src/transformers/tokenization_utils_base.py with review fix
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
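Below is a minimal, hedged sketch of the behavior this fix is about (the checkpoint name is illustrative, not taken from the PR): `BatchEncoding.to()` is used for device placement, so integer tensors such as `input_ids` are not silently cast.

```python
# Hedged sketch: move an encoding to a device without altering tensor dtypes.
# "bert-base-uncased" is just an example checkpoint.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer("Hello world", return_tensors="pt")

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = batch.to(device)            # moves every tensor in the BatchEncoding
print(batch["input_ids"].dtype)     # stays torch.int64, no silent float cast
```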
-
Sylvain Gugger authored
-
Ratthachat (Jung) authored
* Fix 2 typos: from_encoder_generator_configs --> from_question_encoder_generator_configs
* apply make style
-
Rodolfo Quispe authored
-
elk-cloner authored
* add CTRLForSequenceClassification * pass local test * merge with master * fix modeling test for sequence classification * fix deco * fix assert
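A hedged usage sketch of the new head (the checkpoint name and label count below are illustrative, not part of the commit):

```python
# Illustrative sketch: sequence classification with the new CTRL head.
# "ctrl" is the Salesforce checkpoint name; num_labels=2 is arbitrary here.
from transformers import CTRLTokenizer, CTRLForSequenceClassification

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLForSequenceClassification.from_pretrained("ctrl", num_labels=2)

inputs = tokenizer("Opinion this library is easy to use", return_tensors="pt")
logits = model(**inputs).logits     # shape: (batch_size, num_labels)
```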
-
- 30 Nov, 2020 15 commits
-
-
Stas Bekman authored
* fix DP case on multi-gpu * make executable * test all 3 modes * use the correct check for distributed * dp doesn't need a special case * restore original name * cleanup
-
Nicolas Patry authored
* NerPipeline (TokenClassification) now outputs offsets of words
  - It happens that the offsets are missing, which forces the user to pattern match the "word" from their input, and that is not always feasible. For instance, if a sentence contains the same word twice, there is no way to know which is which.
  - This PR proposes to fix that by outputting 2 new keys in this pipeline's outputs, "start" and "end", which correspond to the string offsets of the word. That means that we should always have the invariant:

```python
input[entity["start"]: entity["end"]] == entity["entity_group"]  # or entity["entity"] if not grouped
```

* Fixing doc style
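A short, hedged example of consuming the new offsets (the pipeline task and input text are illustrative):

```python
# Illustrative sketch: use the new "start"/"end" character offsets to recover
# each entity's surface form directly from the original input string.
from transformers import pipeline

ner = pipeline("ner")               # TokenClassificationPipeline
text = "Hugging Face is based in New York City."
for entity in ner(text):
    span = text[entity["start"]:entity["end"]]
    print(entity["entity"], span)
```
-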
LysandreJik authored
-
Funtowicz Morgan authored
* Slightly increase tolerance between pytorch and flax output
* test_multiple_sentences doesn't require torch
* Simplify parameterization on "jit" to use boolean rather than str
* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.
* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.
* Remove pytest.mark.parametrize which seems to fail in some circumstances
* Fix unused imports.
* Fix style.
* Give default parameters values for traced model.
* Review comment: Change sentences to sequences
* Apply suggestions from code review
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
LysandreJik authored
-
LysandreJik authored
-
Sylvain Gugger authored
* Remove deprecated `evalutate_during_training`
* Update src/transformers/training_args_tf.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Shai Erera authored
* Use model.from_pretrained for DataParallel also
  When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However, if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end. This commit uses the underlying model during load_best_model_at_end and re-wraps the loaded model with DataParallel. If you choose to reject this change, could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint), so that it can be overridden?
* Fix silly bug
* Address review comments
  Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check whether `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
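A hedged sketch of the idea described above, not the actual Trainer code (the helper name is hypothetical): unwrap `torch.nn.DataParallel`, reload through the underlying model's `from_pretrained`, then re-wrap.

```python
# Illustrative sketch only: reload the best checkpoint via the wrapped model's
# own from_pretrained logic, then restore the DataParallel wrapper.
import torch

def load_best_model_checkpoint(model, best_model_checkpoint):
    is_dp = isinstance(model, torch.nn.DataParallel)
    underlying = model.module if is_dp else model
    reloaded = underlying.from_pretrained(best_model_checkpoint)
    return torch.nn.DataParallel(reloaded) if is_dp else reloaded
```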
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Fraser Greenlee authored
Related issue: https://github.com/huggingface/transformers/issues/8837
-
Stefan Schweter authored
-
Ahmed Elnaggar authored
* Add T5 Encoder class for feature extraction
* fix T5 encoder add_start_docstrings indent
* update init with T5 encoder
* update init with TFT5ModelEncoder
* remove TFT5ModelEncoder
* change T5ModelEncoder order in init
* add T5ModelEncoder to transformers init
* clean T5ModelEncoder
* update init with TFT5ModelEncoder
* add TFModelEncoder for Tensorflow
* update init with TFT5ModelEncoder
* Update src/transformers/models/t5/modeling_t5.py: change output from Seq2SeqModelOutput to BaseModelOutput
* remove encoder_outputs: remove encoder_outputs from the function call, remove the encoder_outputs if statement, remove isinstance from return_dict
* Authorize missing decoder keys
* remove unnecessary input parameters (past_key_values and use_cache)
* remove use_cache from the forward method
* add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING
* change return_dict to dot access
* add T5_ENCODER_INPUTS_DOCSTRING for TF T5
* change TFT5Encoder output type to BaseModelOutput
* remove unnecessary parameters for TFT5Encoder
* remove unnecessary if statement
* add import BaseModelOutput
* fix BaseModelOutput typo to TFBaseModelOutput
* update T5 doc with T5ModelEncoder
* add T5ModelEncoder to tests
* finish pytorch
* finish docs and mt5
* add mt5 to init
* fix init
* remove n_positions
* finish PR
* Update src/transformers/models/mt5/modeling_mt5.py
* Update src/transformers/models/t5/modeling_t5.py
* Update src/transformers/models/t5/modeling_tf_t5.py
* Update src/transformers/models/mt5/modeling_tf_mt5.py
* make style
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
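A minimal, hedged sketch of encoder-only feature extraction with the class this PR adds, assuming it is exposed as `T5EncoderModel` (the commit message also calls it T5ModelEncoder); the checkpoint name is illustrative:

```python
# Hedged sketch: extract encoder features without running the T5 decoder.
# Assumes the class is exposed as T5EncoderModel and returns a BaseModelOutput.
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)                 # BaseModelOutput, encoder only
features = outputs.last_hidden_state          # (batch, seq_len, d_model)
```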
-
Lysandre Debut authored
* Migration guide from v3.x to v4.x
* Better wording
* Apply suggestions from code review
* Sylvain's comments
* Better wording.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 29 Nov, 2020 3 commits
-
-
Stas Bekman authored
* implement job skipping for doc-only PRs * silent grep is crucial * wip * wip * wip * wip * wip * wip * wip * wip * let's add doc * let's add code * revert test commits * restore * Better name * Better name * Better name * some more testing * some more testing * some more testing * finish testing
-
Guy Rosin authored
* Fix minor typos * Additional typos * Style fix Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
-
Patrick von Platen authored
* refactor * further refactor * fix the rest tomorrow * save intermediate * finish slow tokenizer * make more tests pass * finish refactor * fix comment * clean further * fix name * fix naming * Update src/transformers/models/reformer/tokenization_reformer.py * Apply suggestions from code review * Apply suggestions from code review * refactor * fix init tokenizers * refactor * improve convert * refactor * correct convert slow tokenizer * final fix for Pegasus Tok * remove ipdb * improve links
-
- 28 Nov, 2020 1 commit
-
-
Patrick von Platen authored
-
- 27 Nov, 2020 12 commits
-
-
Lysandre Debut authored
-
LysandreJik authored
-
Stas Bekman authored
-
Max Del authored
* Fix decoder not returning hidden states from the last layer * Resolve conflict * Change the way to gather hidden states * Add decoder hidden states test * Make pytest and black happy * Remove redundant line * remove new line Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
-
Moussa Kamal Eddine authored
* Add init barthez
* Add barthez model, tokenizer and docs. BARThez is a pre-trained French seq2seq model that uses the BART objective.
* Apply suggestions from code review (docs typos)
* Add license
* Change URLs scheme
* Remove barthez model, keep tokenizer
* Fix style
* Fix quality
* Update tokenizer
* Add fast tokenizer
* Add fast tokenizer test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Julien Plu authored
enforce unix newline encoding regardless of OS creating the file
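A minimal sketch of the technique in Python (the file name is illustrative): passing `newline="\n"` to `open()` forces Unix line endings regardless of the OS default.

```python
# Illustrative sketch: write a file with Unix ("\n") line endings even on
# Windows, by disabling the platform's newline translation.
with open("example.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\n")
    f.write("second line\n")
```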
-
Manuel Romero authored
* Create README.md * Fix model path
-
Giovanni Compagnoni authored
* update configuration_utils.py typing to allow pathlike objects when sensible * update modeling_utils.py typing to allow pathlike objects when sensible * black * update tokenization_utils_base.py typing to allow pathlike objects when sensible * update tokenization_utils_fast.py typing to allow pathlike objects when sensible * update configuration_auto.py typing to allow pathlike objects when sensible * update configuration_auto.py docstring to allow pathlike objects when sensible * update tokenization_auto.py docstring to allow pathlike objects when sensible * black
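A short, hedged illustration of what the typing change enables (the local directory is hypothetical):

```python
# Illustrative sketch: from_pretrained-style calls accepting os.PathLike
# (e.g. pathlib.Path) as well as plain strings after this change.
from pathlib import Path
from transformers import AutoConfig, AutoTokenizer

local_dir = Path("./saved_model")             # hypothetical local directory
config = AutoConfig.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
```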
-
Patrick von Platen authored
* correct dpr test and bert pos fault * fix dpr bert config problem * fix layoutlm * add config to dpr as well
-
Patrick von Platen authored
* try flax fix * same for roberta
-
mdermentzi authored
The tokenizer called at the input_ids of example 2 is currently encoding text_1. I think this should be changed to text_2.
-
Kristian Holsheimer authored
* [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes * [FlaxRoberta] Fix non-broadcastable attention mask * Use jax.numpy instead of ordinary numpy (otherwise not jit-able) * Partially revert "Use jax.numpy ..." * Add tests for batched forward passes * Avoid unnecessary OOMs due to preallocation of GPU memory by XLA * Auto-fix style * Re-enable GPU memory preallocation but with mem fraction < 1/parallelism
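An illustrative sketch, not the model code itself, of the kind of mask broadcasting described above, written with `jax.numpy` so it stays jit-compatible:

```python
# Illustrative sketch: expand a (batch, seq_len) padding mask so it broadcasts
# against (batch, num_heads, seq_len, seq_len) attention scores, using
# jax.numpy rather than ordinary numpy so the function remains jit-able.
import jax
import jax.numpy as jnp

@jax.jit
def attention_bias(attention_mask):
    mask = attention_mask[:, None, None, :]   # (batch, 1, 1, seq_len)
    return (1.0 - mask) * -1e9                # large negative bias on padding

bias = attention_bias(jnp.ones((2, 5), dtype=jnp.float32))
```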
-