Commits · 1551e2dc6d27e7eba88d89fdfaf16bce82a3c573 · chenpangpang / transformers

15 Dec, 2020 12 commits

[WIP] Tapas v4 (tres) (#9117) · 1551e2dc

NielsRogge authored Dec 15, 2020



* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

1551e2dc

Add possibility to switch between APEX and AMP in Trainer (#9137) · ad895af9

Sylvain Gugger authored Dec 15, 2020



* Add possibility to switch between APEX and AMP in Trainer

* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

ad895af9

Fix add order (#9129) · e7717497
Julien Plu authored Dec 15, 2020

e7717497

Fix Bart Shift (#9135) · 18ecd36f

Patrick von Platen authored Dec 15, 2020

* correct mistake in order

* fix tensor copy

* clone tensor correctly

18ecd36f

correct mistake in order (#9134) · d018622d
Patrick von Platen authored Dec 15, 2020

d018622d

fix bart loss masking (#9131) · 80bdb9c3
Patrick von Platen authored Dec 15, 2020

80bdb9c3

Fix typo in trainer_tf.py (#9132) · 3caba8d3
Manbish authored Dec 15, 2020

3caba8d3

[TF Bart] Refactor TFBart (#9029) · abc573f5

Patrick von Platen authored Dec 15, 2020

* reorder file

* delete unnecessary function

* make style

* save intermediate

* fix attention masks

* correct tf bart past key values

* solve merge conflict bug

* correct tensor dims

* save intermediate tf

* change attn layer

* fix typo re-order past

* inputs_embeds

* make fix copies

* finish tests

* fix graph mode

* apply Lysandre's suggestions

abc573f5

Added TF OpenAi GPT1 Sequence Classification (#9105) · 389aba34

sandip authored Dec 15, 2020



* TF OpenAI GPT Sequence Classification

* Update src/transformers/models/openai/modeling_tf_openai.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

389aba34

Fix stack overflow (#9114) · 59da3f27
Lysandre Debut authored Dec 15, 2020

59da3f27

Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks (#9076) · ed1845ef

lewtun authored Dec 15, 2020



* Clarify impact of disable_tqdm on Jupyter Notebooks

* Add weblink to argparse

* Replace "dev set" with more common "validation set" in do_eval

* Tweak prediction_loss_only

* Tweak description of Adam hyperparameters

* Add weblink to TensorBoard

* Capitalise apex

* Tweak local_rank description

* Add weblink for wandb

* Replace nlp with datasets

* Tweak grammar in model_parallel

* Capitalise apex

* Update TensorFlow training args to match PyTorch ones

* Fix style

* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add obj to datasets.Dataset
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ed1845ef

[finetune_trainer] enhancements and fixes (#9042) · c19d0462

Stas Bekman authored Dec 14, 2020



* trainer and finetune_trainer enhancements and fixes

* add fallback default

* move the fixing of incorrect keys back into finetune trainer

* s/eval/val/ to match the split

* trainer can now use a different prefix than eval_ for metrics

* document new arg

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* use 'eval' as the default for metric_key_prefix

* complete adjust var names + disambiguate

* fix logger

* add clarifying comment

* add clarifying comment

* style

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/trainer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* complete removal of optional for metric_key_prefix

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
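
The metric prefix behaviour described in this commit (a configurable `metric_key_prefix` defaulting to `'eval'`) can be illustrated with a minimal sketch. The helper name `add_metric_prefix` is hypothetical and is not taken from the Trainer code; it only shows the idea of prefixing metric keys, assuming keys that already carry the prefix should be left untouched.

```python
from typing import Dict


def add_metric_prefix(metrics: Dict[str, float], metric_key_prefix: str = "eval") -> Dict[str, float]:
    """Return a copy of `metrics` whose keys all start with `<metric_key_prefix>_`.

    Hypothetical helper for illustration only, not the Trainer implementation.
    """
    prefix = f"{metric_key_prefix}_"
    # Keys that already carry the prefix are kept as-is so nothing is prefixed twice.
    return {
        key if key.startswith(prefix) else prefix + key: value
        for key, value in metrics.items()
    }


# Default prefix, then a "val" prefix matching the split name.
print(add_metric_prefix({"loss": 0.25, "eval_bleu": 31.0}))
print(add_metric_prefix({"loss": 0.25}, metric_key_prefix="val"))
```

With a prefix such as `"val"`, the reported keys line up with the name of the split being evaluated, which appears to be the motivation for the `s/eval/val/` change listed above.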

c19d0462

14 Dec, 2020 5 commits

Fix T5 and BART for TF (#9063) · df3f4d2a

Julien Plu authored Dec 14, 2020

* Fix T5 for graph compilation+execution

* Fix BART

* Fix import

* Fix naming

* fix attribute name

* Oops

* fix import

* fix tests

* fix tests

* Update test

* Add missing import

* Address Patrick's comments

* Style

* Address Patrick's comment

df3f4d2a

Add parallelization support for T5EncoderModel (#9082) · a9c8bff7

Ahmed Elnaggar authored Dec 14, 2020



* add model parallelism to T5EncoderModel

add model parallelism to T5EncoderModel

* remove decoder from T5EncoderModel parallelize

* update T5EncoderModel docs

* Extend T5ModelTest for T5EncoderModel

* fix T5Stack using range for get_device_map

* fix style
Co-authored-by: Ahmed Elnaggar <elnaggar@rostlab.informatik.tu-muenchen.de>

a9c8bff7

correct var name in TrainingArguments docstring (#9096) · d6af344c
Navjot authored Dec 14, 2020

d6af344c

[RAG, Bart] Align RAG, Bart cache with T5 and other models of transformers (#9098) · fa1ddced

Patrick von Platen authored Dec 14, 2020

* fix rag

* fix slow test

* fix past in bart

fa1ddced

Fix embeddings resizing in TF models (#8657) · 51d9c569

Julien Plu authored Dec 14, 2020

* Resize the biases at the same time as the embeddings

* Trigger CI

* Biases are not reset anymore

* Remove get_output_embeddings + better LM model detection in generation utils

* Apply style

* First test on BERT

* Update docstring + new name

* Apply the new resizing logic to all the models

* fix tests

* Apply style

* Update the template

* Fix naming

* Fix naming

* Apply style

* Apply style

* Remove unused import

* Revert get_output_embeddings

* Trigger CI

* Update num parameters

* Restore get_output_embeddings in TFPretrainedModel and add comments

* Style

* Add decoder resizing

* Style

* Fix tests

* Separate bias and decoder resize

* Fix tests

* Fix tests

* Apply style

* Add bias resizing in MPNet

* Trigger CI

* Apply style

51d9c569

11 Dec, 2020 3 commits

Make ProphetNetModel really compatible with EncoderDecoder (#9033) · 9cc9f412

Patrick von Platen authored Dec 11, 2020

* improve

* finish

* upload model

* fix lm head

* fix test

9cc9f412

Fix PreTrainedTokenizer.pad when first inputs are empty (#9018) · 70527ba6

Sylvain Gugger authored Dec 11, 2020

* Fix PreTrainedTokenizer.pad when first inputs are empty

* Handle empty inputs case

70527ba6

🎨 Change nn.dropout to layer.Dropout (#9047) · 935e3469
Cola authored Dec 11, 2020

935e3469

10 Dec, 2020 5 commits

Remove value error (#8985) · b01ddc95

Julien Plu authored Dec 10, 2020

* Remove value error

* Try a fix for parameter ordering

* Restore previous behavior

* Add documentation

* Review the comment

b01ddc95

Refactor FLAX tests (#9034) · 8d4bb020
Sylvain Gugger authored Dec 10, 2020

8d4bb020

MPNet copyright files (#9015) · 51e81e58
Sylvain Gugger authored Dec 10, 2020

51e81e58

Fix documentation of book in LayoutLM (#9017) · 35bffd70
Sylvain Gugger authored Dec 10, 2020

35bffd70

✏ Fix typo (#9020) · c95de29e
Cola authored Dec 10, 2020

c95de29e

09 Dec, 2020 6 commits

[Bart] Refactor - fix issues, consistency with the library, naming (#8900) · 06971ac4

Patrick von Platen authored Dec 09, 2020

* remove make on the fly linear embedding

* start refactor

* big first refactor

* save intermediate

* save intermediate

* correct mask issue

* save tests

* refactor padding masks

* make all tests pass

* further refactor

* make pegasus test pass

* fix bool if

* fix leftover tests

* continue

* bart renaming

* delete torchscript test hack

* fix imports in tests

* correct shift

* fix docs and repo cons

* re-add fix for FSMT

* typo in test

* fix typo

* fix another typo

* continue

* hot fix 2 for tf

* small fixes

* refactor types linting

* continue

* finish refactor

* fix import in tests

* better bart names

* further refactor and add test

* delete hack

* apply Sylvain's and Lysandre's comments

* small perf improv

* further perf improv

* improv perf

* fix typo

* make style

* small perf improv

06971ac4

Flax Masked Language Modeling training example (#8728) · 75627148

Funtowicz Morgan authored Dec 09, 2020



* Remove "Model" suffix from Flax models to look more :hugs:
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Initial working (forward + backward) for Flax MLM training example.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Simplify code
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing comments, using module and moving to LM task.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore parameter name "module" wrongly renamed model.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore correct output ordering...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Actually commit the example 😅

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add FlaxBertModelForMaskedLM after rebasing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make it possible to initialize the training from scratch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reuse flax linen example of cross entropy loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added specific data collator for flax
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove todo for data collator
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added evaluation step
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added ability to provide dtype to support bfloat16 on TPU
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable flax tensorboard output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable jax.pmap support.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure batches are correctly sized to be dispatched with jax.pmap
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable bfloat16 with --fp16 cmdline args
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correctly export metrics to tensorboard
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added dropout and ability to use it.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Effectively enable & disable during training and evaluation steps.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Oops.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable specifying kernel initializer scale
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added warmup step to the learning rate scheduler.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix typo.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Print training loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix linter issue (flake8)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix model matching
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix dummies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix non default dtype on Flax models
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same create_position_ids_from_input_ids for FlaxRoberta
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta attention as Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix copy
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Wording.
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

75627148

Add MP Net 2 (#9004) · df2af6d8
StillKeepTry authored Dec 09, 2020

df2af6d8

push (#9008) · da37a21c
Patrick von Platen authored Dec 09, 2020

da37a21c

Remove use of deprecated method in Trainer HP search (#8996) · 61abd50b
Sylvain Gugger authored Dec 09, 2020

61abd50b

Diverse beam search 2 (#9006) · 02d0e035

Patrick von Platen authored Dec 09, 2020



* diverse beam search

* bug fixes

* bug fixes

* bug fix

* separate out diverse_beam_search function

* separate out diverse_beam_search function

* bug fix

* improve code quality

* bug fix

* bug fix

* separate out diverse beam search scorer

* code format

* code format

* code format

* code format

* add test

* code format

* documentation changes

* code quality

* add slow integration tests

* more general name

* refactor into logits processor

* add test

* avoid too much copy paste

* refactor

* add to docs

* fix-copies

* bug fix

* Revert "bug fix"

This reverts commit c99eb5a8dc57a7b0d33a8ac06d8c6a32a7812ad4.

* improve comment

* implement Sylvain's feedback
Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
Co-authored-by: ayushtiku5 <40797286+ayushtiku5@users.noreply.github.com>

02d0e035

08 Dec, 2020 6 commits

Templates overhaul 1 (#8993) · 67ff1c31
Lysandre Debut authored Dec 08, 2020

67ff1c31

Removed unused `encoder_hidden_states` and `encoder_attention_mask` (#8972) · 7809eb82

guillaume-be authored Dec 08, 2020

* Removed unused `encoder_hidden_states` and `encoder_attention_mask` from MobileBert

* Removed decoder tests for MobileBert

* Removed now unnecessary import

7809eb82

Fix interaction of return_token_type_ids and add_special_tokens (#8854) · b7cdd00f
Lysandre Debut authored Dec 08, 2020

b7cdd00f

Make `ModelOutput` pickle-able (#8989) · 04c446f7
Sylvain Gugger authored Dec 08, 2020

04c446f7

Optional layers (#8961) · bf7f79cd

Julien Plu authored Dec 08, 2020

* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add deprecated arguments

* Add input processing to TF XLM

* remove unused imports

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Fix wrong model name

* Fix BART

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Address Patrick's comments

* Address Patrick's comments

* Add boolean processing for the inputs

* Take into account the optional layers

* Add missing/unexpected weights in the other models

* Apply style

* rename parameters

* Apply style

* Remove useless

* Remove useless

* Remove useless

* Update num parameters

* Fix tests

* Address Patrick's comment

* Remove useless attribute

bf7f79cd

[training] SAVE_STATE_WARNING was removed in pytorch (#8979) · 9d7d0005

Stas Bekman authored Dec 07, 2020

* [training] SAVE_STATE_WARNING was removed in pytorch

FYI `SAVE_STATE_WARNING` was removed 3 days ago: pytorch/pytorch#46813

Fixes: #8232

@sgugger

* style, but add () to prevent autoformatters from botching it

* switch to try/except

* cleanup
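
The "switch to try/except" step above can be sketched as a guarded import. This is a minimal illustration, assuming the constant was previously importable from `torch.optim.lr_scheduler` and that an empty string is an acceptable fallback; it is not necessarily the exact code in the commit.

```python
# Assumption: older PyTorch releases exposed SAVE_STATE_WARNING in
# torch.optim.lr_scheduler; newer releases no longer define it.
try:
    from torch.optim.lr_scheduler import SAVE_STATE_WARNING
except ImportError:
    # Covers both a PyTorch build without the constant and PyTorch not being
    # installed at all; fall back to an empty string (an assumed placeholder).
    SAVE_STATE_WARNING = ""

print(repr(SAVE_STATE_WARNING))
```

A guarded import like this keeps code working across PyTorch versions without pinning to a specific release.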

9d7d0005

07 Dec, 2020 3 commits

Copyright (#8970) · 00aa9dbc

Sylvain Gugger authored Dec 07, 2020

* Add copyright everywhere missing

* Style

00aa9dbc

transformers-cli: LFS multipart uploads (> 5GB) (#8663) · 28fa014a

Julien Chaumond authored Dec 07, 2020



* initial commit

* [cli] lfs commands

* Fix FileSlice

* Tweak to FileSlice

* [hf_api] Backport filetype arg from `datasets`

cc @lhoestq

* Slim down the CI while I'm working

* Ok let's try this in CI

* Update config.yml

* Do not try this at home

* one more try

* Update lfs.py

* Revert "Tweak to FileSlice"

This reverts commit d7e32c4b3500400486411e85a2b74e57fb6b52f5.

* Update test_hf_api.py

* Update test_hf_api.py

* Update test_hf_api.py

* CI still green?

* make CI green again?

* Update test_hf_api.py

* make CI red again?

* Update test_hf_api.py

* add CI style back

* Fix CI?

* oh my

* doc + switch back to real staging endpoint

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* Fix docblock + f-strings

28fa014a

Add TFGPT2ForSequenceClassification based on DialogRPT (#8714) · 483e1327

sandip authored Dec 07, 2020

* Add TFGPT2ForSequenceClassification based on DialogRPT

* Add TFGPT2ForSequenceClassification based on DialogRPT

* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing

* Add TFGPT2ForSequenceClassification based on DialogRPT

* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing

* code refactor for latest other TF PR

* code refactor

* code refactor

* Update modeling_tf_gpt2.py

483e1327