- 20 Jan, 2021 1 commit
-
-
LSinev authored
-
- 19 Jan, 2021 10 commits
-
-
Sylvain Gugger authored
* Fix model templates and use less than 119 chars * Missing new line
-
Daniel Stancl authored
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance with the convention from BART-based models introduced within the PR #9569.
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that input argument `head_mask` was split into two arguments - `head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy `head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
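A minimal usage sketch of the new arguments (the checkpoint name is illustrative; per the commit, `decoder_head_mask` defaults to copying `head_mask` when omitted, and each mask has one row per layer and one column per head, with 1.0 keeping a head and 0.0 pruning it):
```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")  # illustrative checkpoint
tokenizer = T5Tokenizer.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
labels = tokenizer("Hallo", return_tensors="pt").input_ids

# One row per layer, one column per attention head.
head_mask = torch.ones(model.config.num_layers, model.config.num_heads)
decoder_head_mask = torch.ones(model.config.num_layers, model.config.num_heads)
decoder_head_mask[0, 0] = 0.0  # e.g. disable the first head of the first decoder layer

outputs = model(**inputs, labels=labels, head_mask=head_mask, decoder_head_mask=decoder_head_mask)
```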
-
Sylvain Gugger authored
* New run_seq2seq script
* Add tests
* Mark as slow
* Update examples/seq2seq/run_seq2seq.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/data/data_collator.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
-
Julien Plu authored
* Fix Flaubert and XLM
* Fix Flaubert and XLM
* Apply style
-
max yue authored
File "/share/apps/anaconda3/envs/my_env/lib/python3.7/site-packages/transformers/integrations.py", line 419, in __init__ self._SummaryWriter = SummaryWriter UnboundLocalError: local variable 'SummaryWriter' referenced before assignment -
Yusuke Mori authored
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
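For context, `_reorder_cache` is the hook beam search calls to permute cached key/value states after selecting beams. A hedged, generic sketch of the idea only; the per-model implementations moved by this commit nest the cache tuples and pick the batch dimension differently:
```python
from typing import Tuple
import torch


def reorder_cache(past: Tuple[torch.Tensor, ...], beam_idx: torch.Tensor) -> Tuple[torch.Tensor, ...]:
    # For every layer's cached states, keep the entries belonging to the beams
    # that survived this decoding step (batch/beam axis assumed to be dim 0 here).
    return tuple(layer_past.index_select(0, beam_idx) for layer_past in past)
```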
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Patrick von Platen authored
-
Sergey Mkrtchyan authored
* Fix the attention_mask in DPRReaderTokenizer
* Add an integration test for DPRReader inference
* Run make style
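A hedged illustration of the invariant such a fix and integration test revolve around, not the tokenizer's actual code: the attention mask should be 1 on real tokens and 0 on padding positions of the padded `input_ids`.
```python
import torch

pad_token_id = 0  # illustrative value
input_ids = torch.tensor([[101, 2054, 2003, 102, 0, 0]])
attention_mask = (input_ids != pad_token_id).long()
# tensor([[1, 1, 1, 1, 0, 0]])
```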
-
- 18 Jan, 2021 2 commits
-
-
Daniel Stancl authored
* Add head_mask/decoder_head_mask for BART
  This branch implements head_mask and decoder_head_mask for BART-based models. Full list below:
  - BART
  - MBart
  - Blenderbot
  - BlenderbotSmall
  - Marian
  - Pegasus
  Everything is accompanied by updated testing.
* Fix test_headmasking for BART models
* Fix test_headmasking for BART-like models which have only 2 layers in each module. The condition
  ```
  self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
  ```
  is, therefore, invalid for encoder-decoder models considering the `head_mask`
  ```
  head_mask = torch.ones(
      self.model_tester.num_hidden_layers,
      self.model_tester.num_attention_heads,
      device=torch_device,
  )
  head_mask[0, 0] = 0
  head_mask[-1, :-1] = 0
  ```
  specified in the `test_headmasking` test/function.
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Devrim authored
-
- 15 Jan, 2021 5 commits
-
-
Stas Bekman authored
-
Lysandre Debut authored
* Ignore lm_head decoder bias warning
* Revert "Ignore lm_head decoder bias warning"
  This reverts commit f25177a9da6ca898e351f46c8b1515971de5c670.
* predictions -> lm_head
-
Julien Plu authored
* Add warning
* Remove unused import
* Fix missing call
* Fix missing call
* Completely remove token_type_ids
* Apply style
* Remove unused import
* Update src/transformers/models/mpnet/modeling_tf_mpnet.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Patrick von Platen authored
* fix tf led * remove loop file
-
Kiyoung Kim authored
This reverts commit 3f40070c.
-
- 14 Jan, 2021 6 commits
-
-
Sylvain Gugger authored
* Upstream (and rename) sortish sampler
* Use proper sampler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
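The "sortish" idea behind the upstreamed sampler, sketched as a hypothetical standalone helper (the Trainer's actual sampler class has its own interface): shuffle globally, then sort by length inside largish chunks so each batch holds similarly sized sequences and wastes little padding, while the chunk-level shuffle keeps the order random-ish.
```python
import random
from typing import List


def sortish_indices(lengths: List[int], batch_size: int, chunks_per_megabatch: int = 50) -> List[int]:
    # Shuffle all indices, then length-sort within each "megabatch" of
    # batch_size * chunks_per_megabatch examples.
    indices = list(range(len(lengths)))
    random.shuffle(indices)
    megabatch = batch_size * chunks_per_megabatch
    out: List[int] = []
    for start in range(0, len(indices), megabatch):
        chunk = indices[start:start + megabatch]
        out.extend(sorted(chunk, key=lambda i: lengths[i], reverse=True))
    return out
```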
-
Kiyoung Kim authored
* gradient accumulation for tftrainer
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* label naming
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
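A minimal sketch of gradient accumulation in TensorFlow, independent of TFTrainer's actual internals: gradients from several micro-batches are summed into buffer variables and only applied every `accumulation_steps` steps.
```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 4))
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

accumulation_steps = 4
# One accumulation buffer per trainable variable.
accumulated = [tf.Variable(tf.zeros_like(v), trainable=False) for v in model.trainable_variables]


def train_step(x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True)) / accumulation_steps
    grads = tape.gradient(loss, model.trainable_variables)
    for buf, g in zip(accumulated, grads):
        buf.assign_add(g)
    # Apply and reset only once every accumulation_steps micro-batches.
    if (step + 1) % accumulation_steps == 0:
        optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
        for buf in accumulated:
            buf.assign(tf.zeros_like(buf))


x = tf.random.normal((8, 4))
y = tf.constant([0, 1, 0, 1, 0, 1, 0, 1])
for step in range(8):
    train_step(x, y, step)
```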
-
Lysandre Debut authored
-
Julien Plu authored
-
Julien Plu authored
* Compliancy with tf-nightly * Add more version + restore min version check
-
Sylvain Gugger authored
* Fix Trainer with a parallel model * More clean up
-
- 13 Jan, 2021 7 commits
-
-
Lysandre authored
-
Lysandre authored
-
Sylvain Gugger authored
* Fix data parallelism in Trainer
* Update src/transformers/training_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
LSinev authored
* make TopKLogitsWarper faster * make TopPLogitsWarper faster
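For context, top-k warping keeps only the k highest-scoring logits and pushes the rest to -inf before sampling. A hedged sketch of that operation, not the library's exact (now faster) implementation:
```python
import torch


def top_k_filter(logits: torch.Tensor, top_k: int, filter_value: float = -float("inf")) -> torch.Tensor:
    # Mask every logit that is smaller than the k-th largest one in its row.
    threshold = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
    return torch.where(logits < threshold, torch.full_like(logits, filter_value), logits)
```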
-
Lysandre Debut authored
-
Suraj Patil authored
* add model_input_names * fix test
-
Stas Bekman authored
* deepspeed integration
* style
* add test
* ds wants to do its own backward
* fp16 assert
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* for clarity extract what args are being passed to deepspeed
* introduce the concept of self.wrapped_model
* s/self.wrapped_model/self.model_wrapped/
* complete transition to self.wrapped_model / self.model
* fix
* doc
* give ds its own init
* add custom overrides, handle bs correctly
* fix test
* clean up model_init logic, fix small bug
* complete fix
* collapse --deepspeed_config into --deepspeed
* style
* start adding doc notes
* style
* implement hf2ds optimizer and scheduler configuration remapping
* oops
* call get_num_training_steps absolutely when needed
* workaround broken auto-formatter
* deepspeed_config arg is no longer needed - fixed in deepspeed master
* use hf's fp16 args in config
* clean
* start on the docs
* rebase cleanup
* finish up --fp16
* clarify the supported stages
* big refactor thanks to discovering deepspeed.init_distributed
* cleanup
* revert fp16 part
* add checkpoint-support
* more init ds into integrations
* extend docs
* cleanup
* unfix docs
* clean up old code
* imports
* move docs
* fix logic
* make it clear which file it's referring to
* document nodes/gpus
* style
* wrong format
* style
* deepspeed handles gradient clipping
* easier to read
* major doc rewrite
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs
* switch to AdamW optimizer
* style
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
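A hedged usage sketch of what the collapsed flag looks like from Python: a single `deepspeed` training argument points at the DeepSpeed JSON config. The config path is a placeholder; a real run needs that JSON file on disk and a distributed launcher.
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fp16=True,                   # per the commit, HF's fp16 args are mapped into the DeepSpeed config
    deepspeed="ds_config.json",  # replaces the former --deepspeed_config flag; placeholder path
)
```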
-
- 12 Jan, 2021 7 commits
-
-
Sylvain Gugger authored
* Add target contextmanager and rework prepare_seq2seq_batch
* Fix tests, treat BART and Barthez
* Add last tokenizers
* Fix test
* Set src token before calling the superclass
* Remove special behavior for T5
* Remove needless imports
* Remove needless asserts
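A usage sketch of the pattern this introduces, assuming the target context manager is exposed as `as_target_tokenizer` and using an illustrative MarianMT checkpoint:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")  # illustrative checkpoint

src_texts = ["Hello world"]
tgt_texts = ["Hallo Welt"]

model_inputs = tokenizer(src_texts, return_tensors="pt", padding=True)
# Inside the context manager the tokenizer switches to its target-language settings.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, return_tensors="pt", padding=True).input_ids
model_inputs["labels"] = labels
```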
-
Lysandre Debut authored
-
NielsRogge authored
* Add LayoutLMForSequenceClassification and integration tests
  Improve docs
  Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
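A brief, hedged usage sketch of the new head; the label count and the dummy bounding boxes are only illustrative:
```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForSequenceClassification

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=2  # illustrative label count
)

encoding = tokenizer("Invoice total: 42", return_tensors="pt")
seq_len = encoding.input_ids.shape[1]
# LayoutLM also expects one normalized (x0, y0, x1, y1) box per token; dummy boxes here.
bbox = torch.zeros((1, seq_len, 4), dtype=torch.long)

outputs = model(**encoding, bbox=bbox, labels=torch.tensor([1]))
```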
-
Suraj Patil authored
* fix t5 fp16
-
Patrick von Platen authored
-
Patrick von Platen authored
* fix naming issues * better names
-
Patrick von Platen authored
* make templates ready
* make add_new_model_command_ready
* finish tf bart
* prepare tf mbart
* finish tf bart
* add tf mbart
* add marian
* prep pegasus
* add tf pegasus
* push blenderbot tf
* add blenderbot
* add blenderbot small
* clean-up
* make fix copy
* define blend bot tok
* fix
* up
* make style
* add to docs
* add copy statements
* overwrite changes
* improve
* fix docs
* finish
* fix last slow test
* fix missing git conflict line
* fix blenderbot
* up
* fix blenderbot small
* load changes
* finish copied from
* upload fix
-
- 11 Jan, 2021 2 commits
-
-
Stas Bekman authored
* round numbers
* style
* round only on logging
-
Julien Plu authored
-