1. 05 Jan, 2021 7 commits
    • LED (#9278) · 189387e9
      Patrick von Platen authored
      * create model
      
      * add integration
      
      * save current state
      
      * make integration tests pass
      
      * add one more test
      
      * add explanation to tests
      
      * remove from bart
      
      * add padding
      
      * remove unnecessary test
      
      * make all tests pass
      
      * re-add cookie cutter tests
      
      * finish PyTorch
      
      * fix attention test
      
      * Update tests/test_modeling_common.py
      
      * revert change
      
      * remove unused file
      
      * add string to doc
      
      * save intermediate
      
      * make tf integration tests pass
      
      * finish tf
      
      * fix doc
      
      * fix docs again
      
      * add led to doctree
      
      * add to auto tokenizer
      
      * added tips for led
      
      * make style
      
      * apply jplus statements
      
      * correct tf longformer
      
      * apply lysandres suggestions
      
      * apply sylvains suggestions
      
      * Apply suggestions from code review
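
      Once merged, the model is usable through the standard tokenizer/model classes; a minimal sketch of long-document summarization with LED (the `allenai/led-base-16384` checkpoint and the generation settings are illustrative assumptions, not part of this PR):

      ```
      from transformers import LEDTokenizer, LEDForConditionalGeneration

      # illustrative checkpoint name
      tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
      model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

      inputs = tokenizer("A very long document goes here ...", return_tensors="pt")
      summary_ids = model.generate(inputs["input_ids"], num_beams=2, max_length=64)
      print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
      ```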
    • Fix TF Funnel (#9300) · 52d62e68
      Julien Plu authored
      * Fix Funnel
      
      * Apply Patrick's comment
      
      * Remove comment
      
      * Fix dummy value
      
      * Apply style
    • [trainer] --model_parallel hasn't been implemented for most models (#9347) · 748006c0
      Stas Bekman authored
      * --model_parallel hasn't been implemented for most models
      
      * make the help clear as well
      
      * implement is_parallelizable; use it
      
      * oops
      
      * remove property
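
      The resulting check is roughly the following (a sketch, not the exact diff; GPT-2 is one of the few architectures that implements `parallelize()`):

      ```
      from transformers import GPT2LMHeadModel

      model = GPT2LMHeadModel.from_pretrained("gpt2")

      # only a handful of models support naive model parallelism;
      # the new `is_parallelizable` flag lets the trainer check before trying
      if getattr(model, "is_parallelizable", False):
          model.parallelize()  # spreads the layers across the available GPUs
      else:
          raise ValueError("--model_parallel is not implemented for this architecture")
      ```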
    • Use stable functions (#9369) · 4225740a
      Julien Plu authored
    • [logging] autoflush (#9385) · 4aa8f6ad
      Stas Bekman authored
      This PR proposes to:
      
      * auto-flush `transformers` logging 
      
      When logging is used to trace signals from different parts of the code and may be mixed with print-style debugging, this keeps all logging events synchronized.

      I don't think this change will introduce any performance impact.
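
      For context, "auto-flush" just means flushing the handler's stream after every record so library log lines interleave correctly with plain `print()` output; a minimal, hypothetical illustration of the idea (not the actual diff in this PR):

      ```
      import logging
      import sys

      class FlushingStreamHandler(logging.StreamHandler):
          # flush after every record so log output is never left buffered
          def emit(self, record):
              super().emit(record)
              self.flush()

      logging.getLogger("transformers").addHandler(FlushingStreamHandler(sys.stderr))
      ```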
      
      If it helps anyone, here is the code I used to sync `transformers` logging with various other debug prints.

      I was porting BART to MP and needed to verify that the device switching happens correctly, so I added a bunch of `logger.info` calls inside `modeling_bart.py`; I also had some other helpers that emitted `print` debug messages which weren't logger-based:
      
      ```
      
      # auto flush std streams
      from sys import stdout, stderr
      def stdout_write_flush(args, w=stdout.write): w(args); stdout.flush()
      def stderr_write_flush(args, w=stderr.write): w(args); stderr.flush()
      stdout.write = stdout_write_flush
      stderr.write = stderr_write_flush
      
      from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
      
      import logging
      import transformers.utils.logging
      import transformers.models.bart.modeling_bart
      
      # I wanted a shorter simpler format
      handlers = transformers.utils.logging._get_library_root_logger().handlers
      for handler in handlers:
          formatter = logging.Formatter("[%(funcName)s] %(message)s")
          handler.setFormatter(formatter)
      
      transformers.models.bart.modeling_bart.logger.setLevel(transformers.logging.INFO)
      ```
      
      @LysandreJik, @sgugger, @patrickvonplaten
    • Fix TF Longformer (#9348) · 83eec97e
      Julien Plu authored
      * Fix longformer
      
      * Apply style
      
      * Remove serving content
      
      * Forgot a condition
      
      * Apply style
      
      * Address Patrick's comments
      
      * Fix dtype
    • feat(wandb): save model as artifact (#8119) · 30fa0b78
      Boris Dayma authored
      * feat(wandb): log artifacts
      
      * fix: typo
      
      * feat(wandb): ensure name is allowed
      
      * feat(wandb): log artifact
      
      * feat(wandb): saving logic
      
      * style: improve formatting
      
      * fix: unrelated typo
      
      * feat: use a fake trainer

      * fix: simplify
      
      * feat(wandb): log model files as artifact
      
      * style: fix style
      
      * docs(wandb): correct description
      
      * feat: unpack model + allow env truthy values
      
      * feat: TrainerCallback can access tokenizer
      
      * style: fix style
      
      * feat(wandb): log more interesting metadata
      
      * feat: unpack tokenizer
      
      * feat(wandb): metadata with load_best_model_at_end
      
      * feat(wandb): more robust metadata
      
      * style(wandb): fix formatting
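
      A minimal usage sketch, assuming the feature is toggled through the `WANDB_LOG_MODEL` environment variable (truthy values accepted) and a standard `Trainer` run:

      ```
      import os

      # opt in to uploading the saved model files as a W&B artifact
      os.environ["WANDB_LOG_MODEL"] = "true"

      from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

      model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
      args = TrainingArguments(output_dir="out")
      trainer = Trainer(model=model, args=args)
      # with wandb installed and a train_dataset supplied, trainer.train()
      # uploads the saved model files as an artifact at the end of training
      ```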
  2. 04 Jan, 2021 6 commits
  3. 02 Jan, 2021 3 commits
  4. 30 Dec, 2020 1 commit
  5. 29 Dec, 2020 1 commit
    • [prophetnet] wrong import (#9349) · 8217d4e3
      Stas Bekman authored
      ```
      python -c "from apex.normalization import FusedProphetNetLayerNorm"
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
      ImportError: cannot import name 'FusedProphetNetLayerNorm' from 'apex.normalization' (/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/apex/normalization/__init__.py)
      ```
      It looks like this code has never been tested, so it fails silently inside the try/except.
      
      Discovered this by accident in https://github.com/huggingface/transformers/issues/9338#issuecomment-752217708
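
      One possible correction is to guard an import that actually exists in apex (its fused layer norm class is `FusedLayerNorm`); the actual patch may instead simply drop the dead code. A sketch of a working guard:

      ```
      # fall back to the plain PyTorch layer norm when apex isn't installed
      try:
          from apex.normalization import FusedLayerNorm as LayerNorm
      except ImportError:
          from torch.nn import LayerNorm
      ```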
  6. 28 Dec, 2020 2 commits
  7. 27 Dec, 2020 1 commit
  8. 25 Dec, 2020 1 commit
  9. 24 Dec, 2020 6 commits
  10. 23 Dec, 2020 3 commits
    • Add caching mechanism to BERT, RoBERTa (#9183) · 88ef8893
      Suraj Patil authored
      * add past_key_values
      
      * add use_cache option
      
      * make mask before cutting ids
      
      * adjust position_ids according to past_key_values
      
      * flatten past_key_values
      
      * fix positional embeds
      
      * fix _reorder_cache
      
      * set use_cache to false when not decoder, fix attention mask init
      
      * add test for caching
      
      * add past_key_values for Roberta
      
      * fix position embeds
      
      * add caching test for roberta
      
      * add doc
      
      * make style
      
      * doc, fix attention mask, test
      
      * small fixes
      
      * address patrick's comments
      
      * input_ids shouldn't start with pad token
      
      * use_cache only when decoder
      
      * make consistent with bert
      
      * make copies consistent
      
      * add use_cache to encoder
      
      * add past_key_values to tapas attention
      
      * apply suggestions from code review
      
      * make copies consistent
      
      * add attn mask in tests
      
      * remove copied from longformer
      
      * apply suggestions from code review
      
      * fix bart test
      
      * nit
      
      * simplify model outputs
      
      * fix doc
      
      * fix output ordering
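
      A minimal sketch of what the new cache enables for a BERT decoder (the checkpoint and decoding step are illustrative, not taken from the PR):

      ```
      from transformers import BertConfig, BertLMHeadModel, BertTokenizer

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
      model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

      inputs = tokenizer("Caching speeds up generation", return_tensors="pt")
      # the first forward pass returns past_key_values when use_cache=True
      outputs = model(**inputs, use_cache=True)
      past = outputs.past_key_values

      # subsequent steps only feed the newest token; cached keys/values cover the rest
      next_token = outputs.logits[:, -1:].argmax(-1)
      outputs = model(input_ids=next_token, past_key_values=past, use_cache=True)
      ```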
    • Fix param error (#9273) · 4bafc43b
      Xu Song authored
      TypeError: forward() got an unexpected keyword argument 'token_type_ids'
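
      The underlying issue is example code passing `token_type_ids` to a model whose `forward()` does not accept them; a generic illustration of the safe pattern (DistilBERT here is just an example, not necessarily the model from this PR):

      ```
      from transformers import AutoTokenizer, DistilBertModel

      tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
      model = DistilBertModel.from_pretrained("distilbert-base-uncased")

      inputs = tokenizer("Hello world", return_tensors="pt")
      # DistilBERT's forward() has no token_type_ids argument, so drop it if present
      inputs.pop("token_type_ids", None)
      outputs = model(**inputs)
      ```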
    • Fix gpt2 document (#9272) · 58e8a761
      Xu Song authored
  11. 22 Dec, 2020 5 commits
  12. 21 Dec, 2020 4 commits