Commits · 8594dd80dd49b388fe2ba29c68e609a8d3ace9f6 · chenpangpang / transformers

03 Apr, 2020 1 commit
- BertJapaneseTokenizer accept options for mecab (#3566) · 8594dd80
  Yohei Tamura authored Apr 04, 2020
```
* BertJapaneseTokenizer accept options for mecab

* black

* fix mecab_option to Optional[str]
```
  8594dd80
02 Apr, 2020 2 commits
- delete bogus print statement (#3595) · f68d2285
  Patrick von Platen authored Apr 02, 2020
  
  f68d2285
- [Encoder-Decoder] Force models outputs to always have batch_size as their first dim (#3536) · 390c1285
  Patrick von Platen authored Apr 02, 2020
```
* solve conflicts

* improve comments
```
  390c1285
01 Apr, 2020 4 commits
- [T5, TF 2.2] change tf t5 argument naming (#3547) · a4ee4da1
  Patrick von Platen authored Apr 01, 2020
```
* change tf t5 argument naming for TF 2.2

* correct bug in testing
```
  a4ee4da1
- fix bug in warnings T5 pipelines (#3545) · 06dd5975
  Patrick von Platen authored Apr 01, 2020
  
  06dd5975
- Correct output shape for Bert NSP models in docs (#3482) · 9de9ceb6
  Anirudh Srinivasan authored Apr 02, 2020
  
  9de9ceb6
- [T5, Tests] Add extensive hard-coded integration tests and make sure PT and... · b815edf6
  Patrick von Platen authored Apr 01, 2020
```
[T5, Tests] Add extensive hard-coded integration tests and make sure PT and TF give equal results (#3550)

* add some t5 integration tests

* finish summarization and translation integration tests for T5 - results look good

* add tf test

* fix == vs is bug

* fix tf beam search error and make tf t5 tests pass
```
  b815edf6
31 Mar, 2020 2 commits
- [Generate] Add bad words list argument to the generate function (#3367) · b38d552a
  Patrick von Platen authored Mar 31, 2020
```
* add bad words list

* make style

* add bad_words_tokens

* make style

* better naming

* make style

* fix typo
```
  b38d552a
- remove useless and confusing lm_labels line (#3531) · 55bcae7f
  Patrick von Platen authored Mar 31, 2020
  
  55bcae7f
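The bad-words commit (#3367) above adds a list of banned token sequences to generation. The idea can be sketched in plain Python as follows. This is an illustration only, not the library's implementation: `banned_next_tokens` and its parameter names are hypothetical, and it assumes a sequence is banned by forbidding its last token whenever the tokens generated so far end with the rest of that sequence.

```python
from typing import List


def banned_next_tokens(prev_tokens: List[int], bad_words_ids: List[List[int]]) -> List[int]:
    """Return the token ids that must not be generated next (hypothetical helper)."""
    banned = []
    for seq in bad_words_ids:
        prefix, last = seq[:-1], seq[-1]
        if len(prefix) > len(prev_tokens):
            # Not enough context generated yet for this sequence to complete.
            continue
        # A single-token bad word has an empty prefix and is always banned.
        if len(prefix) == 0 or prev_tokens[-len(prefix):] == prefix:
            banned.append(last)
    return banned


generated = [4, 7, 9]
bad_words = [[9, 3], [5], [7, 9, 2], [1, 2]]
banned = banned_next_tokens(generated, bad_words)
print(banned)
```

In a decoding loop, the scores of the returned ids would typically be set to negative infinity before sampling or beam search picks the next token.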
30 Mar, 2020 7 commits
- [BART] Update encoder and decoder on set_input_embedding (#3501) · 1f728657
  dougian authored Mar 30, 2020
```
Co-authored-by: Ioannis Douratsos <ioannisd@amazon.com>
```
  1f728657
- [InputExample] Unfreeze for now, cf. #3423 · cc598b31
  Julien Chaumond authored Mar 30, 2020
  
  cc598b31
- Update the NER TF script (#3511) · d38bbb22
  Julien Plu authored Mar 30, 2020
```
* Update the NER TF script to remove the softmax and set the pad token label id to -1

* Reformat the quality and style

Co-authored-by: Julien Plu <julien.plu@adevinta.com>
```
  d38bbb22
- Release: v2.7.0 · 6f5a12a5
  LysandreJik authored Mar 30, 2020
  
  6f5a12a5
- fix lm labels in docstring (#3529) · 296252c4
  Patrick von Platen authored Mar 30, 2020
  
  296252c4
- [T5] make decoder input ids optional for t5 training (#3521) · 75ec6c9e
  Patrick von Platen authored Mar 30, 2020
```
* make decoder input ids optional for t5 training

* lm_labels should not be shifted in t5

* add tests

* finish shift right functionality for PT T5

* move shift right to correct class

* cleaner code

* replace -100 values with pad token id

* add assert statement

* remove unnecessary for loop

* make style
```
  75ec6c9e
- [T5] Add training documentation (#3507) · 5b44e0a3
  Patrick von Platen authored Mar 30, 2020
```
* Add clear description of how to train T5

* correct docstring in T5

* correct typo

* correct docstring format

* update t5 model docs

* implement collins feedback

* fix typo and add more explanation for sentinel tokens

* delete unnecessary todos
```
  5b44e0a3
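The T5 training commit (#3521) above mentions a "shift right" step and replacing -100 values with the pad token id. A minimal sketch of that idea in plain Python follows. The function name, the list-of-lists representation, and the choice of `decoder_start_token_id=0` are assumptions made for illustration; the actual code operates on tensors inside the model.

```python
from typing import List

IGNORE_INDEX = -100  # label value conventionally ignored by the loss


def shift_right(labels: List[List[int]], decoder_start_token_id: int, pad_token_id: int) -> List[List[int]]:
    """Build decoder inputs from labels (hypothetical helper, lists instead of tensors)."""
    shifted = []
    for row in labels:
        # Prepend the start token and drop the last label so the length is unchanged.
        new_row = [decoder_start_token_id] + row[:-1]
        # Ignored label positions are not valid input ids, so map them to padding.
        new_row = [pad_token_id if tok == IGNORE_INDEX else tok for tok in new_row]
        shifted.append(new_row)
    return shifted


labels = [[10, 11, 1, -100], [20, -100, -100, -100]]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
print(decoder_input_ids)
```

With this step done inside the model, a caller only has to pass the unshifted labels, which is what makes the decoder input ids optional during training.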

29 Mar, 2020 1 commit
- [BART] add bart-large-xsum weights (#3422) · f6a23d19
  Sam Shleifer authored Mar 29, 2020
  
  f6a23d19
27 Mar, 2020 3 commits
- Add T5 to docs (#3461) · fa9af246
  Patrick von Platen authored Mar 27, 2020
```
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
```
  fa9af246
- Correct indentation in docstring · e2c05f06
  LysandreJik authored Mar 27, 2020
```
For some reason Sphinx extremely dislikes this and crashes.
```
  e2c05f06
- [Bart/Memory] Two separate, smaller decoder attention masks (#3371) · 3ee431dd
  Sam Shleifer authored Mar 26, 2020
  
  3ee431dd
26 Mar, 2020 10 commits
- [Bart: example] drop columns that are exclusively pad_token_id… (#3400) · c10decf7
  Sam Shleifer authored Mar 26, 2020
```
* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
```
  c10decf7
- [Bart/Memory] SelfAttention only returns weights if config.outp… (#3369) · 63f4d8ca
  Sam Shleifer authored Mar 26, 2020
  
  63f4d8ca
- [Bart] Fix: put dummy_inputs on correct device (#3398) · 2b2a2f8d
  Sam Shleifer authored Mar 26, 2020
```
* Dummy inputs to model.device

* Move self.device to ModuleUtilsMixin
```
  2b2a2f8d
- [Seq2Seq Generation] Call encoder before expanding input_ids (#3370) · 1a5aefc9
  Sam Shleifer authored Mar 26, 2020
  
  1a5aefc9
- [Bart/Memory] don't create lm_head (#3323) · 39371ee4
  Sam Shleifer authored Mar 26, 2020
```
* delete lm_head, skips weight tying
* Fixed s3
```
  39371ee4
- Add missing token classification for XLM (#3277) · 1a6c546c
  sakares saengkaew authored Mar 26, 2020
```
* Add the missing token classification for XLM

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add the missing token classification for XLM

* fix styling

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add missing description for AlbertForTokenClassification

* fix styling

* Add missing docstring for AlBert

* Slow tests should be slow

Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
```
  1a6c546c
- rename string in pipeline · 31197054
  Patrick von Platen authored Mar 26, 2020
  
  31197054
- Adds translation pipeline (#3419) · 022e8fab
  Patrick von Platen authored Mar 26, 2020
```
* fix merge conflicts

* add t5 summarization example

* change parameters for t5 summarization

* make style

* add first code snippet for translation

* only add prefixes

* add prefix patterns

* make style

* renaming

* fix conflicts

* remove unused patterns

* solve conflicts

* fix merge conflicts

* remove translation example

* remove summarization example

* make sure tensors are in numpy for float comparison

* re-add t5 config

* fix t5 import config typo

* make style

* remove unused numpy statements

* update docstring

* import translation pipeline
```
  022e8fab
- Add t5 to pipeline(task='summarization') (#3413) · 9c683ef0
  Patrick von Platen authored Mar 26, 2020
```
* solve conflicts

* move warnings below

* incorporate changes

* add pad_to_max_length to pipelines

* add bug fix for T5 beam search

* add prefix patterns

* make style

* fix conflicts

* adapt pipelines for task specific parameters

* improve docstring

* remove unused patterns
```
  9c683ef0
- Force the return of token type IDs (#3439) · ffcffebe
  Lysandre Debut authored Mar 26, 2020
  
  ffcffebe
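The Bart example commit (#3400) in this group describes trimming columns that contain only `pad_token_id`. A rough sketch of that behavior on plain Python lists is shown below. The name `trim_pad_columns` is hypothetical and the sketch assumes a rectangular batch; the `trim_batch` helper mentioned in the commit works on tensors and may differ in detail.

```python
from typing import List


def trim_pad_columns(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Drop every column whose entries are all pad_token_id (hypothetical helper)."""
    if not input_ids:
        return []
    seq_len = len(input_ids[0])
    # A column is kept if at least one row holds a real token at that position.
    keep = [any(row[col] != pad_token_id for row in input_ids) for col in range(seq_len)]
    return [[tok for tok, kept in zip(row, keep) if kept] for row in input_ids]


batch = [
    [5, 6, 7, 0, 0],
    [8, 9, 0, 0, 0],
]
trimmed = trim_pad_columns(batch, pad_token_id=0)
print(trimmed)
```

Shorter rows keep their padding up to the longest real sequence in the batch, so the result stays rectangular while the sequence length shrinks.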

25 Mar, 2020 2 commits
- Extend config with task specific configs. (#3433) · ffa17fe3
  Patrick von Platen authored Mar 25, 2020
```
* add new default configs

* change prefix default to None
```
  ffa17fe3
- Experiment w/ dataclasses (including Py36) (#3423) · 83272a38
  Julien Chaumond authored Mar 25, 2020
```
* [ci] Also run test_examples in py37

(will revert at the end of the experiment)

* InputExample: use immutable dataclass

* [deps] Install dataclasses for Py<3.7

* [skip ci] Revert "[ci] Also run test_examples in py37"

This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
```
  83272a38
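The dataclasses experiment (#3423) above turns `InputExample` into an immutable dataclass. What "immutable" means in practice can be sketched with the standard library alone. `ExampleRecord` and its fields are a hypothetical stand-in, not the library class, and on Python 3.6 the `dataclasses` backport mentioned in the commit would be needed.

```python
import dataclasses
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class ExampleRecord:
    """Hypothetical stand-in for an immutable training example."""
    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None


ex = ExampleRecord(guid="train-1", text_a="A sentence.", label="positive")

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    ex.label = "negative"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# A changed copy is created with dataclasses.replace instead of in-place mutation.
relabelled = dataclasses.replace(ex, label="negative")
print(mutation_blocked, relabelled.label, ex.label)
```

Code that mutates examples in place breaks under this design, which is one plausible reason a later commit in this log unfreezes the class "for now".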

24 Mar, 2020 3 commits
- Expose missing mappings (see #3415) · f8823bad
  Julien Chaumond authored Mar 24, 2020
  
  f8823bad
- Release: v2.6.0 · 471cce24
  LysandreJik authored Mar 24, 2020
  
  471cce24
- [examples] Use AutoModels in more examples · a8e3336a
  Julien Chaumond authored Mar 23, 2020
  
  a8e3336a
23 Mar, 2020 1 commit
- [ALBERT] move things around for more consistent naming · e25c4f40
  Julien Chaumond authored Mar 23, 2020
```
see #3359

cc @lysandrejik
```
  e25c4f40
20 Mar, 2020 1 commit
- Clean special token init in modeling_....py (#3264) · 95e00d08
  Patrick von Platen authored Mar 20, 2020
```
* make style

* fix conflicts
```
  95e00d08
19 Mar, 2020 3 commits
- Simpler Error message when loading config/model with .from_pretrained() (#3341) · ecfd3363
  Julien Chaumond authored Mar 19, 2020
  
  ecfd3363
- Support T5 Generation (#3228) · bbf26c4e
  Patrick von Platen authored Mar 19, 2020
```
* fix conflicts

* update bart max length test

* correct spelling mistakes

* implemented model specific encode function

* fix merge conflicts

* better naming

* save intermediate state -> need to rethink structure a bit

* leave tf problem as it is for now

* current version

* add layers.pop

* remove ipdb

* make style

* clean return cut decoding

* remove ipdbs

* Fix restoring layers in the decoders that don't exist.

* push good intermediate solution for now

* fix conflicts

* always good to refuse to merge conflicts when rebasing

* fix small bug

* improve function calls

* remove unused file

* add correct scope behavior for t5_generate

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
```
  bbf26c4e
- Export ALBERT main layer in TensorFlow (#3354) · f049be7a
  Lysandre Debut authored Mar 19, 2020
  
  f049be7a