- 06 Jul, 2020 2 commits
-
-
Arnav Sharma authored
-
ELanning authored
-
- 03 Jul, 2020 10 commits
-
-
Sam Shleifer authored
-
Patrick von Platen authored
-
Thomas Wolf authored
-
Thomas Wolf authored
-
Lysandre Debut authored
* Exposing prepare_for_model for both slow & fast tokenizers
* Update method signature
* The traditional style commit
* Hide the warnings behind the verbose flag
* Update default truncation strategy and prepare_for_model
* Fix tests and prepare_for_model methods
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
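For reference, a minimal sketch of calling the now-exposed prepare_for_model on a tokenizer; the checkpoint, input text, and max_length are illustrative choices, not taken from the commit.

```python
from transformers import AutoTokenizer

# prepare_for_model is available on both slow (Python) and fast (Rust-backed) tokenizers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Convert text to token ids first; prepare_for_model then adds special tokens,
# applies the truncation strategy, and builds the attention mask.
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello, world!"))
encoded = tokenizer.prepare_for_model(ids, add_special_tokens=True, truncation=True, max_length=8)

print(encoded["input_ids"])
print(encoded["attention_mask"])
```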
-
Manuel Romero authored
Create model card for electricidad-small (Spanish Electra) fine-tuned on SQuAD-es v1
-
Moseli Motsoehli authored
- fixed grammar and spelling - added an intro - updated Training data references
-
chrisliu authored
-
Manuel Romero authored
Create model card for electra-small-discriminator fine-tuned on SQUAD v2.0
-
Funtowicz Morgan authored
* Make the QA pipeline support models with more than 2 outputs, such as BART, assuming start/end are the first two outputs. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* When using the new padding/truncation paradigm, setting padding="max_length" + max_length=X actually pads the input up to max_length. This results in every sample going through the QA pipeline being of size 384 whatever the actual input size, making the overall pipeline very slow. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Mask padding & question before applying softmax. Softmax has been refactored to operate in log space for speed and stability. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use PaddingStrategy.LONGEST instead of DO_NOT_PAD. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Revert "When using the new padding/truncation paradigm, setting padding="max_length" + max_length=X actually pads the input up to max_length." This reverts commit 1b00a9a2. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI after unattended failure
* Trigger CI
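As context for the padding change above, a hedged usage sketch of the question-answering pipeline; the default checkpoint and the example texts are assumptions.

```python
from transformers import pipeline

# Batches are now padded to the longest sample (PaddingStrategy.LONGEST)
# instead of always padding every input up to max_length (e.g. 384).
qa = pipeline("question-answering")

result = qa(
    question="What is padded?",
    context="The question-answering pipeline pads each batch to its longest sample.",
)
print(result["answer"], result["score"])
```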
-
- 02 Jul, 2020 11 commits
-
-
Pierric Cistac authored
-
Sylvain Gugger authored
* Work on tokenizer summary
* Finish tutorial
* Link to it
* Apply suggestions from code review. Co-authored-by: Anthony MOI <xn1t0x@gmail.com>, Lysandre Debut <lysandre@huggingface.co>
* Add vocab definition
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Shen authored
`ElectraDiscriminatorPredictions.forward` should not need `attention_mask`.
-
Manuel Romero authored
Create model card for electra-base-discriminator fine-tuned on SQUAD v1.1
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Lysandre Debut authored
-
George Ho authored
-
Teven authored
* Overriding _parse_and_tokenize in `TextGenerationPipeline` to allow for TransfoXL tokenizer arguments
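A hedged sketch of the text-generation pipeline with a Transfo-XL checkpoint, the case this override targets; the checkpoint name and prompt are illustrative, and the tokenizer-specific arguments are forwarded internally by the pipeline.

```python
from transformers import pipeline

# Transfo-XL ships a word-level tokenizer with its own tokenization options,
# which the overridden _parse_and_tokenize can now pass through.
generator = pipeline("text-generation", model="transfo-xl-wt103")

output = generator("The Transformer architecture was introduced in", max_length=40)
print(output[0]["generated_text"])
```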
-
Teven authored
* Changed expected_output_ids in TransfoXL generation test to match #4826 generation PR. * making black happy * making isort happy
-
tommccoy authored
Fixed duplicated memory use in Transformer-XL generation, which led to bad predictions and degraded performance.
-
- 01 Jul, 2020 17 commits
-
-
Patrick von Platen authored
* Add Reformer MLM notebook * Update notebooks/README.md
-
Patrick von Platen authored
* fix conflicts * fix * happy rebasing
-
Funtowicz Morgan authored
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
-
Joe Davison authored
* allow tensor label inputs to default collator * replace try/except with type check
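A small sketch, assuming the collator in question is default_data_collator; it shows tensor-valued labels going through the new type check. The feature values are made up.

```python
import torch
from transformers import default_data_collator

# Labels can now be 0-dim torch tensors rather than plain Python numbers;
# the collator detects this with an isinstance check instead of try/except.
features = [
    {"input_ids": torch.tensor([101, 7592, 102]), "label": torch.tensor(1)},
    {"input_ids": torch.tensor([101, 2088, 102]), "label": torch.tensor(0)},
]

batch = default_data_collator(features)
print(batch["labels"])           # tensor([1, 0])
print(batch["input_ids"].shape)  # torch.Size([2, 3])
```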
-
Patrick von Platen authored
-
Patrick von Platen authored
* refactor naming
* add small slow test
* refactor
* refactor naming
* rename selected to extra
* big global attention refactor
* make style
* refactor naming
* save intermed
* refactor functions
* finish function refactor
* fix tests
* fix longformer
* fix longformer
* fix longformer
* fix all tests but one
* finish longformer
* address sams and izs comments
* fix transpose
-
Sam Shleifer authored
-
Funtowicz Morgan authored
* Added PipelineException. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* The fill-mask pipeline raises an exception when more than one mask_token is detected. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Put everything in a function. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added tests on the fill-mask pipeline when the input has != 1 mask_token. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix numel() computation for TF. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing PR comments. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove function typing to avoid importing a specific framework. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Retry typing with @julien-c's tip. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality². Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify fill-mask mask_token checking. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI
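A hedged sketch of the behaviour described above: the fill-mask pipeline expects exactly one mask token per input and raises a PipelineException otherwise; the checkpoint is an illustrative choice.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")
mask = fill_mask.tokenizer.mask_token

# Exactly one mask token: returns ranked fill candidates.
print(fill_mask(f"Paris is the {mask} of France.")[0]["sequence"])

# Zero masks, or more than one, raises a PipelineException after this change.
try:
    fill_mask(f"Paris is the {mask} of {mask}.")
except Exception as err:
    print(type(err).__name__, err)
```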
-
Sylvain Gugger authored
* Trigger CI * Fix dropdown bug in searches
-
Sylvain Gugger authored
* Cleanup and unify Trainer/TFTrainer
* Forgot to adapt TFTrainingArgs
* In TF scripts: n_gpu -> n_replicas
* Update src/transformers/training_args.py. Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Formatting
* Fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Sam Shleifer authored
-
Sam Shleifer authored
-
sgugger authored
-
Sylvain Gugger authored
* Add support for past states
* Style and forgotten self
* You mean, documenting is not enough? I have to actually add it too?
* Add memory support during evaluation
* Fix tests in eval and add TF support
* No need to change this line anymore
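A minimal sketch, assuming this is the past-state handling exposed through TrainingArguments.past_index (an assumption, not stated in the commit); the output directory and the index value are hypothetical.

```python
from transformers import TrainingArguments

# For models that return "past" states (e.g. Transfo-XL / XLNet mems),
# past_index tells the Trainer which model output to feed back in on the
# next step, during both training and evaluation.
args = TrainingArguments(
    output_dir="out",  # hypothetical
    past_index=1,      # hypothetical: position of the mems/past output
)
print(args.past_index)
```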
-
Sylvain Gugger authored
-
Moseli Motsoehli authored
* Create README.md * Update model_cards/MoseliMotsoehli/TswanaBert/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Jay authored
-