Commits · b169ac9c2ba62c828000516dbce1af9126ca25ab · chenpangpang / transformers

10 Apr, 2020 3 commits

[examples] Generate argparsers from type hints on dataclasses (#3669) · b169ac9c

Julien Chaumond authored Apr 10, 2020

* [examples] Generate argparsers from type hints on dataclasses

* [HfArgumentParser] way simpler API

* Restore run_language_modeling.py for easier diff

* [HfArgumentParser] final tweaks from code review

b169ac9c
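As a rough illustration of the idea named in this commit (deriving command-line flags from the type hints of a dataclass), here is a minimal sketch using only the standard library. It is not the `HfArgumentParser` implementation; `TrainingOptions`, `parser_from_dataclass`, and `parse_into` are hypothetical names introduced for this example.

```python
import argparse
from dataclasses import MISSING, dataclass, fields


@dataclass
class TrainingOptions:
    # Hypothetical example fields, not taken from the commit itself.
    learning_rate: float = 5e-5
    num_epochs: int = 3
    output_dir: str = "out"


def parser_from_dataclass(cls):
    """Build an ArgumentParser with one --flag per dataclass field."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # The field's type hint doubles as the argparse converter.
        kwargs = {"type": f.type}
        if f.default is not MISSING:
            kwargs["default"] = f.default
        else:
            kwargs["required"] = True
        parser.add_argument(f"--{f.name}", **kwargs)
    return parser


def parse_into(cls, argv):
    """Parse argv and return a populated instance of the dataclass."""
    namespace = parser_from_dataclass(cls).parse_args(argv)
    return cls(**vars(namespace))


options = parse_into(TrainingOptions, ["--learning_rate", "1e-4", "--num_epochs", "5"])
```

This sketch only handles simple scalar hints such as `int`, `float`, and `str`; booleans, optionals, and lists would need extra handling.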

Multilingual BART - (#3602) · 7a7fdf71
Sam Shleifer authored Apr 10, 2020
```
- support mbart-en-ro weights
- add MBartTokenizer
```
7a7fdf71

Big cleanup of `glue_convert_examples_to_features` (#3688) · f98d0ef2

Julien Chaumond authored Apr 10, 2020

* Big cleanup of `glue_convert_examples_to_features`

* Use batch_encode_plus

* Cleaner wrapping of glue_convert_examples_to_features for TF

@lysandrejik

* Cleanup syntax, thanks to @mfuntowicz

* Raise explicit error in case of user error

f98d0ef2

09 Apr, 2020 5 commits

[T5, generation] Add decoder caching for T5 (#3682) · ce2298fb

Patrick von Platen authored Apr 10, 2020



* initial commit to add decoder caching for T5

* better naming for caching

* finish T5 decoder caching

* correct test

* added extensive past testing for T5

* clean files

* make tests cleaner

* improve docstring

* improve docstring

* better reorder cache

* make style

* Update src/transformers/modeling_t5.py
Co-Authored-By: Yacine Jernite <yjernite@users.noreply.github.com>

* make set output past work for all layers

* improve docstring

* improve docstring
Co-authored-by: Yacine Jernite <yjernite@users.noreply.github.com>

ce2298fb

Fix force_download of files on Windows (#3697) · 9384e5f6
calpt authored Apr 09, 2020

9384e5f6
[Exbert] Change style of button · bc65afc4
Julien Chaumond authored Apr 09, 2020

bc65afc4
Update quotes · 31baeed6
LysandreJik authored Apr 09, 2020
```
cc @julien-c
```
31baeed6
Correct transformers-cli env call · f8208fa4
Teven authored Apr 09, 2020

f8208fa4

08 Apr, 2020 6 commits
- Updating the TensorFlow models to work as expected with tokenizers v3.0.0 (#3684) · 6435b9f9
  Lysandre Debut authored Apr 08, 2020
```
* Updating modeling tf files; adding tests

* Merge `encode_plus` and `batch_encode_plus`
```
  6435b9f9
- close #3699 · 500aa123
  LysandreJik authored Apr 08, 2020
  
  500aa123
- More doc for model cards (#3698) · a594ee9c
  Julien Chaumond authored Apr 08, 2020
```
see https://github.com/huggingface/transformers/pull/3679#pullrequestreview-389368270
```
  a594ee9c
- Update doc for {Summarization,Translation}Pipeline and other tweaks · 83703cd0
  Julien Chaumond authored Apr 07, 2020
  
  83703cd0
- Created README.md for model card ChemBERTa (#3666) · a1b3b416
  Seyone Chithrananda authored Apr 08, 2020
```
* created readme.md

* update readme with fixes

Fixes from PR comments
```
  a1b3b416
- Fix typo in FeatureExtractionPipeline docstring · 747907dc
  Lorenzo Ampil authored Apr 03, 2020
  
  747907dc
07 Apr, 2020 8 commits
- [Bart] Replace config.output_past with use_cache kwarg (#3632) · 715aa5b1
  Sam Shleifer authored Apr 07, 2020
  
  715aa5b1
- [examples] SummarizationDataset cleanup (#3451) · e344e3d4
  Sam Shleifer authored Apr 07, 2020
  
  e344e3d4
- [Tokenization] fix edge case for bert tokenization (#3517) · b0ad0695
  Patrick von Platen authored Apr 07, 2020
```
* fix edge case for bert tokenization

* add Lysandre's comments for improvement

* use new is_pretokenized_flag
```
  b0ad0695
- [Examples, Benchmark] Improve benchmark utils (#3674) · 80fa0f78
  Patrick von Platen authored Apr 07, 2020
```
* improve and add features to benchmark utils

* update benchmark style

* remove output files
```
  80fa0f78
- Optimize causal mask using torch.where (#2715) · 05deb52d
  Michael Pang authored Apr 07, 2020
```
* Optimize causal mask using torch.where

Instead of multiplying by 1.0 float mask, use torch.where with a bool mask for increased performance.

* Maintain compatibility with torch 1.0.0 - thanks for PR feedback

* Fix typo

* reformat line for CI
```
  05deb52d
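The change described in this commit can be sketched as follows: a causal mask applied once by float arithmetic and once by `torch.where` with a boolean mask. This is an illustration under stated assumptions, not the code from the commit; the tensor names and the fill value `-1e4` are assumptions made for the example.

```python
import torch

seq_len = 4
scores = torch.randn(seq_len, seq_len)  # hypothetical attention scores

# Lower-triangular causal mask: position i may attend to positions <= i.
float_mask = torch.tril(torch.ones(seq_len, seq_len))
bool_mask = float_mask.bool()
masked_value = torch.tensor(-1e4)  # assumed fill value for masked positions

# Float-mask approach: multiply by the mask, then push masked entries down.
masked_by_multiply = scores * float_mask - 1e4 * (1 - float_mask)

# torch.where approach: select per element with the boolean mask.
masked_by_where = torch.where(bool_mask, scores, masked_value)
```

Both expressions produce the same masked scores; the second avoids the extra multiply and subtract over the full tensor.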
- Speedup torch summarization tests (#3663) · 0a4b1068
  Sam Shleifer authored Apr 07, 2020
  
  0a4b1068
- Fix roberta checkpoint conversion script (#3642) · 5aa8a278
  Myle Ott authored Apr 07, 2020
  
  5aa8a278
- [model_cards] Turn down spurious warnings · 11cc1e16
  Julien Chaumond authored Apr 07, 2020
```
Close #3639 + spurious warning mentioned in #3227

cc @lysandrejik @thomwolf
```
  11cc1e16
06 Apr, 2020 18 commits

fixed TransfoXLLMHeadModel documentation (#3661) · 0a9d09b4
Teven authored Apr 07, 2020
```
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
```
0a9d09b4

Tokenizers v3.0.0 (#3185) · 96ab75b8

Funtowicz Morgan authored Apr 06, 2020



* Renamed num_added_tokens to num_special_tokens_to_add
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Cherry-Pick: Partially fix space only input without special tokens added to the output #3091
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make fast tokenizers unittests work on Windows.

* Entirely refactored unittest for tokenizers fast.

* Remove ABC class for CommonFastTokenizerTest

* Added embeded_special_tokens tests from allenai @dirkgr

* Make embeded_special_tokens tests from allenai more generic

* Uniformize vocab_size as a property for both Fast and normal tokenizers

* Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)

* Ensure providing None input raises the same ValueError as the Python tokenizer + tests.

* Fix invalid input for assert_padding when test...

96ab75b8

Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (#3631) · e52d1258

Ethan Perez authored Apr 06, 2020

* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py

`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it would be helpful if someone more familiar with this part of the codebase checked them.

* Simplifying change to match recent commits

e52d1258
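To make the issue in this commit concrete, here is a minimal sketch of padding with a model-specific pad id instead of a hard-coded `0`. The pad ids are the ones quoted in the commit message; `PAD_TOKEN_IDS`, `pad_input_ids`, the sample token ids, and the choice of right-padding are assumptions made for this example, not code from `run_multiple_choice.py`.

```python
# Pad token ids as quoted in the commit message above.
PAD_TOKEN_IDS = {"bert": 0, "roberta": 1, "xlnet": 5}


def pad_input_ids(input_ids, max_length, model_type):
    """Right-pad a list of token ids using the model-specific pad id."""
    pad_id = PAD_TOKEN_IDS[model_type]
    return input_ids + [pad_id] * (max_length - len(input_ids))


# Hypothetical token ids, padded for two different model types.
bert_padded = pad_input_ids([7, 8, 9], 5, "bert")
roberta_padded = pad_input_ids([7, 8, 9], 5, "roberta")
```

In practice the pad id would more likely be read from the tokenizer (for example its `pad_token_id` attribute) than from a hand-written table like this one.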

Create README.md · 0ac33ddd
ktrapeznikov authored Apr 06, 2020

0ac33ddd
Add model card · 326e6eba
Manuel Romero authored Apr 06, 2020

326e6eba
Add model card · 43eca3f8
Manuel Romero authored Apr 06, 2020

43eca3f8
Create README.md · 6bec88ca
Manuel Romero authored Apr 06, 2020

6bec88ca
Add model card (#3655) · 769b60f9
Manuel Romero authored Apr 06, 2020
```
* Add model card

* Fix model name in fine-tuning script
```
769b60f9
Create model card (#3654) · c4bcb019
Manuel Romero authored Apr 06, 2020
```
* Create model card

* Fix model name in fine-tuning script
```
c4bcb019
Create README.md · 6903a987
Manuel Romero authored Apr 06, 2020

6903a987
Create README.md (#3662) · 760872db
MichalMalyska authored Apr 06, 2020

760872db
Add model card for BERTeus (#3649) · 47e1334c
jjacampos authored Apr 06, 2020
```
* Add model card for BERTeus

* Update README
```
47e1334c

BioMed Roberta-Base (AllenAI) (#3643) · 529534dc

Suchin authored Apr 06, 2020



* added model card

* updated README

* updated README

* updated README

* added evals

* removed pico eval

* Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

529534dc

Update notebooks (#3620) · 261c4ff4

Lysandre Debut authored Apr 06, 2020

* Update notebooks

* From local to global link

* from local links to *actual* global links

261c4ff4

[model_cards] ELECTRA (w/ examples of usage) · 39a34cc3

Julien Chaumond authored Apr 06, 2020


Co-Authored-By: Kevin Clark <clarkkev@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

39a34cc3

Re-pin isort · ea6dba27
LysandreJik authored Apr 06, 2020

ea6dba27
unpin isort for pypi · 11c3257a
LysandreJik authored Apr 06, 2020

11c3257a
Release: v2.8.0 · 36bffc81
LysandreJik authored Apr 06, 2020

36bffc81