- 09 Oct, 2020 8 commits
-
sgugger authored
-
Julien Plu authored
* Fix test
* Fix cardinality issue
* Fix test
-
Joe Davison authored
-
Funtowicz Morgan authored
* Reintroduce clean_text call which was removed by mistake in #4723
* Added unittest for clean_text parameter on Bert tokenizer.
* Better unittest name.
* Adapt unittest to use untrained tokenizer.
* Code quality + update test

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
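The `clean_text` step reintroduced above normalizes raw input before BERT tokenization. As a rough stdlib sketch of that behavior (not the library's actual implementation), it drops control characters and the replacement character, and folds whitespace into plain spaces:

```python
import unicodedata

def clean_text(text: str) -> str:
    """Sketch of BERT-style text cleaning: normalize whitespace to single
    spaces and drop NUL, U+FFFD, and control/format characters."""
    cleaned = []
    for ch in text:
        if ch in " \t\n\r" or unicodedata.category(ch) == "Zs":
            cleaned.append(" ")  # fold tabs/newlines into plain spaces
        elif ord(ch) in (0, 0xFFFD) or unicodedata.category(ch).startswith("C"):
            continue  # skip NUL, replacement char, control/format chars
        else:
            cleaned.append(ch)
    return "".join(cleaned)
```

For example, `clean_text("a\tb\x00c")` yields `"a bc"`: the tab becomes a space and the NUL byte disappears.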
-
Noah Trenaman authored
-
guhur authored
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
-
Sam Shleifer authored
- 08 Oct, 2020 8 commits
-
Sam Shleifer authored
-
Suraj Patil authored
-
Lysandre Debut authored
* Fix RobertaForCausalLM docs
* Apply review suggestion

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
-
Thomas Wolf authored
* pin torch-hub test
* add protobuf dep
-
Thomas Wolf authored
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)

* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenizer implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4, switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of Transformer-XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limit tokenizer warning to one occurrence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
-
Piero Molino authored
Replaced torch.load with pickle.load when loading the pretrained vocab of the TransformerXL tokenizer (#6935)

* Replaced torch.load with pickle.load when loading the pretrained vocab of TransformerXL
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
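The point of swapping `torch.save`/`torch.load` for pickle here is that a tokenizer vocabulary is a plain Python object, so serializing it should not require torch at all. A minimal sketch of the pattern (function and file names hypothetical, not the actual transformers code):

```python
import pickle

def save_vocab(vocab: dict, path: str) -> None:
    # Plain pickle replaces torch.save for a pure-Python vocab dict,
    # so reading it back has no torch dependency.
    with open(path, "wb") as f:
        pickle.dump(vocab, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_vocab(path: str) -> dict:
    # Counterpart to save_vocab; replaces torch.load.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The trade-off noted in the commit is compatibility: previously saved torch checkpoints of the vocab had to be re-serialized (hence the "uploaded on S3 - compatibility" step).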
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 07 Oct, 2020 13 commits
-
Sam Shleifer authored
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Blaise Cruz authored
-
Bobby Donchev authored
* Create README.md
* Update README.md
* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Keshan authored
* [Model card] SinhalaBERTo model. This is the model card for the keshan/SinhalaBERTo model.
* Update model_cards/keshan/SinhalaBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Amine Abdaoui authored
Co-authored-by: Amin <amin.geotrend@gmail.com>
-
Abed khooli authored
-
dartrevan authored
-
Ilias Chalkidis authored
Minor changes: add arXiv link + layout improvement + fix typos
-
Abhilash Majumder authored
-
Julien Chaumond authored
by @nikkon3
-
Sam Shleifer authored
-
Sylvain Gugger authored
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Lysandre Debut authored
-
- 06 Oct, 2020 11 commits
-
Gabriele Picco authored
* Fix UnboundLocalError when PaddingStrategy is MAX_LENGTH
* Fix UnboundLocalError for TruncationStrategy
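An `UnboundLocalError` of the kind fixed here typically arises when a local variable is assigned only in some branches of a strategy check and a padding mode like MAX_LENGTH falls through unmatched. A generic sketch of the bug pattern and its fix (names hypothetical, not the actual transformers code):

```python
def pad_length_buggy(strategy, max_length=None):
    # Bug: `length` is only bound when a branch matches. With
    # strategy == "max_length" and max_length left as None, no branch
    # runs and the `return` raises UnboundLocalError.
    if strategy == "longest":
        length = 0
    elif strategy == "max_length" and max_length is not None:
        length = max_length
    return length

def pad_length_fixed(strategy, max_length=None):
    # Fix: bind a default first so the name exists on every path.
    length = 0
    if strategy == "max_length" and max_length is not None:
        length = max_length
    return length
```

The same failure mode applies to a truncation-strategy dispatch, which is why the commit fixes both in one go.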
-
Philipp authored
Resolves: #7613
-
Lysandre authored
-
Lysandre Debut authored
* Add GPT2ForSequenceClassification based on DialogRPT
* Better documentation
* Code quality
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Ahmed Elnaggar authored
It should be T5-3B not T5-3M.
-
Adrien David-Sivelle authored
- Use cuda:10.2 image instead of 10.1 (to address version mismatch warning with pytorch)
- Use devel version, which is built on the runtime image and includes headers and development tools (the build was otherwise failing to build apex)
-
George Mihaila authored
-
cedspam authored
-
Ilias Chalkidis authored
* Create README.md: model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School" (Chalkidis et al., 2020, Findings of EMNLP 2020)
* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-