- 12 Oct, 2020 11 commits
-
-
Sam Shleifer authored
-
Alex Combessie authored
-
Lysandre Debut authored
-
Julien Plu authored
* Fix test * fix generic text classification * fix test * Fix tests
-
sgugger authored
-
Jonathan Chang authored
Fix a bug that happens when subclassing Trainer and overriding evaluate() without calling prediction_loop()
-
Kelvin authored
Splitting large files into smaller ones can often prevent the tokenizer from running out of memory in environments like Colab that have no swap memory
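A minimal sketch of the workaround described above. The function name, chunk size, and file naming are illustrative, not from the commit — the idea is simply to cut one large text file into numbered pieces so each can be tokenized separately.

```python
from pathlib import Path


def split_file(path, lines_per_chunk=100_000):
    """Write consecutive slices of `path` to numbered chunk files.

    Returns the list of chunk paths, so each chunk can be fed to a
    tokenizer one at a time instead of loading the whole file at once.
    """
    path = Path(path)
    chunk_paths = []
    with path.open() as src:
        buf, idx = [], 0
        for line in src:
            buf.append(line)
            if len(buf) >= lines_per_chunk:
                out = path.with_suffix(f".chunk{idx}.txt")
                out.write_text("".join(buf))
                chunk_paths.append(out)
                buf, idx = [], idx + 1
        if buf:  # write the remainder, if any
            out = path.with_suffix(f".chunk{idx}.txt")
            out.write_text("".join(buf))
            chunk_paths.append(out)
    return chunk_paths
```

Each chunk file can then be passed to the tokenizer in its own call, keeping peak memory bounded by the chunk size rather than the full corpus.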
-
AndreaSottana authored
Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural.
-
fteufel authored
Added is_torch_tpu_available() to the condition for saving a model as an XLA model. The "xla_device" property of the config can also be True on a non-XLA device, when loading a checkpoint that was trained on XLA before. Resolves #7695
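A minimal sketch of the guard logic this commit describes — not the actual transformers code. `Config` and `should_save_as_xla` are stand-ins here; in the library, `is_torch_tpu_available()` checks for a real TPU runtime, while `config.xla_device` is merely a flag persisted in the checkpoint.

```python
class Config:
    """Stand-in for a model config carrying the persisted xla_device flag."""

    def __init__(self, xla_device):
        self.xla_device = xla_device


def should_save_as_xla(config, tpu_available):
    # Before the fix, only `config.xla_device` was checked, so a checkpoint
    # trained on TPU would take the XLA save path even on a CPU/GPU machine.
    # The fix also requires an actual TPU runtime to be present.
    return bool(getattr(config, "xla_device", False)) and tpu_available


# A checkpoint trained on TPU, now loaded on an ordinary machine:
print(should_save_as_xla(Config(xla_device=True), tpu_available=False))  # False
```

With both conditions required, loading a TPU-trained checkpoint on a regular machine no longer routes saving through the XLA path.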
-
Sylvain Gugger authored
-
Berowne authored
Replace 'men_len' with 'mem_len' to match the parameter name
-
- 11 Oct, 2020 3 commits
-
-
Miguel Victor authored
-
Sam Shleifer authored
-
Alexandr Maslov authored
-
- 10 Oct, 2020 2 commits
-
-
Andrew Kane authored
-
Sylvain Gugger authored
-
- 09 Oct, 2020 14 commits
-
-
Sylvain Gugger authored
-
Doug Blank authored
* Import integration libraries first * isort and black happiness * flake8 happiness * Add a test * Black reformat * Ignore import order in tests * A heavy-handed method of disabling comet for tests * Remove comet_ml tests * Run black on setup.py
-
sgugger authored
-
Sylvain Gugger authored
-
Sam Shleifer authored
-
Stas Bekman authored
-
sgugger authored
-
Julien Plu authored
* Fix test * Fix cardinality issue * Fix test
-
Joe Davison authored
-
Funtowicz Morgan authored
* Reintroduce clean_text call which was removed by mistake in #4723 * Added unittest for clean_text parameter on Bert tokenizer. * Better unittest name. * Adapt unittest to use untrained tokenizer. * Code quality + update test
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
-
Noah Trenaman authored
-
guhur authored
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
-
Sam Shleifer authored
-
- 08 Oct, 2020 8 commits
-
-
Sam Shleifer authored
-
Suraj Patil authored
-
Lysandre Debut authored
* Fix RobertaForCausalLM docs * Apply review suggestion
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
-
Thomas Wolf authored
* pin torch-hub test * add protobuf dep
-
Thomas Wolf authored
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenizer implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4 switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of transformer XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limit tokenizer warning to one occurrence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
-
Piero Molino authored
Replaced torch.load with pickle.load for loading the pretrained vocab of the TransformerXL tokenizer (#6935)
* Replaced torch.load with pickle.load when loading the pretrained vocab of TransformerXL
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
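A hedged sketch of the swap this commit describes: persisting a vocabulary with the standard-library pickle module instead of torch.save/torch.load, which removes the torch dependency from the vocab file format. The vocab contents below are illustrative, not the actual TransformerXL vocabulary structure.

```python
import os
import pickle
import tempfile

# Illustrative stand-in for a tokenizer vocabulary object.
vocab = {"idx2sym": ["<eos>", "the", "of"], "counter": {"the": 10, "of": 7}}

path = os.path.join(tempfile.mkdtemp(), "vocab.pkl")

# Save the vocabulary (previously: torch.save(vocab, path)).
with open(path, "wb") as f:
    pickle.dump(vocab, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back (previously: torch.load(path)).
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Because the vocab is plain Python data rather than tensors, pickle round-trips it exactly, and the saved file can be read without torch installed.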
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 07 Oct, 2020 2 commits
-
-
Sam Shleifer authored
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Blaise Cruz authored
-