Commits · 37ed3ab719f10dc00bf63ac343b441bf78bb1eee · chenpangpang / transformers

13 May, 2021 1 commit

Enable option for subword regularization in more tokenizers. (#11417) · 37ed3ab7

Philip May authored May 13, 2021

* improve slow class tok usage at xlm rob

* add subword regularization for barthez

* improve barthez tok. test

* fix tokenizer tests

* add subword regularization for camembert

* add subword regularization for deberta v2 tokenizer

* add more doc to deberta v2 tokenizer

* add subword regularization for speech to text tok.

* fix sp_model_kwargs type in speech 2 text tok.

* add subword regularization for M2M100 tok.

* add more concrete type hints

* fix tests for m2m100 and s2t tok.

* add missing Any import

* fix syntax error in m2m100 tok.

* fix unpickle of m2m100 and s2t tok.

* fix test of m2m100 and s2t tok.

* improve unpickle of deberta v2 tok.

* add test for pickle of barthez & camembert

* fix pickle of barthez & camembert

* add test for deberta v2 tok. pickle

* fix m2m100 tok. pickle

* fix s2t tok. pickle

* add subword regularization to albert tok.

* refactor subword reg. test into TokenizerTesterMixin

improve albert tok. test

remove sample argument from albert tok.

check subword reg. using TokenizerTesterMixin

improve tok. tests

improve xlm roberta tok. tests

improve xlm roberta tok. tests

* add subword regularization for big bird t.

* improve xlm roberta tok. test

* add subword regularization for mbart50 tok.

* add subword regularization for pegasus tok.

* add subword regularization for reformer tok.

* add subword regularization for T5 tok.

* fix t5 tok. test formatting

* add subword regularization for xlm_proph. tok.

* add subword regularization for xlnet tok.

* add subword regularization for bert_gen tok.

* add typing to tokenizers

* add typing to xlm rob. tok

* add subword regularization for marian tok.

* add reverse tok. test

* fix marian tok test

* fix marian tok test

* fix casing in tok. tests

* fix style of tok. common test

* fix deberta v2 tok test

* add type annotations to tok. tests

* add type annotations to tok. __init__

* add typing to tokenizer

* add type annotations to tok. __init__

* don't specify the default when it's None

* fix barthez tok. doc

* move sentencepiece tok. tests to TokenizerTesterMixin

* fix unused imports

* fix albert tok. test

* add comment to sentencepiece test options

* fix Any import at big bird tok.

* fix Any import at xlm prophetnet tok.

* empty commit to trigger CI

37ed3ab7
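The `sp_model_kwargs` option added across these tokenizers, together with the pickle fixes listed in this commit, follows a pattern that can be sketched without `sentencepiece` installed. This is a minimal sketch, not the library's code: `FakeSentencePieceProcessor` and `SketchTokenizer` are hypothetical stand-ins, and the fake processor holds a lock so that it cannot be pickled (the assumed reason the real tokenizers drop `sp_model` before pickling). Only the option names `enable_sampling`, `nbest_size` and `alpha` come from SentencePiece's own processor options.

```python
import pickle
import threading
from typing import Any, Dict, Optional


class FakeSentencePieceProcessor:
    """Hypothetical stand-in for sentencepiece.SentencePieceProcessor.

    It holds a lock, so (like the assumed real processor) it cannot be pickled.
    """

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs
        self._lock = threading.Lock()
        self.model_file: Optional[str] = None

    def Load(self, model_file: str) -> None:
        self.model_file = model_file


class SketchTokenizer:
    """Hypothetical tokenizer showing how sp_model_kwargs survives pickling."""

    def __init__(self, vocab_file: str, sp_model_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.vocab_file = vocab_file
        # Store an empty dict instead of None so it can always be splatted with **.
        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
        self.sp_model = FakeSentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(vocab_file)

    def __getstate__(self) -> Dict[str, Any]:
        # Drop the unpicklable processor but keep everything needed to rebuild it.
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d: Dict[str, Any]) -> None:
        self.__dict__ = d
        # Objects pickled before sp_model_kwargs existed do not have the attribute.
        if not hasattr(self, "sp_model_kwargs"):
            self.sp_model_kwargs = {}
        # Rebuild with the same kwargs so the sampling settings survive the round trip.
        self.sp_model = FakeSentencePieceProcessor(**self.sp_model_kwargs)
        self.sp_model.Load(self.vocab_file)
```

For example, `SketchTokenizer("spiece.model", sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1})` can be round-tripped through `pickle.dumps`/`pickle.loads` and comes back with sampling still enabled.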

12 May, 2021 9 commits

Vit deit fixes (#11309) · fa84540e

NielsRogge authored May 12, 2021

* Improve docs of DeiT and ViT, add community notebook

* Add gitignore for test_samples

* Add notebook with Trainer
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

fa84540e

Docs for v4.7.0.dev0 · d77eb0cf
Lysandre authored May 12, 2021

d77eb0cf
Release: v4.6.0 · 64e78564
Lysandre authored May 12, 2021

64e78564

[Lazy init] Force fall back to slow init for composite models (#11705) · fd6204b2

Patrick von Platen authored May 12, 2021

* fix encoder-decoder & RAG

* finalize

* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/rag/modeling_rag.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

fd6204b2

fix example in config doc (#11696) · 5c1cda9d
Suraj Patil authored May 12, 2021

5c1cda9d
remove defaults to None if optional (#11703) · 77f4c46b
Philip May authored May 12, 2021

77f4c46b
Updates README and fixes bug (#11701) · 6797cdc0
Marc van Zee authored May 12, 2021

6797cdc0
Fix clip docs (#11694) · f063c56d
Suraj Patil authored May 12, 2021
```
* fix doc url

* fix example
```
f063c56d

CLIP (#11445) · 8719afa1

Suraj Patil authored May 12, 2021

* begin second draft

* fix import, style

* add loss

* fix embeds, logits_scale, and projection

* fix imports

* add conversion script

* add feature_extractor and processor

* style

* add tests for tokenizer, extractor and processor

* add vision model tests

* add weight init

* add more tests

* fix save_load test

* model output, docstrings, causal mask

* config doc

* add clip model tests

* return dict

* begin integration test

* add integration tests

* fix-copies

* fix init

* Clip => CLIP

* fix module name

* docs

* fix doc

* output_dim => projection_dim

* fix checkpoint names

* remove fast tokenizer file

* fix conversion script

* fix tests, quality

* put causal mask on device

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix attribute test

* style

* address Sylvain's comments

* style

* fix docstrings

* add quick_gelu in activations, docstrings

* clean-up attention test

* fix act fun

* fix config

* fix torchscript tests

* even batch_size

* remove comment

* fix output to_tuple

* fix save load tests

* fix add tokens test

* add fast tokenizer

* update copyright

* new processor API

* fix docs

* docstrings

* docs

* fix doc

* fix doc

* fix tokenizer

* fix import in doc example

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* check types of config

* valhalla => openai

* load image using url

* fix test

* typo
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

8719afa1
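Two pieces named in the CLIP commit, the loss ("add loss", "fix embeds, logits_scale, and projection") and `quick_gelu`, can be illustrated in plain Python. This is a dependency-free sketch, not the PyTorch implementation: it assumes the symmetric contrastive loss from the CLIP paper (cross-entropy over text-to-image and image-to-text similarities, with the matching pair on the diagonal, averaged) and cosine similarities scaled by `exp(logit_scale)`. `quick_gelu` is the sigmoid approximation `x * sigmoid(1.702 * x)`.

```python
import math
from typing import List

Matrix = List[List[float]]


def quick_gelu(x: float) -> float:
    # Sigmoid approximation of GELU: x * sigmoid(1.702 * x).
    return x / (1.0 + math.exp(-1.702 * x))


def normalize(v: List[float]) -> List[float]:
    # L2-normalise an embedding so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logits_per_text(text_embeds: Matrix, image_embeds: Matrix, logit_scale: float) -> Matrix:
    # Cosine similarity of every text with every image, scaled by exp(logit_scale).
    scale = math.exp(logit_scale)
    texts = [normalize(t) for t in text_embeds]
    images = [normalize(i) for i in image_embeds]
    return [[scale * sum(a * b for a, b in zip(t, i)) for i in images] for t in texts]


def cross_entropy_diag(logits: Matrix) -> float:
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(text_logits: Matrix) -> float:
    # Symmetric loss: average the text->image and image->text directions.
    image_logits = [list(col) for col in zip(*text_logits)]
    return (cross_entropy_diag(text_logits) + cross_entropy_diag(image_logits)) / 2.0
```

With matching text/image pairs the diagonal dominates and the loss approaches zero; if the pairs are swapped, the loss grows with the logit scale.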

11 May, 2021 8 commits

Adds Flax BERT finetuning example on GLUE (#11564) · 4ce6bcc3

Marc van Zee authored May 11, 2021

* Adds Flax BERT finetuning example

* fix traced jax tensor type

* Use Optax losses and learning schedulers

* Add 1GPU training results

* merge into master & make style

* fix input

* del file

* Fix bug in loss and add torch runs

* finish bert flax fine-tune

* Update examples/flax/text-classification/README.md

* Update examples/flax/text-classification/run_flax_glue.py

* add requirements

* finalize

* finalize
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>

4ce6bcc3
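The Flax GLUE example's move to "Optax losses and learning schedulers" can be illustrated with a schedule of linear warmup followed by linear decay to zero. Treat this as an assumption about the schedule's shape, not a copy of `run_flax_glue.py`. With Optax such a schedule can be composed from `optax.linear_schedule` and `optax.join_schedules`. The sketch below is a dependency-free equivalent, and `num_train_steps` and `linear_warmup_decay` are hypothetical helper names.

```python
def num_train_steps(train_ds_size: int, train_batch_size: int, num_train_epochs: int) -> int:
    # Whole batches per epoch times the number of epochs.
    return (train_ds_size // train_batch_size) * num_train_epochs


def linear_warmup_decay(step: int, learning_rate: float, num_warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: 0 -> learning_rate during warmup, then linearly back to 0."""
    if total_steps <= num_warmup_steps:
        raise ValueError("total_steps must be larger than num_warmup_steps")
    if step < num_warmup_steps:
        # Warmup phase: ramp up linearly from 0.
        return learning_rate * step / num_warmup_steps
    decay_steps = total_steps - num_warmup_steps
    # Clamp so steps past the end of training stay at 0 instead of going negative.
    progress = min(step - num_warmup_steps, decay_steps) / decay_steps
    return learning_rate * (1.0 - progress)
```

For example, with 10 warmup steps out of 110 total steps, the rate peaks at step 10 and is back to half its peak at step 60.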

Test checkpointing (#11682) · f13f1f8f
Sylvain Gugger authored May 11, 2021
```
* Add test and see where CI is unhappy

* Load with strict=False
```
f13f1f8f
Fix TF Roberta for mixed precision training (#11675) · d9b28627
Julien Plu authored May 11, 2021

d9b28627

Auto modelcard (#11599) · a135f595

Sylvain Gugger authored May 11, 2021

* Autogenerate model cards from the Trainer

* ModelCard deprecated

* Fix test

* Style

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Quality

* With all metadata

* Metadata

* Post-merge conflict mess

* Data args and all examples

* Default license and languages when possible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

a135f595

Grammar and style edits for the frontpage README (#11679) · b3429ab6

Matt authored May 11, 2021

* Grammar and style edits for the frontpage README

* Going all-in on em-dashes because you only live once

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

b3429ab6

Fix docstring of description about input_ids (#11672) · 901153c6
nxznm authored May 11, 2021

901153c6
Add --text_column to run_summarization_no_trainer (#11673) · 64232bc0
Jonathan Chang authored May 11, 2021

64232bc0
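A `--text_column` flag like the one added to `run_summarization_no_trainer` is usually a plain `argparse` option that defaults to `None`. The sketch below shows one way to wire it up. The fallback to the first dataset column when the flag is omitted is an assumption, not necessarily the script's exact behaviour, and `resolve_text_column` is a hypothetical helper.

```python
import argparse
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Summarization without Trainer (sketch)")
    parser.add_argument(
        "--text_column",
        type=str,
        default=None,
        help="Name of the dataset column holding the full texts to summarize.",
    )
    return parser.parse_args(argv)


def resolve_text_column(text_column: Optional[str], column_names: List[str]) -> str:
    # Assumed fallback: use the first column when the flag is not given.
    if text_column is None:
        return column_names[0]
    # Fail early with a readable message instead of a KeyError during preprocessing.
    if text_column not in column_names:
        raise ValueError(f"--text_column value '{text_column}' must be one of: {', '.join(column_names)}")
    return text_column
```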
Add MacOS TF version (#11674) · 024cd19b
Julien Plu authored May 11, 2021
```
Co-authored-by: Julien Plu <jplu@argos.local>
```
024cd19b

10 May, 2021 10 commits

Fixes NoneType exception when topk is larger than one coupled with a small... · 9120ae7d

Pavel Soriano authored May 10, 2021

Fixes NoneType exception when topk is larger than one coupled with a small context in the Question-Answering pipeline (#11628)

* added fix to decode function. added test to qa pipeline tests

* completed topk docstring

* fixed formatting with black

* applied style_doc to fix line length

9120ae7d
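The failure mode in the commit title, asking the question-answering pipeline for more answers (`topk`) than a short context can supply, can be illustrated with a small span-selection sketch. This is not the pipeline's `decode` code. It assumes a span's score is the product of its start and end probabilities, and it only considers spans with `start <= end < start + max_answer_len`. The point is that the number of returned spans is capped by the number of candidates instead of assuming `topk` of them exist.

```python
from typing import List, Tuple

Span = Tuple[int, int, float]


def top_k_spans(start_probs: List[float], end_probs: List[float], topk: int, max_answer_len: int) -> List[Span]:
    """Return up to `topk` (start, end, score) spans, best first."""
    candidates: List[Span] = []
    for start, p_start in enumerate(start_probs):
        # Only spans that end at or after `start` and are at most max_answer_len tokens long.
        for end in range(start, min(start + max_answer_len, len(end_probs))):
            candidates.append((start, end, p_start * end_probs[end]))
    candidates.sort(key=lambda span: span[2], reverse=True)
    # A small context can produce fewer candidates than `topk`: return what exists.
    return candidates[: min(topk, len(candidates))]
```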

push (#11667) · dcb0e614
Patrick von Platen authored May 10, 2021

dcb0e614
Save scaler state dict when checkpointing (#11663) · 05a93067
Sylvain Gugger authored May 10, 2021

05a93067
Fix suggested by @bhadreshpsavani (#11660) · ef8d32c5
Matt authored May 10, 2021

ef8d32c5
Update community.md (#11654) · 575c9791
Vasudev Gupta authored May 10, 2021

575c9791

Big Bird Fast Tokenizer implementation (#11075) · f7f87295

Tanmay Laud authored May 10, 2021

* Added Big Bird Fast Tokenizer initial file

* style fixes

* flake fixes

* Added big bird fast tokenizer to init files

* Added big bird fast to Auto tokenization

* fix styles

* minor quality fixes

* Added initial test code

* Fix SpmConverter when precompiled_charsmap doesn't exist

* fixed post processor

* minor style fix

* minor fix input names

* Actually fix identity normalization

* style

* Added token type ids to fast tokenizer

* style

* flake fix

* fix copies
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>

f7f87295

updated user permissions based on umask (#11119) · 80da304a

Bhavitvya Malik authored May 10, 2021

* updated user permissions based on umask

* updated user permissions based on umask

* changes as per suggestions

* minor changes

80da304a
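"Updated user permissions based on umask" refers to giving downloaded files the mode a normally created file would get, that is `0o666 & ~umask`, instead of a fixed mode. The sketch below is a minimal, POSIX-only illustration and may differ from the code in #11119. `os.umask` can only be read by setting it, so the helper sets a temporary value and restores the old one immediately. This is not thread-safe.

```python
import os


def current_umask() -> int:
    # os.umask sets a new mask and returns the previous one, so set-and-restore to read it.
    mask = os.umask(0o022)
    os.umask(mask)
    return mask


def chmod_respecting_umask(path: str) -> int:
    # Give the file the permissions a freshly created file would have: rw for all, minus the umask.
    mode = 0o666 & ~current_umask()
    os.chmod(path, mode)
    return mode
```

With a umask of `0o022` this yields `0o644`. With `0o027` it yields `0o640`.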

Update requirements.txt (#11634) · 1a0b4178
Quentin Lhoest authored May 10, 2021

1a0b4178
Update code example (#11631) · f785c516
NielsRogge authored May 10, 2021
```
* Update code example

* Code review
```
f785c516
[Examples] Fix invalid links after reorg (#11650) · 7e406f4a
Tommy Chiang authored May 10, 2021

7e406f4a

09 May, 2021 1 commit

[Examples] Check key exists in datasets first (#11503) · f2ffcaf4
Tommy Chiang authored May 10, 2021

f2ffcaf4

07 May, 2021 7 commits

[examples] fix sys.path in conftest.py (#11636) · ba0d50f2

Stas Bekman authored May 07, 2021

* restore conftest.py

* fix conftest and make copies

* remove unneeded parts

* remove unwanted files

ba0d50f2

[self-push CI] sync with self-scheduled (#11637) · cd9b8d7e
Stas Bekman authored May 07, 2021
```
forgot to add the missing `libaio-dev` to this workflow
```
cd9b8d7e
Reduce to 1 worker and set timeout for GPU TF tests (#11633) · da37eb8e
Lysandre Debut authored May 07, 2021

da37eb8e

Add the ImageClassificationPipeline (#11598) · 39084ca6

Lysandre Debut authored May 07, 2021

* Add the ImageClassificationPipeline

* Code review
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Have `load_image` at the module level
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

39084ca6

make fix copy (#11627) · e7bff0aa
Patrick von Platen authored May 07, 2021

e7bff0aa

Add BigBirdPegasus (#10991) · dc3f6758

Vasudev Gupta authored May 07, 2021

* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yayy

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

dc3f6758
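The "attn switching" and "auto-padding" items refer to constraints of BigBird-style block-sparse attention: the sequence length must be a multiple of `block_size`, and for short inputs block-sparse attention is replaced by full attention. This sketch assumes BigBirdPegasus follows the same rules as BigBird. Under that assumption, the fallback applies when the length is at most `(5 + 2 * num_random_blocks) * block_size` tokens, and padding adds `(block_size - seq_len % block_size) % block_size` tokens. The function names are hypothetical.

```python
def choose_attention_type(seq_len: int, block_size: int, num_random_blocks: int,
                          requested: str = "block_sparse") -> str:
    # A query block already attends to roughly this many tokens (global, sliding and random
    # blocks), so for shorter inputs block-sparse attention gains nothing over full attention.
    max_tokens_to_attend = (5 + 2 * num_random_blocks) * block_size
    if requested == "block_sparse" and seq_len <= max_tokens_to_attend:
        return "original_full"
    return requested


def pad_len_to_block_size(seq_len: int, block_size: int) -> int:
    # Pad tokens needed to reach the next multiple of block_size (0 if already aligned).
    return (block_size - seq_len % block_size) % block_size
```

For example, with `block_size=64` and `num_random_blocks=3` the threshold is 704 tokens, and a 1000-token input needs 24 pad tokens.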

Fix comment in run_clm_no_trainer.py (#11624) · 6f40e317
Jonathan Chang authored May 07, 2021

6f40e317

06 May, 2021 4 commits

Fix RNG saves in distributed mode. (#11620) · 33fd83bc
Sylvain Gugger authored May 06, 2021
```
* Fix RNG saves in distributed mode.

* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
```
33fd83bc
[cuda ext tests] fixing tests (#11619) · 619200cc
Stas Bekman authored May 06, 2021
```
* fixing tests

* cleanup
```
619200cc
fix tests (#11615) · 44c5621d
Patrick von Platen authored May 06, 2021

44c5621d
Re-styling in seq2seq attention (#11613) · 7eee950a
Sylvain Gugger authored May 06, 2021

7eee950a
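The idea behind "Fix RNG saves in distributed mode" (#11620) is that every process in a distributed run has its own random state. Each process should therefore save and restore its own state instead of all ranks sharing one file. The sketch below uses Python's `random` module and `pickle` in place of the PyTorch/NumPy states the Trainer handles. The per-rank file names and the helper functions are illustrative assumptions.

```python
import os
import pickle
import random
from typing import Any, Dict


def rng_state_path(output_dir: str, process_index: int, world_size: int) -> str:
    # Single-process runs keep one file; distributed runs get one file per process.
    if world_size <= 1:
        return os.path.join(output_dir, "rng_state.pkl")
    return os.path.join(output_dir, f"rng_state_{process_index}.pkl")


def save_rng_state(output_dir: str, process_index: int, world_size: int) -> str:
    path = rng_state_path(output_dir, process_index, world_size)
    state: Dict[str, Any] = {"python": random.getstate()}
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_rng_state(output_dir: str, process_index: int, world_size: int) -> None:
    # Each process restores only the state it saved itself.
    path = rng_state_path(output_dir, process_index, world_size)
    with open(path, "rb") as f:
        state = pickle.load(f)
    random.setstate(state["python"])
```

After resuming from a checkpoint, each process draws the same random numbers it would have drawn had training not been interrupted.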