Commits · 711d901c49bbc896f508920b70bfd8a83f11e5da · chenpangpang / transformers

13 Jul, 2021 7 commits

Fix minor docstring typos. (#12682) · 711d901c
qqaatw authored Jul 14, 2021

711d901c

Add option to load a pretrained model with mismatched shapes (#12664) · 90178b0c

Sylvain Gugger authored Jul 13, 2021

* Add option to load a pretrained model with mismatched shapes

* Fail at loading when mismatched shapes in Flax

* Fix tests

* Update src/transformers/modeling_flax_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

90178b0c
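The loading option described in this commit can be sketched as follows. This is a minimal, framework-free illustration that assumes checkpoint and model parameters can be matched by name and compared by shape. The helper `filter_mismatched` and the parameter names are hypothetical and are not the library's code. In released versions of transformers, the PyTorch `from_pretrained` method exposes this behavior through an `ignore_mismatched_sizes` argument.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def filter_mismatched(
    model_shapes: Dict[str, Shape],
    checkpoint_shapes: Dict[str, Shape],
    ignore_mismatched: bool,
) -> Tuple[List[str], List[Tuple[str, Shape, Shape]]]:
    """Hypothetical helper: decide which checkpoint entries can be loaded.

    Shapes stand in for real tensors. Returns (names to load, mismatched entries),
    where each mismatched entry is (name, checkpoint shape, model shape).
    """
    to_load: List[str] = []
    mismatched: List[Tuple[str, Shape, Shape]] = []
    for name, ckpt_shape in checkpoint_shapes.items():
        if name not in model_shapes:
            continue  # unexpected key: simply ignored in this sketch
        if model_shapes[name] != ckpt_shape:
            mismatched.append((name, ckpt_shape, model_shapes[name]))
        else:
            to_load.append(name)
    if mismatched and not ignore_mismatched:
        # Without the opt-in flag, this sketch fails loudly instead of silently skipping.
        raise ValueError(f"Mismatched shapes: {mismatched}")
    return to_load, mismatched


# A 2-label checkpoint loaded into a 10-label model (illustrative shapes only).
model_shapes = {"encoder.weight": (768, 768), "classifier.weight": (10, 768), "classifier.bias": (10,)}
checkpoint_shapes = {"encoder.weight": (768, 768), "classifier.weight": (2, 768), "classifier.bias": (2,)}

to_load, skipped = filter_mismatched(model_shapes, checkpoint_shapes, ignore_mismatched=True)
print(to_load)                            # ['encoder.weight']
print([name for name, _, _ in skipped])   # ['classifier.weight', 'classifier.bias']
```

The skipped entries (here, the classification head) would then keep their freshly initialized values, which is the usual reason for wanting this option when fine-tuning with a different number of labels.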

[Blenderbot] Fix docs (#12227) · 7f6d3750
Patrick von Platen authored Jul 13, 2021
```
* fix_torch_device_generate_test

* remove @

* fix docs
```
7f6d3750

Wrong model is used in example, should be character instead of subword model (#12676) · 9519f0cd

Jeroen Steggink authored Jul 13, 2021

* Wrong model is used, should be character instead of subword

In the original Google repo for CANINE there was a mix-up in the model names in the README.md, which was fixed 2 weeks ago. Since this Transformers model was added before that fix, it probably led to the wrong model being used in this example.

s = subword, c = character

* canine.rst style fix

* Update docs/source/model_doc/canine.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Styling canine.rst

* Added links to model cards.

* Fixed links to model cards.
Co-authored-by: Jeroen Steggink <978411+jsteggink@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

9519f0cd

Add ByT5 option to example run_t5_mlm_flax.py (#12634) · 5803a2a7

Nick Doiron authored Jul 13, 2021

* Allow ByT5 type in Flax T5 script

* use T5TokenizerFast

* change up tokenizer config

* model_args

* reorder imports

* Update run_t5_mlm_flax.py

5803a2a7

**encode_plus() shouldn't run for W2V2CTC (#12655) · 9da1acae
Lysandre Debut authored Jul 13, 2021
```
* **encode_plus() shouldn't run for W2V2CTC

* Typo
```
9da1acae

Patch BigBird tokenization test (#12653) · a6938c47
Lysandre Debut authored Jul 13, 2021

a6938c47

12 Jul, 2021 19 commits
- Update timeline for Flax event evaluation · c523b241
  Omar Sanseviero authored Jul 12, 2021
  
  c523b241
- Fix typo in README_zh-hans.md (#12663) · dc06e435
  Kevin Canwen Xu authored Jul 13, 2021
  
  dc06e435
- Translate README.md to Simplified Chinese (#12596) · 9d771c54
  Kevin Canwen Xu authored Jul 13, 2021
```
* README Translation for Chinese (Simplified)

* update link

* h3->h4

* html refactor

* update model list

* fix

* Add a translation guide

* format

* update

* typo

* Refine wording
```
  9d771c54
- fix typo in modeling_t5.py docstring (#12640) · 21a81c1e
  Philip May authored Jul 12, 2021
  
  21a81c1e
- fixed docs (#12646) · b90d4993
  Ahmed Khaled authored Jul 12, 2021
  
  b90d4993
- remove documentation (#12657) · da0e9ee6
  Philipp Schmid authored Jul 12, 2021
  
  da0e9ee6
- Fix transfo xl integration test (#12652) · b189226e
  Lysandre Debut authored Jul 12, 2021
```
* Cleanup test

* Skip TF TransfoXL test
```
  b189226e
- Pipeline should be agnostic (#12656) · fd41e2da
  Lysandre Debut authored Jul 12, 2021
  
  fd41e2da
- Pickle auto models (#12654) · 9b3aab2c
  Sylvain Gugger authored Jul 12, 2021
```
* PoC, it pickles!

* Remove old method.

* Apply to every auto object
```
  9b3aab2c
- TF summarization example (#12617) · 379f6494
  Matt authored Jul 12, 2021
```
* Adding a TF summarization example

* Style pass

* Style fixes

* Updates for review comments

* Adding README

* Style pass

* Remove unused import
```
  379f6494
- Fix typo · 0f43e742
  Sylvain Gugger authored Jul 12, 2021
  
  0f43e742
- Fix syntax in conda file · 9adff7a0
  Sylvain Gugger authored Jul 12, 2021
  
  9adff7a0
- Minimum requirement for pyyaml · ad420542
  Sylvain Gugger authored Jul 12, 2021
  
  ad420542
- The extended trainer tests should require torch (#12650) · fb5665b5
  Lysandre Debut authored Jul 12, 2021
  
  fb5665b5
- Skip TestMarian_MT_EN (#12649) · 0af8579b
  Lysandre Debut authored Jul 12, 2021
```
* Skip TestMarian_MT_EN

* Skip EN_ZH and EN_ROMANCE

* Skip EN_ROMANCE pipeline
```
  0af8579b
- Add tokenizer_file parameter to PreTrainedTokenizerFast docstring (#12624) · a882b9fa
  Lewis Bails authored Jul 12, 2021
```
Co-authored-by: Lewis Bails <Lewis.Bails@infomedia.dk>
```
  a882b9fa
- fix type check (#12638) · f8f9a679
  Suraj Patil authored Jul 12, 2021
  
  f8f9a679
- Point to the right file for hybrid CLIP (#12599) · 2dd9440d
  Eduardo Gonzalez Ponferrada authored Jul 11, 2021
  
  2dd9440d
- added test file (#12630) · de23ecea
  Bhadresh Savani authored Jul 12, 2021
  
  de23ecea
10 Jul, 2021 3 commits

fix anchor (#12620) · 9ee66ada
Stas Bekman authored Jul 09, 2021

9ee66ada

[doc] DP/PP/TP/etc parallelism (#12524) · 0dcc3c86

Stas Bekman authored Jul 09, 2021

* wip

* complete the doc

* missing img

* improve

* correction

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

0dcc3c86

[debugging utils] minor doc improvements (#12525) · 4cdbf63c
Stas Bekman authored Jul 09, 2021

4cdbf63c

09 Jul, 2021 11 commits

Add TFHubertModel (#12206) · fb65f65e

Will Rice authored Jul 09, 2021

* TFHubert

* Update with TFWav2Vec Bug Fixes

* Add OOV Error

* Feedback changes

* Fix kwargs call

fb65f65e

[Flax] Fix marian docs 2 (#12615) · 934222e3
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* up
```
934222e3

[Flax Marian] Add marian flax example (#12614) · 165606e5
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* finish better examples for marian flax
```
165606e5

[Flax] Fix mt5 auto (#12612) · 51eb6d34
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* fix mt5 auto
```
51eb6d34

Pass `model_kwargs` when loading a model in `pipeline()` (#12449) · e7f33e8c

Alex Hedges authored Jul 09, 2021

* Pass model_kwargs when loading a model in pipeline

* Add test for model_kwargs parameter of pipeline()

* Rewrite test to not download model

* Fix failing style checks

e7f33e8c

Fix arg count for partial functions (#12609) · 18ca59e1
Sylvain Gugger authored Jul 09, 2021

18ca59e1

Simplify unk token (#12582) · 0cc2dc24

Sylvain Gugger authored Jul 09, 2021

* Base test

* More test

* Fix mistake

* Add a docstring change

* Add doc ignore

* Simplify logic for unk token in Unigram tokenizers

* Remove changes from other branch

0cc2dc24

[Flax] Fix cur step flax examples (#12608) · deecdd49
Patrick von Platen authored Jul 09, 2021
```
* fix_torch_device_generate_test

* remove @

* fix save problem
```
deecdd49

[Flax] Add flax marian (#12595) · 65e27215

Patrick von Platen authored Jul 09, 2021

* fix_torch_device_generate_test

* remove @

* add marian

* finish make style

* add model

* add docs

* add test

* add integration tests

* up

* solve bug

* correct tests

* correct some tests

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct adapt marian

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

65e27215

This will reduce "Already borrowed error": (#12550) · cc12e1db

Nicolas Patry authored Jul 09, 2021

* This will reduce "Already borrowed error":

Original issue https://github.com/huggingface/tokenizers/issues/537

The original issue is caused by transformers calling mutable functions
on the Rust tokenizers many times.
Rust needs to guarantee that only one agent holds a mutable reference
to a piece of memory at a given time (for reasons that don't need
explaining here). Usually, the Rust compiler can guarantee that this
property holds at compile time.

Unfortunately, Python cannot provide that guarantee, so PyO3, the
bridge between Rust and Python used by `tokenizers`, replaces the
compile-time guarantee with a dynamic (runtime) one: if multiple agents
try to hold mutable borrows at the same time, the runtime raises an
"Already borrowed" error.

The fix proposed here in transformers is simply to reduce the number
of calls that actually need a mutable borrow. Fewer mutable borrows
means a lower risk of running into the "Already borrowed" error.
The caveat is that we now add a call to read the current configuration
of `_tokenizer`, so in the worst case we make 2 calls instead of 1, and
in the best case we make 1 read plus a Python dict comparison (which
should be negligible).

* Adding a test.

* trivial error :(.

* Update tests/test_tokenization_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Adding reference to original issues in the tests.

* Update the tests with fast tokenizer.
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

cc12e1db
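The borrow-reduction idea described in this commit message ("read the current configuration first, mutate only if it differs") can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeBackendTokenizer` is a hypothetical stand-in that merely counts mutating calls, it is not the `tokenizers` API, and `ensure_truncation` is a hypothetical helper rather than the code merged in transformers.

```python
from typing import Optional


class FakeBackendTokenizer:
    """Hypothetical stand-in for a Rust-backed tokenizer (not the real `tokenizers` API).

    It records how many mutating calls it receives, the quantity the fix tries to minimize.
    """

    def __init__(self) -> None:
        self._truncation: Optional[dict] = None
        self.mutable_calls = 0

    @property
    def truncation(self) -> Optional[dict]:
        # Read-only access: in PyO3 terms this corresponds to a shared (immutable) borrow.
        return None if self._truncation is None else dict(self._truncation)

    def enable_truncation(self, max_length: int, stride: int = 0) -> None:
        # Mutating access: in PyO3 terms this corresponds to a mutable borrow.
        self.mutable_calls += 1
        self._truncation = {"max_length": max_length, "stride": stride}


def ensure_truncation(tok: FakeBackendTokenizer, max_length: int, stride: int = 0) -> None:
    """Take the mutable path only when the configuration actually changes."""
    target = {"max_length": max_length, "stride": stride}
    if tok.truncation != target:  # one read plus a cheap Python dict comparison
        tok.enable_truncation(**target)


tok = FakeBackendTokenizer()
for _ in range(100):  # e.g. encoding 100 batches with identical settings
    ensure_truncation(tok, max_length=128)
print(tok.mutable_calls)  # 1
ensure_truncation(tok, max_length=64)  # settings change, so one more mutation
print(tok.mutable_calls)  # 2
```

This matches the trade-off stated in the commit message: every call pays for one read, but repeated calls with unchanged settings no longer take a mutable borrow at all.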

Add Flax sprint project evaluation section (#12592) · 8fe836af
Omar Sanseviero authored Jul 09, 2021

8fe836af