Commits · a54961c5f70ff01ca3d62a56ece083096b7c1a7d · chenpangpang / transformers

10 Jan, 2022 4 commits

Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x (#15019) · a54961c5

cody-moveworks authored Jan 10, 2022

* Make OpenAIGPTTokenizer work with SpaCy 3.x

SpaCy 3.x introduced an API change to creating the tokenizer that
breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in
SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating
the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the
new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and
SpaCy 3.x versions.

* Add is_spacy_available and is_ftfy_available methods to file utils

* Add spacy and ftfy unittest decorator to testing utils

* Add tests for OpenAIGPTTokenizer that require spacy and ftfy

* Modify CircleCI config to run tests that require spacy and ftfy

* Remove unneeded unittest decorators and reuse test code

* Run make fixup

a54961c5
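
The availability helpers and test decorators listed in this commit could look roughly like the sketch below. It is not the code from the commit: `is_package_available`, `require_package`, and `build_english_tokenizer` are hypothetical names, and reading the tokenizer from the `tokenizer` attribute of the language object is an assumption about what the message calls "the new API".

```python
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_package(name: str):
    """Hypothetical decorator factory: skip the decorated test when `name` is missing."""
    return unittest.skipUnless(is_package_available(name), f"test requires {name}")


def build_english_tokenizer():
    """Assumed version-agnostic way to obtain spaCy's English tokenizer."""
    # Imported lazily because spaCy is an optional dependency.
    from spacy.lang.en import English

    return English().tokenizer


class TokenizerTest(unittest.TestCase):
    @require_package("spacy")
    def test_tokenizes_words(self):
        # Runs only when spaCy is installed; otherwise it is reported as skipped.
        tokens = [token.text for token in build_english_tokenizer()("Hello world")]
        self.assertEqual(tokens, ["Hello", "world"])
```

Checking with `find_spec` avoids importing a heavy optional dependency just to learn whether it is installed, so the test module still loads in environments without spaCy.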

Update check_repo.py (#15014) · 9fbf7c87
Kamal Raj authored Jan 10, 2022
```
added new line
```
9fbf7c87
fix model table cell text alignment (#14999) · 0a03a868
Yih-Dar authored Jan 10, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
0a03a868

[Wav2Vec2 Speech Event] Add speech event v2 (#15083) · d72343d2

Patrick von Platen authored Jan 10, 2022

* up

* up

* up

* up

* up

* up

* improve

* up

* up

* Update src/transformers/trainer.py

* up

* up

* up

d72343d2

08 Jan, 2022 1 commit

Fix convert for newer megatron-lm bert model (#14082) · 768e6c14

yoquankara authored Jan 09, 2022

* Fix convert for newer megatron-lm models

* Save megatron-bert config in a proper way

* Fix code style

768e6c14

07 Jan, 2022 3 commits

[VisionTextDualEncoder] Add token_type_ids param (#15073) · 623b4f7c

Yih-Dar authored Jan 07, 2022



* fix doc example - TypeError: get_text_features() got an unexpected keyword argument 'token_type_ids'

* add token_type_ids param
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

623b4f7c

[Fix doc examples] Add missing from_pretrained (#15044) · ac224bb0

Yih-Dar authored Jan 07, 2022



* fix doc example - ValueError: Parameter config should be an instance of class `PretrainedConfig`

* Update src/transformers/models/segformer/modeling_segformer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

ac224bb0

Resubmit changes after rebase to master (#14982) · f18c6fa9
K.C. Tung authored Jan 07, 2022

f18c6fa9

06 Jan, 2022 8 commits

[VisionTextDualEncoder] Fix doc example · cc406da4
Yih-Dar authored Jan 06, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
cc406da4
Update run_speech_recognition_seq2seq.py (#14967) · b67f345d
flozi00 authored Jan 06, 2022

b67f345d
Add 'with torch.no_grad()' to BertGeneration integration test forward passes (#14963) · f71fb5c3
Tavin Turner authored Jan 06, 2022

f71fb5c3
Remove old asserts. (#15012) · d2183a46
Nicolas Patry authored Jan 06, 2022

d2183a46
Add detectron2 to Github actions (#15053) · 83c552d3
NielsRogge authored Jan 06, 2022

83c552d3
wrapped forward passes in torch.no_grad() (#15037) · 5ab87cd4
Matt Churgin authored Jan 06, 2022

5ab87cd4
Enabling `TF` on `image-classification` pipeline. (#15030) · 5a06118b
Nicolas Patry authored Jan 06, 2022

5a06118b

Add Flax image captioning example (#14864) · 9f89fa02

Yih-Dar authored Jan 06, 2022



* add image captioning example

* update README

* fix style & quality

* simplify

* apply review suggestions

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply review suggestions

* add comments about using np instead of jax array

* remove unused lines

* add model creation script

* only support from_pretrained

* fix style

* fix

* not use cache_dir when creating model

* fix tokenizer creation

* update README

* fix quality

* apply suggestion

* simplify some blocks

* Update examples/flax/image-captioning/README.md


* Update examples/flax/image-captioning/run_image_captioning_flax.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestion
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

9f89fa02

05 Jan, 2022 6 commits

[CLIP] Fix TF test (#15042) · 2e9af294
Suraj Patil authored Jan 05, 2022

2e9af294
[SpeechEncoderDecoder] Fix from pretrained (#15043) · 443fdaf2
Patrick von Platen authored Jan 05, 2022

443fdaf2
[CLIP] Fix PT test (#15041) · ae929dcb
Patrick von Platen authored Jan 05, 2022

ae929dcb
Adding QoL for `batch_size` arg (like others enabled everywhere). (#15027) · 65cb94ff
Nicolas Patry authored Jan 05, 2022
```
* Adding QoL for `batch_size` arg (like others enabled everywhere).

* Typo.
```
65cb94ff
Fix doc example: mask_time_indices (numpy) has no attribute 'to' (#15033) · e34dd055
Yih-Dar authored Jan 05, 2022
```
* fix doc example - AttributeError: 'numpy.ndarray' object has no attribute 'to'

* fix more

* Apply suggestions from code review

* Update src/transformers/models/unispeech/modeling_unispeech.py
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
```
e34dd055
[megatron convert] PYTHONPATH requirements (#14956) · 927f6544
Stas Bekman authored Jan 05, 2022
```
* [megatron convert] PYTHONPATH requirements

* more info
```
927f6544

04 Jan, 2022 5 commits

[doc] Update parallelism.mdx (#15018) · 857ab55c
Kevin Ko authored Jan 05, 2022
```
* Update parallelism.mdx

* Update parallelism.mdx
```
857ab55c

Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d

Nicolas Patry authored Jan 04, 2022

* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token` which should be last/previous token for CTC proper decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.

19d37c2d

Add Flax RoFormer (#15005) · 21aecc09

Daniel Stancl authored Jan 04, 2022



* Add FlaxRoFormer

* Clean code + make quality

* Fix output pooling for FlaxRoFormerForMultipleChoiceModule

* Apply suggestions from code review

* add flax model to repos
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

21aecc09

Fix a little typo (#15002) · 9e1775dd
milyiyo authored Jan 04, 2022

9e1775dd
Fix Code block (#14983) · 774ed4a0
flozi00 authored Jan 04, 2022

774ed4a0

03 Jan, 2022 7 commits

Update parallelism.mdx (#15013) · f2ab2183

Kevin Ko authored Jan 04, 2022

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

f2ab2183

[Tests] Correct Wav2Vec2 & WavLM tests (#15015) · dbac8899
Patrick von Platen authored Jan 03, 2022
```
* up

* up

* up
```
dbac8899
fix missing import (#15016) · 0b4c3a1a
Yih-Dar authored Jan 03, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
0b4c3a1a

Large audio chunking for the existing ASR pipeline (#14896) · 38f95d18

Anton Lozhkov authored Jan 03, 2022



* Naive ASR chunking

* Fixing batching for ASR.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

38f95d18
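
The commit message gives no detail on how the chunking works, so the following is only an illustrative sketch of one naive approach: cut a long sequence of samples into fixed-length windows that overlap by a few samples. The function name `chunk_with_overlap` and its parameters are hypothetical and are not taken from the pipeline.

```python
from typing import Iterator, List, Sequence


def chunk_with_overlap(
    samples: Sequence[float], chunk_len: int, overlap: int
) -> Iterator[List[float]]:
    """Yield windows of at most `chunk_len` samples; consecutive windows share `overlap` samples."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")

    step = chunk_len - overlap
    for start in range(0, len(samples), step):
        yield list(samples[start:start + chunk_len])
        # Stop once a window has reached the end of the input,
        # so no trailing window made only of overlap is emitted.
        if start + chunk_len >= len(samples):
            break
```

With ten samples, `chunk_len=4`, and `overlap=1`, this yields three windows starting at offsets 0, 3, and 6. The last window may be shorter than `chunk_len` when the input length does not line up with the step.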

Improve truncation_side (#14947) · d33dc796

Nicolas Patry authored Jan 03, 2022



* Enabling `truncation_side` for Slow and Fast tokenizer.
Co-Authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>

* Disable failing tests.

* Layout xlm.

* assert -> assertEqual.
Co-authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>

d33dc796
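
As a rough illustration of what a truncation side means, the sketch below trims a list of token ids from either end. It is a simplified stand-in, not the tokenizer code touched by this commit: the helper `truncate_ids` is hypothetical and ignores special tokens, sequence pairs, and overflow handling.

```python
from typing import List, Sequence


def truncate_ids(
    ids: Sequence[int], max_length: int, truncation_side: str = "right"
) -> List[int]:
    """Keep at most `max_length` ids, dropping extras from the chosen side."""
    if truncation_side not in ("left", "right"):
        raise ValueError("truncation_side must be 'left' or 'right'")
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(ids) <= max_length:
        return list(ids)
    if truncation_side == "right":
        # Drop tokens from the end and keep the beginning.
        return list(ids[:max_length])
    # Drop tokens from the start and keep the end.
    return list(ids[len(ids) - max_length:])
```

Left-side truncation keeps the most recent tokens, which can matter when the end of the input carries the relevant context.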

Fixing t2t pipelines lists outputs. (#15008) · 8c2618e6
Nicolas Patry authored Jan 03, 2022
```
Backward compatibility broken in
https://github.com/huggingface/transformers/pull/14988
```
8c2618e6

Map model_type and doc pages names (#14944) · 8f6373c6

Sylvain Gugger authored Jan 03, 2022



* Map model_type and doc pages names

* Add script

* Fix typo

* Quality

* Manual check for Auto
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

8f6373c6

30 Dec, 2021 6 commits

Allow training to resume even if RNG states are not properly loaded (#14994) · e68c3756
Sylvain Gugger authored Dec 30, 2021
```
* Allow training to resume even if RNG states are not properly loaded

* Proper f-string
```
e68c3756
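
The idea in the commit title, resuming even when saved RNG state cannot be restored, can be sketched with the standard library alone. This is an assumption-laden illustration, not the Trainer code: `save_rng_state` and `try_restore_rng_state` are hypothetical helpers, and only Python's `random` module is handled here.

```python
import logging
import pickle
import random

logger = logging.getLogger(__name__)


def save_rng_state(path: str) -> None:
    """Persist the state of Python's `random` module to `path`."""
    with open(path, "wb") as f:
        pickle.dump(random.getstate(), f)


def try_restore_rng_state(path: str) -> bool:
    """Restore the RNG state if possible; log a warning and return False otherwise."""
    try:
        with open(path, "rb") as f:
            random.setstate(pickle.load(f))
    except (OSError, EOFError, pickle.UnpicklingError, TypeError, ValueError) as exc:
        # A missing or unreadable state file should not block resuming.
        logger.warning(
            f"Could not restore RNG state from {path}: {exc}. "
            "Resuming anyway; results may not be exactly reproducible."
        )
        return False
    return True
```

Returning a boolean instead of raising lets the caller continue training while still recording that reproducibility was lost.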

Enabling `tokenizers` upgrade. (#14941) · 08cb5718

Nicolas Patry authored Dec 30, 2021

* Enabling `tokenizers` upgrade.

* Moved ugly comment.

* Tokenizers==0.11.1 needs an update to keep borrow checker happy in highly contiguous calls.

* Support both 0.11.1 and 0.11.0

08cb5718

Adding `num_return_sequences` support for text2text generation. (#14988) · f8a989cf

Nicolas Patry authored Dec 30, 2021



* Adding `num_return_sequences` support for text2text generation.
Co-Authored-By: Enze <pu.miao@foxmail.com>

* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Enze <pu.miao@foxmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

f8a989cf

[Generate] correct encoder_outputs are passed without attention_mask (#14980) · c043ce6c
Patrick von Platen authored Dec 30, 2021
```
* [Generate] correct encoder_outputs are passed without attention_mask

* Apply suggestions from code review

* up
```
c043ce6c

[AutoProcessor] Correct AutoProcessor and automatically add processor… (#14881) · a1392883

Patrick von Platen authored Dec 30, 2021

* [AutoProcessor] Correct AutoProcessor and automatically add processor class

* up

* up

* up

* up

* up

* up

* up

* up

* continue tomorrow

* up

* up

* up

* make processor class private

* fix loop

a1392883

Fixing a pathological case for slow tokenizers (#14981) · d7d60df0
Nicolas Patry authored Dec 30, 2021
```
* Fixing a pathological case for slow tokenizers

* Update src/transformers/tokenization_utils.py
```
d7d60df0