Commits · 9f9ddcc2def6671802b84668e3f101b5a7b8b402 · chenpangpang / transformers

02 Nov, 2022 6 commits

Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in... · 9f9ddcc2

Ben Eyal authored Nov 02, 2022

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)

* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped

9f9ddcc2
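
The core of this fix is to stop passing special tokens through the SentencePiece decoder, which can silently drop them. Below is a minimal pure-Python sketch of that pattern under stated assumptions: `toy_sp_decode` is a hypothetical stand-in for `sp_model.decode` (it only joins pieces and turns the `▁` word-boundary marker into spaces), and the free-function signature is illustrative rather than the exact method added to each tokenizer.

```python
SPIECE_UNDERLINE = "\u2581"  # "▁", SentencePiece's word-boundary marker


def toy_sp_decode(pieces):
    """Hypothetical stand-in for sp_model.decode: join pieces and map ▁ to spaces."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").strip()


def convert_tokens_to_string(tokens, special_tokens):
    """Decode ordinary pieces in runs, but copy special tokens through verbatim."""
    current_sub_tokens = []
    out_string = ""
    prev_is_special = False
    for token in tokens:
        if token in special_tokens:
            # Flush pending ordinary pieces, then append the special token as-is
            # so the decoder never gets a chance to drop it.
            if not prev_is_special:
                out_string += " "
            out_string += toy_sp_decode(current_sub_tokens) + token
            prev_is_special = True
            current_sub_tokens = []
        else:
            current_sub_tokens.append(token)
            prev_is_special = False
    # Decode whatever ordinary pieces remain after the last special token.
    out_string += toy_sp_decode(current_sub_tokens)
    return out_string.strip()


tokens = ["[CLS]", "\u2581hello", "\u2581wor", "ld", "[SEP]"]
print(convert_tokens_to_string(tokens, {"[CLS]", "[SEP]"}))  # [CLS] hello world[SEP]
```

Exact spacing around special tokens is a per-tokenizer choice in this sketch; the later bullets of the commit describe aligning that behavior with the Rust (fast) tokenizers.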

Improve model tester (#19984) · f69eb24b

Yih-Dar authored Nov 02, 2022



* part 1

* part 2

* part 3

* fix

* For CANINE

* For ESMFold
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f69eb24b

Update auto processor to check image processor created (#20021) · 9aedce99

amyeroberts authored Nov 02, 2022

9aedce99

Quality (#20002) · 49b77b89

Sylvain Gugger authored Nov 02, 2022

49b77b89

Fix gradient checkpoint test in encoder-decoder (#20017) · c6c9db3d

Yih-Dar authored Nov 02, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

c6c9db3d

Add Image Processors (#19796) · a6b77598

amyeroberts authored Nov 02, 2022



* Add CLIP image processor

* Crop size as dict too

* Update warning

* Actually use logger this time

* Normalize doesn't change dtype of input

* Add perceiver image processor

* Tidy up

* Add DPT image processor

* Add Vilt image processor

* Tidy up

* Add poolformer image processor

* Tidy up

* Add LayoutLM v2 and v3 image processors

* Tidy up

* Add Flava image processor

* Tidy up

* Add deit image processor

* Tidy up

* Add ConvNext image processor

* Tidy up

* Add levit image processor

* Add segformer image processor

* Add in post processing

* Fix up

* Add ImageGPT image processor

* Fixup

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Add VideoMAE image processor

* Tidy up

* Add ImageGPT image processor

* Fixup

* Add ViT image processor

* Tidy up

* Add beit image processor

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Fix up

* Fix flava and remove tree module

* Fix image classification pipeline failing tests

* Update feature extractor in trainer scripts

* Update pad_if_smaller to accept tuple and int size

* Update for image segmentation pipeline

* Update src/transformers/models/perceiver/image_processing_perceiver.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/image_processing_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/beit/image_processing_beit.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PR comments - docstrings; remove accidentally added resize; var names

* Update docstrings

* Add exception if size is not in the right format

* Fix exception check

* Fix up

* Use shortest_edge in tuple in script
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

a6b77598
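
Several bullets in this commit ("Crop size as dict too", "Add exception if size is not in the right format", "Use shortest_edge in tuple in script") revolve around normalizing a `size` argument. The helper below is a hypothetical sketch of that idea: it accepts an int, a `(height, width)` pair, or a dict, and rejects dicts with unexpected keys. The name `to_size_dict`, the `default_to_square` flag, and the exact rules are assumptions, not the API in `src/transformers/image_processing_utils.py`.

```python
def to_size_dict(size, default_to_square=True):
    """Hypothetical normalizer for an image processor's `size` argument."""
    if isinstance(size, int):
        # A bare int means either a square target or a shortest-edge target.
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    if isinstance(size, dict):
        # Only two dict layouts are accepted in this sketch.
        if set(size) in ({"height", "width"}, {"shortest_edge"}):
            return dict(size)
    raise ValueError(
        f"size must be an int, a (height, width) pair, or a dict with keys "
        f"{{'height', 'width'}} or {{'shortest_edge'}}; got {size!r}"
    )


print(to_size_dict(224))                           # {'height': 224, 'width': 224}
print(to_size_dict(224, default_to_square=False))  # {'shortest_edge': 224}
print(to_size_dict((480, 640)))                    # {'height': 480, 'width': 640}
```

Raising early on a malformed dict mirrors the intent of the "Add exception if size is not in the right format" bullet: a bad key fails at construction time instead of deep inside a resize call.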

01 Nov, 2022 3 commits

Generate: contrastive search with full optional outputs (#19963) · 831590f6

Joao Gante authored Nov 01, 2022

* Use beam search functionality; Add extra outputs and test

* Add full tests for contrastive search

* Add error message on unconventional cache format

831590f6

Added onnx config whisper (#19525) · c796b6de

Mohit Sharma authored Nov 01, 2022

* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx config

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring

c796b6de

Add ESMFold (#19977) · 7f9b7b3f

Matt authored Nov 01, 2022



* initial commit

* First draft that gets outputs without crashing!

* Add all the ported openfold dependencies

* testing

* Restructure config files for ESMFold

* Debugging to find output discrepancies

* Mainly style

* Make model runnable without extra deps

* Remove utils and merge them to the modeling file

* Use correct gelu and remove some debug prints

* More cleanup

* Update esm docs

* Update conversion script to support ESMFold properly

* Port some top-level changes from ESMFold repo

* Expand EsmFold docstrings

* Make attention_mask optional (default to all 1s)

* Add inference test for ESMFold

* Use config and not n kwargs

* Add modeling output class

* Remove einops

* Remove chunking in ESM FFN

* Update tests for ESMFold

* Quality

* Repo consistency

* Remove tree dependency from ESMFold

* make fixup

* Add an error in case my structure map function breaks later

* Remove needless code

* Stop auto-casting the LM to float16 so CPU tests pass

* Stop auto-casting the LM to float16 so CPU tests pass

* Final test updates

* Split test file

* Copyright and quality

* Unpin PyTorch to see built doc

* Fix config file to_dict() method

* Add some docstrings to the output

* Skip TF checkpoint tests for ESM until we reupload those

* make fixup

* More docstrings

* Unpin to get even with main

* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

7f9b7b3f

31 Oct, 2022 2 commits

Add support for gradient checkpointing (#19990) · 4c9e0f02

NielsRogge authored Oct 31, 2022

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

4c9e0f02

[Conditional, Deformable DETR] Add postprocessing methods (#19709) · 0b294c23

NielsRogge authored Oct 31, 2022



* Add postprocessing methods

* Update docs

* Add fix

* Add test

* Add test for deformable detr postprocessing

* Add post processing methods for segmentation

* Update code examples

* Add post_process to make the pipeline work

* Apply updates
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

0b294c23
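
DETR-family models predict boxes in normalized `(center_x, center_y, width, height)` form, so post-processing typically converts them to absolute `(x0, y0, x1, y1)` corners and drops low-scoring detections. Below is a minimal pure-Python sketch of that step. The function name, the `threshold` default, and the input layout are illustrative assumptions, and the sketch omits model-specific details such as how Deformable DETR selects its top-scoring queries.

```python
def post_process_boxes(scores, labels, boxes, image_height, image_width, threshold=0.5):
    """Hypothetical helper: keep confident detections and rescale boxes to pixels.

    boxes are normalized (center_x, center_y, width, height) values in [0, 1].
    """
    results = []
    for score, label, (cx, cy, w, h) in zip(scores, labels, boxes):
        if score < threshold:
            continue  # drop low-confidence predictions
        # Center format -> corner format, still normalized.
        x0, y0 = cx - 0.5 * w, cy - 0.5 * h
        x1, y1 = cx + 0.5 * w, cy + 0.5 * h
        # Scale to absolute pixel coordinates of the original image.
        results.append({
            "score": score,
            "label": label,
            "box": (x0 * image_width, y0 * image_height, x1 * image_width, y1 * image_height),
        })
    return results


detections = post_process_boxes(
    scores=[0.9, 0.2],
    labels=[1, 2],
    boxes=[(0.5, 0.5, 0.5, 0.5), (0.1, 0.1, 0.1, 0.1)],
    image_height=100,
    image_width=200,
)
print(detections)  # [{'score': 0.9, 'label': 1, 'box': (50.0, 25.0, 150.0, 75.0)}]
```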

28 Oct, 2022 2 commits

Add Onnx Config for ImageGPT (#19868) · 0d4c45c5

Raghav Prabhakar authored Oct 28, 2022



* add Onnx Config for ImageGPT

* add generate_dummy_inputs for onnx config

* add TYPE_CHECKING clause

* Update doc for generate_dummy_inputs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

0d4c45c5

Support segformer fx (#19924) · 347ba38c

donguk.lim authored Oct 28, 2022



* Support segformer fx

* Add fx_compatible attribute to test_modeling_segformer.py

* Update glpn model (fx support)

glpn model was copied from segformer.

* Update utils/fx.py | add semantic-segmentation

for SegformerForSemanticSegmentation model

* Fix minor import order (isort)

* Add random input generation for segformer fx
Co-authored-by: noelbird <lduldu00228@gmail.com>

347ba38c

27 Oct, 2022 3 commits

Safetensors tf (#19900) · 6c24443f

Sylvain Gugger authored Oct 27, 2022



* Wip

* Add safetensors support for TensorFlow

* First tests

* Add final test for now

* Retrigger CI like this

* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

6c24443f

Fix bug in Wav2Vec2's GPU tests (#19803) · ea118ae2

Antonio Carlos Falcão Petri authored Oct 27, 2022

* Fix tests when running on GPU

* Fix tests that require mp.set_start_method

ea118ae2

Some fixes regarding auto mappings and test class names (#19923) · f1e42bc5

Yih-Dar authored Oct 27, 2022

* Add pegasus_x

* ViTMSN

* ESM
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f1e42bc5

26 Oct, 2022 4 commits

`accelerate` support for `RoBERTa` family (#19906) · 76296569

Younes Belkada authored Oct 26, 2022

76296569

Allow flax subfolder (#19902) · 6d023270

Patrick von Platen authored Oct 26, 2022

* add first generation tutorial

* [Flax] Add subfolder functionality

* [Flax] Add subfolder functionality

* up

* finish

* delete file and re-add test

6d023270

Update `max_diff` in `test_save_load_fast_init_to_base` (#19849) · 688c3e8e

Yih-Dar authored Oct 26, 2022



* Fix test_save_load_fast_init_to_base

* Fix test_save_load_fast_init_to_base

* update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

688c3e8e

Factored out some code in the `image-segmentation` pipeline. (#19727) · 5fd5990d

Nicolas Patry authored Oct 26, 2022

* Factored out some code in the image-segmentation pipeline

Re-enable `small_model_pt`.

Re-enable `small_model_pt`.

Enabling the current test with the current values.

Debugging the values on the CI.

More logs? Printing doesn't work?

Using the CI values instead. Seems to be a Pillow sensitivity.

Added a test showcasing that models not supporting some tasks get a
clear error.

Factored out code.

Further factor out.

Fixup.

Bad rebase.

Put `panoptic` before `instance` as it should be a superset.

* Fixing tests.

* Adding subtasks tests

+ Fixes `instance` segmentation which was broken due to default and
non kwargs arguments.

* Fix bad replace.

5fd5990d
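
The bullets about putting `panoptic` before `instance` and giving models that do not support a task a clear error suggest a dispatch that tries the most general post-processing first. The following is a hypothetical sketch of that ordering: the `post_process_<subtask>_segmentation` method names, the dispatcher `pick_post_processor`, and the dummy processor are assumptions for illustration, not the pipeline's actual code.

```python
SUBTASK_ORDER = ("panoptic", "instance", "semantic")  # most general first


def pick_post_processor(processor, subtask=None):
    """Return (subtask, method) for the first supported subtask.

    subtask=None means "use the most general subtask the processor supports".
    """
    for name in SUBTASK_ORDER:
        if subtask in (name, None):
            method = getattr(processor, f"post_process_{name}_segmentation", None)
            if method is not None:
                return name, method
    # Clear error instead of a confusing failure further down the pipeline.
    raise ValueError(f"Subtask {subtask!r} is not supported for {type(processor).__name__}")


class SemanticOnlyProcessor:
    """Dummy processor that only knows semantic segmentation."""

    def post_process_semantic_segmentation(self, outputs):
        return outputs


processor = SemanticOnlyProcessor()
print(pick_post_processor(processor)[0])  # semantic
```

Requesting `subtask="instance"` from this dummy processor raises a `ValueError`, which is the kind of explicit failure the "models not supporting some tasks get a clear error" test targets.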

25 Oct, 2022 3 commits

Fix incorrect model<->tokenizer mapping in tokenization testing (#19872) · f9257843

Yih-Dar authored Oct 25, 2022

* Fix model-tokenizer mapping
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f9257843

[Past CI] Vilt only supports PT >= v1.10 (#19851) · eedaba68

Lysandre Debut authored Oct 25, 2022

* Support for Vilt in v1.9

* Skip if not higher or equal than 1.10

* Move test :)

* I am bad at python

eedaba68

Add missing lang tokens in M2M100Tokenizer.get_vocab (#18416) · ab108a0e

Guillaume Klein authored Oct 25, 2022

ab108a0e

24 Oct, 2022 3 commits

Refactor conversion function (#19799) · d4eb52d1

Sylvain Gugger authored Oct 24, 2022

* Refactor conversion function

* Remove dupe line

* Fixes

* Fixes

* Use the right variable...

* Fix last test

d4eb52d1

Update `LEDModelIntegrationTests` expected values (#19841) · 8b2501b4

Yih-Dar authored Oct 24, 2022

* Update expected values

* fix style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

8b2501b4

fix image2test args forwarding (#19648) · d3f4cef7

Rak Alexey authored Oct 24, 2022



* fix image2test args forwarding

* fix issues

* Proposing the update to the PR.

* Fixup.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

d3f4cef7

21 Oct, 2022 7 commits

Run some TF Whisper tests in subprocesses to avoid GPU OOM (#19772) · 34368421

Yih-Dar authored Oct 21, 2022



* Run some TF Whisper tests in subprocesses to avoid GPU OOM
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

34368421
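
Running a memory-hungry test in a fresh process means everything it allocated, including any GPU memory its framework reserved, is released when that process exits. Below is a generic standard-library sketch of the idea; `run_in_subprocess` is a hypothetical helper, not the test utility this commit uses.

```python
import subprocess
import sys
import textwrap


def run_in_subprocess(code, timeout=600):
    """Run `code` in a fresh Python interpreter and fail loudly if it fails.

    Memory the child allocates is returned to the OS when it exits, so it
    cannot accumulate across tests in the parent process.
    """
    completed = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(code)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        raise AssertionError(f"subprocess failed:\n{completed.stderr}")
    return completed.stdout


print(run_in_subprocess("print(sum(range(10)))").strip())  # 45
```

The trade-off is start-up cost: each isolated test pays for a new interpreter (and, in practice, a fresh framework import), so this is worth it only for tests that would otherwise exhaust memory.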

Generate: contrastive search test updates (#19787) · e0b825a8

Joao Gante authored Oct 21, 2022

* contrastive search test updates

* make fixup

e0b825a8

Fix image segmentation pipeline errors, resolve backward compatibility issues (#19768) · cca51aa1

Alara Dirik authored Oct 21, 2022

* Fix panoptic segmentation and pipeline
* Update ImageSegmentationPipeline tests and reenable test_small_model_pt
* Resolve backward compatibility issues

cca51aa1

Fix CTRL `test_torchscrip_xxx` CI by updating `_create_and_check_torchscript` (#19786) · 3a1aeea3

Yih-Dar authored Oct 21, 2022

* Run inputs before trace

* Run inputs before trace
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

3a1aeea3

Add sentencepiece to BertJapaneseTokenizer (#19769) · 31565ff0

Hao Wang authored Oct 21, 2022

* support sentencepiece for bertjapanesetokenizer

* add test vocab file for sentencepiece, bertjapanesetokenizer

* make BasicTokenizer be identical to transformers.models.bert.tokenization_bert.BasicTokenizer

* fix missing \n in comment

* fix init argument missing in tests

* make spm_file be optional, exclude spiece.model from tests/fixtures, and add description comments

* make comment length less than 119

* apply doc style check

31565ff0

Update `ImageToTextPipelineTests.test_small_model_tf` (#19785) · 3aaabaa2

Yih-Dar authored Oct 21, 2022



* update expected values for the correct TF checkpoint

* Run test

* Clean up

* fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

3aaabaa2

PT <-> TF for composite models (#19732) · 84f6bee5

Yih-Dar authored Oct 21, 2022



* First step of PT->TF for composite models

* Update the tests

* For VisionEncoderDecoderModel

* Fix

* Fix

* Add comment

* Fix

* clean up import

* Save memory

* For (TF)EncoderDecoderModel

* For (TF)EncoderDecoderModel
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

84f6bee5

20 Oct, 2022 1 commit

`image-segmentation` pipeline: re-enable `small_model_pt` test. (#19716) · a4038666

Nicolas Patry authored Oct 20, 2022



* Re-enable `small_model_pt`.

Re-enable `small_model_pt`.

Enabling the current test with the current values.

Debugging the values on the CI.

More logs? Printing doesn't work?

Using the CI values instead. Seems to be a Pillow sensitivity.

* Update src/transformers/pipelines/image_segmentation.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

a4038666

19 Oct, 2022 3 commits

Image transforms add center crop (#19718) · 5041bc35

amyeroberts authored Oct 19, 2022



* Add center crop to transforms library

* Return PIL images if PIL image input by default

* Fixup and add docstring

* Trigger CI

* Update src/transformers/image_transforms.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/image_transforms.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* PR comments - move comments; unindent
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

5041bc35
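
Center cropping keeps a `crop_height` by `crop_width` window whose offsets split the leftover rows and columns evenly. Below is a minimal sketch on a nested-list image. This `center_crop` is a simplified illustration that raises on undersized inputs; the library function in `src/transformers/image_transforms.py` may handle that case (and PIL or array inputs) differently.

```python
def center_crop(image, crop_height, crop_width):
    """Crop the central crop_height x crop_width window from a 2D nested list."""
    height, width = len(image), len(image[0])
    if crop_height > height or crop_width > width:
        raise ValueError("crop size larger than image; this sketch does not pad")
    # Integer division puts any odd leftover row/column at the bottom/right.
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return [row[left:left + crop_width] for row in image[top:top + crop_height]]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(center_crop(image, 2, 2))  # [[5, 6], [9, 10]]
```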

Specify TF framework explicitly in more pipeline tests (#19748) · bed2edb9

Yih-Dar authored Oct 19, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

bed2edb9

Adding the state-of-the-art contrastive search decoding methods for the... · 71786b10

GMFTBY authored Oct 19, 2022

Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (#19477)

* add: the contrastive search for generation_utils

* add: testing scripts for contrastive search under examples/text-generation

* improve the quality of the code

* revise the docstring; make the generation_contrastive_search.py scripts;

* revise the examples/pytorch/text-generation/run_generation_contrastive_search.py to the auto-APIs format

* revise the necessary documents

* fix: revise the docstring of generation_contrastive_search.py

* Fix the code indentation

* fix: revise the nits and examples in contrastive_search docstring.

* fix the copyright

* delete generation_contrastive_search.py

* revise the logic in contrastive_search

* update the integration test and the docstring

* run the tests over

* add the slow decorator to the contrastive_search integration test

* add more test

* do the style, quality, consistency checks

71786b10
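
Contrastive search chooses, among the top-k candidate tokens v, the one maximizing (1 - alpha) * p(v | context) - alpha * max_j cos(h_v, h_j), where h_j are the hidden states of the tokens already in the context. This trades model confidence against a degeneration penalty for repeating what was already said. The following pure-Python sketch shows that selection step for a single position. In `generate()` the corresponding knobs are `penalty_alpha` (alpha) and `top_k`, but the function below is an illustration, not the code added in `generation_utils.py`.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_pick(candidates, context_hidden, penalty_alpha):
    """Choose the next token among top-k candidates.

    candidates: list of (token_id, probability, hidden_state) for the top-k tokens.
    context_hidden: hidden states of the tokens generated so far (non-empty).
    """
    best_token, best_score = None, -math.inf
    for token_id, prob, hidden in candidates:
        # Degeneration penalty: similarity to the closest context representation.
        penalty = max(cosine_similarity(hidden, h) for h in context_hidden)
        score = (1 - penalty_alpha) * prob - penalty_alpha * penalty
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token


context = [[1.0, 0.0]]
candidates = [(1, 0.6, [1.0, 0.0]), (2, 0.4, [0.0, 1.0])]
print(contrastive_pick(candidates, context, penalty_alpha=0.0))  # 1 (plain greedy)
print(contrastive_pick(candidates, context, penalty_alpha=0.6))  # 2 (repetition penalized)
```

With `penalty_alpha=0` the rule reduces to greedy choice among the candidates; larger values increasingly favor tokens whose representations differ from the existing context.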

18 Oct, 2022 3 commits

Repo utils test (#19696) · a929f81e

Sylvain Gugger authored Oct 18, 2022

* Create repo utils test job

* Last occurrence

* Add tests for tests_fetcher

* Better filtering

* Let's learn more

* Should fix

* Should fix

* Remove debug

* Style

* WiP

WiP

WiP

WiP

WiP

WiP

WiP

WiP

WiP

* Quality

* address review comments

* Fix link

a929f81e

Clean up deprecation warnings (#19654) · a23819ed

David Yang authored Oct 19, 2022

* Clean up deprecation warnings

Notes:
Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them.
Test cases for `past` in the `past`/`past_key_values` switch were changed or removed because of the warning about its impending removal.

* Add PILImageResampling abstraction for PIL.Image.Resampling

a23819ed
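
The note about raw strings concerns escape sequences. An unrecognized escape such as `\d` in an ordinary string literal is kept verbatim but triggers a DeprecationWarning (a SyntaxWarning since Python 3.12), while a raw string spells the same characters without the warning. Only recognized escapes such as `\n` actually change meaning when a literal becomes raw. A small check of both cases:

```python
import re

# Same characters (a backslash followed by "d"); the raw form avoids the invalid-escape warning.
assert "\\d+" == r"\d+"
assert re.fullmatch(r"\d+", "2022")

# A recognized escape differs: "\n" is one newline character, r"\n" is two characters.
assert len("\n") == 1
assert len(r"\n") == 2
assert "\n" != r"\n"
print("raw-string checks passed")
```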

Fix activations being all the same module (#19728) · fb0bd7b7

Sylvain Gugger authored Oct 18, 2022

fb0bd7b7
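
The title describes a shared-instance bug. If a name-to-activation mapping stores module instances, every model that looks up the same name receives the same object. One way to avoid that, sketched here in plain Python, is a mapping that stores classes and instantiates one on every lookup. The `ClassInstantier` name and the dummy `GELU`/`ReLU` classes are illustrative assumptions standing in for real `torch.nn` modules.

```python
from collections import OrderedDict


class GELU:
    """Dummy stand-in for a stateful activation module."""

    def __init__(self, approximate="none"):
        self.approximate = approximate


class ReLU:
    """Dummy stand-in for a parameter-free activation module."""


class ClassInstantier(OrderedDict):
    """Mapping whose lookups return a new instance each time."""

    def __getitem__(self, key):
        content = super().__getitem__(key)
        # Entries are either a class or a (class, kwargs) pair.
        cls, kwargs = content if isinstance(content, tuple) else (content, {})
        return cls(**kwargs)


ACT2FN = ClassInstantier(
    {"gelu": GELU, "gelu_new": (GELU, {"approximate": "tanh"}), "relu": ReLU}
)

first, second = ACT2FN["gelu"], ACT2FN["gelu"]
print(first is second)  # False: each lookup builds its own module
```

Instantiating on lookup costs a tiny constructor call per model layer, in exchange for guaranteeing that no two layers share state through a common activation object.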