Commits · b7e292aebdb638e2238cd9febf8c09253195fb5d · chenpangpang / transformers

24 Feb, 2022 4 commits
- [TFXLNet] Correct tf xlnet generate (#15822) · cbf43911
  Patrick von Platen authored Feb 24, 2022
```
* [TFXLNet] Correct tf xlnet

* adapt test comment
```
  cbf43911
- [Unispeech] Fix slow tests (#15818) · ca57b450
  Patrick von Platen authored Feb 24, 2022
```
* remove soundfile old way of loading audio

* Adapt slow test
```
  ca57b450
- Revert changes in logit size for semantic segmentation models (#15722) · 35ecf99c
  Sylvain Gugger authored Feb 24, 2022
```
* Revert changes in logit size for semantic segmentation models

* Address review comments
```
  35ecf99c
- Fix from_pretrained with default base_model_prefix (#15814) · d1fcc90a
  Sylvain Gugger authored Feb 24, 2022
  
  d1fcc90a
23 Feb, 2022 3 commits

[Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41

Lysandre Debut authored Feb 23, 2022



* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>

29c10a41

Enable `image-segmentation` on `AutoModelForSemanticSegmentation` (#15647) · 9e71d464

Nicolas Patry authored Feb 23, 2022

* Enabling Beit SegFormer to `image-segmentation`.

* Fixing the score.

* Fix import ?

* Missing in type hint.

* Multiple test fixes:

- Add `raw_image` support. It should be the default IMHO since in Python
  world it doesn't make any sense to base64 encode the image (Sorry
  @mishig, didn't catch that in my review). I really think we should
  consider breaking BC here.
- Add support for Segformer tiny test (needed
  `SegformerModelTester.get_config` to enable TinyConfig
  @NielsRogge)
- Add the check that `batch_size` works correctly on that pipeline.
  Uncovered that it doesn't for Detr, which IMO is OK since images
  after `feature_extractor` don't have the same size. Comment should
  explain.

* Type hint as a string.

* Make fixup + update black.

* torch+vision protections.

* Don't use torchvision, use F.interpolate instead (no new dep).

* Last fixes for Segformer.

* Update test to reflect new image (which was broken)

* Update tests.

* Major BC modification:

- Removed the compressed PNG string; that's a job for users,
  `transformers` stays in Python land.
- Removed the `score` for semantic segmentation. It has hardly a meaning
  on its own in this context.
- Don't include the grayscale with logits for now (which could enable
  users to get a sense of confidence). Might be done later.
- Don't include the surface of the mask (could be used for sorting by
  users, to filter out small masks). It's already calculable, and
  it's easier to add later, than to add now and break later if we need.

* `make fixup`.

* Small changes.

* Rebase + doc fixup.

9e71d464

Adding ZeroShotImageClassificationPipeline (#12119) · f9582c20

Nicolas Patry authored Feb 23, 2022



* [Proposal] Adding ZeroShotImageClassificationPipeline

- Based on CLIP

* WIP, Resurrection in progress.

* Resurrection... achieved.

* Reword handling different `padding_value` for `feature_extractor` and
`tokenizer`.

* Thanks doc-builder !

* Adding docs + global namespace `ZeroShotImageClassificationPipeline`.

* Fixing templates.

* Make the test pass and be robust to floating error.

* Addressing Suraj's comments on docs mostly.

* Tf support start.

* TF support.

* Update src/transformers/pipelines/zero_shot_image_classification.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

f9582c20

22 Feb, 2022 2 commits

Time stamps for CTC models (#15687) · c44d3675

Patrick von Platen authored Feb 22, 2022



* [Wav2Vec2 Time Stamps]

* Add first version

* add word time stamps

* Fix

* save intermediate space

* improve

* [Finish CTC Tokenizer]

* remove @

* remove @

* push

* continue with phonemes

* up

* finish PR

* up

* add example

* rename

* finish

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct split

* finalize
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

c44d3675
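
The bullets above do not spell out how the time stamps are derived. As a rough illustration only, assuming greedy (argmax) CTC decoding where repeated ids are collapsed and a blank id is dropped, character offsets could be computed as follows. The function name `ctc_char_offsets`, the `vocab` mapping, and `blank_id=0` are hypothetical and are not taken from the commit.

```python
from typing import Dict, List


def ctc_char_offsets(frame_ids: List[int], vocab: Dict[int, str], blank_id: int = 0) -> List[dict]:
    """Collapse a greedy CTC path into characters with frame offsets.

    Consecutive repeats of the same id form one run; runs of the blank id
    are dropped. `end_offset` is exclusive.
    """
    offsets = []
    start = 0
    for i in range(1, len(frame_ids) + 1):
        # A run ends at the last frame or when the id changes.
        if i == len(frame_ids) or frame_ids[i] != frame_ids[start]:
            token_id = frame_ids[start]
            if token_id != blank_id:
                offsets.append(
                    {"char": vocab[token_id], "start_offset": start, "end_offset": i}
                )
            start = i
    return offsets


# Hypothetical per-frame predictions: 0 is the blank, 1 -> "h", 2 -> "i".
example = ctc_char_offsets([0, 1, 1, 0, 2, 2, 2, 0, 2], {1: "h", 2: "i"})
```

Offsets here are in frames; converting them to seconds would mean multiplying by the duration of one output frame, which depends on the model and the sampling rate.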

Gelu10 (#15676) · 32295b15

Funtowicz Morgan authored Feb 22, 2022

* Add GeLU10 (clipped version of GeLU) to transformers to improve quantization performance.

* Add unittests.

* Import tensorflow after `is_tf_available` check.

* Fix wrong TensorFlow function: replace `tf.tensor` with `tf.constant`.

* style.

* use `tf.math.max`

* Fix tf tests.

* style.

* style style style style style style

* style style style style style style

* Address @sgugger comments.

* Fix wrong operator for raising ValueError for ClippedGELUActivation.

32295b15
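
For readers unfamiliar with a clipped GeLU, a minimal pure-Python sketch follows. It assumes the clipping range is [-10, 10], as the name "GeLU10" suggests; the function names `gelu` and `clipped_gelu` are hypothetical and this is not the library's implementation. The bounds check mirrors the kind of `ValueError` guard mentioned in the last bullet.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def clipped_gelu(x: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """GeLU with its output clamped to [lo, hi] (range is an assumption)."""
    if lo > hi:
        # Reject an empty range instead of silently returning a wrong value.
        raise ValueError(f"lo ({lo}) must not exceed hi ({hi})")
    return min(max(gelu(x), lo), hi)
```

Bounding the activation's output keeps its dynamic range fixed, which is what makes it attractive for quantization.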

21 Feb, 2022 1 commit
- revert temporary addition to test next version of CLIPTokenizerFast (#15717) · 0187c6f0
  SaulLu authored Feb 21, 2022
  
  0187c6f0
18 Feb, 2022 6 commits

fix bug in PT speech-encoder-decoder (#15699) · 60ba4820

Sanchit Gandhi authored Feb 18, 2022



* fix bug in PT speech-encoder-decoder

* add pt test for `inputs is not None`

* fix test

* new pt test

* Update tests/test_modeling_speech_encoder_decoder.py

* make fixup
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

60ba4820

Fix auto (#15706) · 83f45cd6
Lysandre Debut authored Feb 18, 2022

83f45cd6

Add PLBart (#13269) · ae1f8350

Gunjan Chhablani authored Feb 18, 2022

* Init PLBART

* Add missing configuration file

* Add conversion script and configuration file

* Fix style

* Update modeling and conversion scripts

* Fix scale embedding in config

* Add comment

* Fix conversion script

* Add classification option to conversion script

* Fix vocab size in config doc

* Add tokenizer files from MBart50

* Allow no lang code in regular tokenizer

* Add PLBart Tokenizer Converters

* Remove mask from multi tokenizer

* Remove mask from multi tokenizer

* Change from MBart-50 to MBart tokenizer

* Fix names and modify src/tgt behavior

* Fix imports for tokenizer

* Remove <mask> from multi tokenizer

* Fix style

* Change tokenizer_class to processor_class

* Add attribute map to config class

* Update modeling file to modified MBart code

* Update configuration file to MBart style configuration

* Fix tokenizer

* Separate tokenizers

* Fix error in tokenization auto

* Copy MBart tests

* Replace with MBart tokenization tests

* Fix style

* Fix language code in multi tokenizer

* Fix configuration docs

* Add entry for plbart_multi in transformers init

* Add dummy objects and fix imports

* Fix modeling tests

* Add TODO in config

* Fix copyright year

* Fix modeling docs and test

* Fix some tokenization tests and style

* Add changes from review

* Fix copies

* Fix docs

* Fix docs

* Fix style

* Fix year

* Add changes from review

* Remove extra changes

* Fix base tokenizer and doc

* Fix style

* Fix modeling and slow tokenizer tests

* Remove Multi-tokenizer Converter and Tests

* Delete QA model and Multi Tokenizer dummy objects

* Fix repo consistency and code quality issues

* Fix example documentation

* Fix style

* Remove PLBartTokenizer from type checking in init

* Fix consistency issue

* Add changes from review

* Fix style

* Remove PLBartTokenizerFast

* Remove FastTokenizer converter

* Fix AutoTokenizer mapping

* Add plbart to toctree and fix consistency issues

* Add language codes tokenizer test

* Fix styling and doc issues

* Add fixes for failing tests

* Fix copies

* Fix failing modeling test

* Change assert to assertTrue in modeling tests

ae1f8350

Fix LongformerModel hidden states (#15537) · 2f2fefd6

Yih-Dar authored Feb 18, 2022



* add undo padding

* fix

* fix tuple issue

* make style and quality

* move unpad logic to LongformerEncoder + unpad attentions + update tests

* move unpad logic to TFLongformerEncoder
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

2f2fefd6
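
The "undo padding" idea can be illustrated with a small sketch. It assumes, as is the case for Longformer, that inputs are padded so their length is a multiple of the attention window; the helper names `pad_to_multiple` and `unpad` are hypothetical and plain lists stand in for tensors.

```python
from typing import List, Tuple


def pad_to_multiple(tokens: List[int], window: int, pad_value: int = 0) -> Tuple[List[int], int]:
    """Right-pad `tokens` so its length is a multiple of `window`."""
    padding_len = (window - len(tokens) % window) % window
    return tokens + [pad_value] * padding_len, padding_len


def unpad(states: List[int], padding_len: int) -> List[int]:
    """Drop the trailing padded positions again.

    The explicit check matters: `states[:-0]` would return an empty list.
    """
    return states[:-padding_len] if padding_len > 0 else states


padded, n_pad = pad_to_multiple([1, 2, 3, 4, 5], window=4)
restored = unpad(padded, n_pad)
```

Applying the same trimming to every returned hidden state (and attention tensor) keeps the output sequence length equal to the input length the caller provided.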

TF: add initializer_std with a small value in TFFunnelModelTester (#15684) · f8ff3fad
Yih-Dar authored Feb 18, 2022

f8ff3fad

fix CLIP fast tokenizer and change some properties of the slow version (#15067) · e93763d4

SaulLu authored Feb 18, 2022



Major changes to the CLIP fast tokenizer, which did not match the CLIP slow tokenizer
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e93763d4

17 Feb, 2022 2 commits

Add SimMIM (#15586) · 57882177

NielsRogge authored Feb 17, 2022



* Add first draft

* Make model importable

* Make SwinForMaskedImageModeling importable

* Fix imports

* Add missing inits

* Add support for Swin

* Fix bug

* Fix bug

* Fix another bug

* Fix Swin MIM implementation

* Fix default encoder stride

* Fix Swin

* Add print statements for debugging

* Add image_size data argument

* Fix Swin

* Fix image_size

* Add print statements for debugging

* Fix print statement

* Remove print statements

* Improve reshaping of bool_masked_pos

* Add support for DeiT, fix tests

* Improve docstrings

* Apply new black version

* Improve script

* Fix bug

* Improve README

* Apply suggestions from code review

* Remove DS_Store and add to gitignore

* Apply suggestions from code review + fix BEiT Flax

* Revert BEiT changes

* Improve README

* Fix code quality

* Improve README
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

57882177

Add PoolFormer (#15531) · f84e0dbd

Tanay Mehta authored Feb 17, 2022



* Added all files, PoolFormerFeatureExtractor still failing tests

* Fixed PoolFormerFeatureExtractor not being able to import

* Completed Poolformer doc

* Applied Suggested fixes

* Fixed errors in modeling_auto.py

* Fix feature extractor, convert docs to Markdown, styling of code

* Remove PoolFormer from check_repo and fix integration test

* Remove Poolformer from check_repo

* Fixed configuration_poolformer.py docs and removed inference.py from poolformer

* Ran with black v22

* Added PoolFormer to _toctree.yml

* Updated poolformer doc

* Applied suggested fixes and added on README.md

* Did make fixup and make fix-copies, tests should pass now

* Changed PoolFormer weights conversion script name and fixed README

* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py

* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

f84e0dbd

16 Feb, 2022 4 commits

Implementation of activations as pytorch modules (#15616) · f65fe366

Eldar Kurtic authored Feb 16, 2022



* Implement activations as pytorch modules

* Apply fixup

* Add missing tests for activations

* Update docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

f65fe366

[Wav2Vec2ProcessorWithLM] Fix auto processor with lm (#15683) · 3a4376d0
Patrick von Platen authored Feb 16, 2022

3a4376d0

Add register method to AutoProcessor (#15669) · cdc51ffd

Sylvain Gugger authored Feb 16, 2022



* Add push_to_hub method to processors

* Fix test

* The other one too!

* Add register method to AutoProcessor

* Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

cdc51ffd
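
A register method on an auto class is essentially a lookup table with an insertion hook. The sketch below is a generic illustration of that pattern only; `ProcessorRegistry`, `MyConfig`, and `MyProcessor` are hypothetical names, and refusing duplicate registrations is this sketch's own choice, not a statement about the library's behavior.

```python
class ProcessorRegistry:
    """Toy stand-in for an auto class: maps config classes to processor classes."""

    def __init__(self):
        self._mapping = {}

    def register(self, config_class, processor_class):
        # Refuse silent overrides so two registrations cannot clash unnoticed.
        if config_class in self._mapping:
            raise ValueError(f"{config_class.__name__} is already registered")
        self._mapping[config_class] = processor_class

    def for_config(self, config):
        # Dispatch on the exact config type, as a lookup table would.
        try:
            return self._mapping[type(config)]
        except KeyError:
            raise ValueError(
                f"No processor registered for {type(config).__name__}"
            ) from None


class MyConfig:  # hypothetical user-defined config
    pass


class MyProcessor:  # hypothetical user-defined processor
    pass


registry = ProcessorRegistry()
registry.register(MyConfig, MyProcessor)
```

With the pair registered, `registry.for_config(MyConfig())` resolves to `MyProcessor` without the registry knowing about either class in advance.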

Add push_to_hub method to processors (#15668) · 2d02f7b2
Sylvain Gugger authored Feb 15, 2022
```
* Add push_to_hub method to processors

* Fix test

* The other one too!
```
2d02f7b2

15 Feb, 2022 8 commits

Fix vit test (#15671) · 1ddf3c2b
Lysandre Debut authored Feb 15, 2022

1ddf3c2b

Fix model equivalence tests (#15670) · 943e2aa0

Lysandre Debut authored Feb 15, 2022



* Fix model equivalence tests

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

943e2aa0

TF generate refactor - Greedy Search (#15562) · 2e12b907

Patrick von Platen authored Feb 15, 2022



* TF generate start refactor

* Add tf tests for sample generate

* re-organize

* boom boom

* Apply suggestions from code review

* re-add

* add all code

* make random greedy pass

* make encoder-decoder random work

* further improvements

* delete bogus file

* make gpt2 and t5 tests work

* finish logits tests

* correct logits processors

* correct past / encoder_outputs drama

* refactor some methods

* another fix

* refactor shape_list

* fix more shape list

* import shape_list

* finish docs

* fix imports

* make style

* correct tf utils

* Fix TFRag as well

* Apply Lysandre's and Sylvain's suggestions

* Update tests/test_generation_tf_logits_process.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/tf_utils.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* remove cpu according to gante

* correct logit processor
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

2e12b907

Add `decoder_kwargs` to send to LM on asr pipeline. (#15646) · a3dbbc34

Nicolas Patry authored Feb 15, 2022


Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

a3dbbc34

add scores to Wav2Vec2WithLMOutput (#15413) · 67047b86
arampacha authored Feb 15, 2022
```
* add scores to Wav2Vec2WithLMOutput

* style fixup
```
67047b86

Allow custom code for Processors (#15649) · 45f56580

Sylvain Gugger authored Feb 15, 2022

* Allow custom code for Processors

* Add more test

* Test all auto_map configs are properly set

45f56580

Fix ASR pipelines from local directories with wav2vec models that have language models attached (#15590) · 9eb7e9ba

Javier de la Rosa authored Feb 15, 2022

* Fix loading pipelines with wav2vec models with lm when in local paths

* Adding tests

* Fix test

* Adding tests

* Flake8 fixes

* Removing conflict files :(

* Adding task type to test

* Remove unnecessary test and imports

9eb7e9ba

[SpeechEncoderDecoder] Make sure no EOS is generated in test (#15655) · 041fdc4a
Patrick von Platen authored Feb 15, 2022

041fdc4a

14 Feb, 2022 1 commit

Sylvain Gugger authored Feb 14, 2022

* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

* Add register API to AutoFeatureExtractor

2e11a043

11 Feb, 2022 4 commits
- Add push to hub to feature extractor (#15632) · 52d2e6f6
  Sylvain Gugger authored Feb 11, 2022
```
* Add push to hub to feature extractor

* Quality

* Clean up
```
  52d2e6f6
- Custom feature extractor (#15630) · 7a32e472
  Sylvain Gugger authored Feb 11, 2022
```
* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up
```
  7a32e472
- Fix _configuration_file argument getting passed to model (#15629) · 2dce350b
  Sylvain Gugger authored Feb 11, 2022
  
  2dce350b
- TF MT5 embeddings resize (#15567) · 2f40c728
  Joao Gante authored Feb 11, 2022
```
* Fix TF MT5 vocab resize

* more assertive testing
```
  2f40c728
10 Feb, 2022 4 commits

Compute loss independent from decoder for TF EncDec models (as #14139) (#15175) · 724e51c6

Yih-Dar authored Feb 10, 2022



* Compute loss independent from decoder (as 14139)

* fix expected seq_len + style

* Apply the same change to TFVisionEncoderDecoderModel

* fix style

* Add case with labels in equivalence test

* uncomment

* Add case with labels in equivalence test

* add decoder_token_labels

* use hf_compute_loss

* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add copied from
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

724e51c6

Add Tensorflow handling of ONNX conversion (#13831) · cb7ed6e0

Alberto Bégué authored Feb 10, 2022



* Add TensorFlow support for ONNX export

* Change documentation to mention conversion with TensorFlow

* Refactor export into export_pytorch and export_tensorflow

* Check model's type instead of framework installation to choose between TF and PyTorch
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

cb7ed6e0

Reformat tokenization_fnet · e923917c
Lysandre authored Feb 09, 2022

e923917c
Make slow tests slow · 644ec052
Sylvain Gugger authored Feb 09, 2022

644ec052

09 Feb, 2022 1 commit
- Fix tests hub failure (#15580) · 315e6740
  Sylvain Gugger authored Feb 09, 2022
```
* Expose hub test problem

* Fix tests
```
  315e6740