Commits · 6cc06d17394f5715cdf2d13a1ef7680bedaee9e2 · chenpangpang / transformers

11 Nov, 2022 2 commits

Fix type - update any PIL.Image.Resampling (#20172) · 6cc06d17
amyeroberts authored Nov 11, 2022

6cc06d17

[OWL-ViT] Make model consistent with CLIP (#20144) · cbbeca3d

NielsRogge authored Nov 11, 2022



* Apply fix

* Fix test

* Remove another argument which is not used

* Fix pipeline test

* Add argument back, add deprecation warning

* Add warning add other location

* Use warnings instead

* Add num_channels to config
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>

cbbeca3d
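
The OWL-ViT entry adds an unused argument back with a deprecation warning raised through Python's `warnings` module rather than a logger. Below is a minimal sketch of that pattern. The function name `preprocess` and the keyword `old_arg` are hypothetical placeholders, not the actual OWL-ViT signature.

```python
import warnings
from typing import Optional


def preprocess(pixel_values: list, old_arg: Optional[bool] = None) -> list:
    """Hypothetical function that still accepts a no-longer-used keyword argument."""
    if old_arg is not None:
        # Keep accepting the argument for backward compatibility, but tell callers
        # that it is ignored and will be removed in a future release.
        warnings.warn(
            "`old_arg` is deprecated, has no effect and will be removed in a future version.",
            FutureWarning,
        )
    return pixel_values
```

Using `warnings.warn` means the standard warning filters apply, so callers can silence the message or turn it into an error (for example with `-W error::FutureWarning`).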

10 Nov, 2022 2 commits

Add Jukebox model (replaces #16875) (#17826) · 61a51f5f
Arthur authored Nov 10, 2022

61a51f5f

[processor] Add 'model input names' property (#20117) · 905e5773

Sanchit Gandhi authored Nov 10, 2022

* [processor] Add 'model input names' property

* add test

* no f string

* add generic property method to mixin

* copy to multimodal

* copy to vision

* tests for all audio

* remove ad-hoc tests

* style

* fix flava test

* fix test

* fix processor code

905e5773
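
A processor that bundles a tokenizer with a feature extractor or image processor can report the combined inputs its model expects. Here is a minimal sketch, assuming each component exposes a `model_input_names` list. The `Toy*` classes are hypothetical stand-ins, and putting the tokenizer's names first is an assumption.

```python
from typing import List


class ToyTokenizer:
    # Hypothetical stand-in for a tokenizer's `model_input_names` attribute.
    model_input_names = ["input_ids", "attention_mask"]


class ToyImageProcessor:
    # Hypothetical stand-in for an image processor's `model_input_names` attribute.
    model_input_names = ["pixel_values"]


class ToyProcessor:
    """Hypothetical processor wrapping a tokenizer and an image processor."""

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    @property
    def model_input_names(self) -> List[str]:
        names = self.tokenizer.model_input_names + self.image_processor.model_input_names
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return list(dict.fromkeys(names))
```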

09 Nov, 2022 4 commits

Update VisionEncoderDecoder to use an image processor (#20137) · f3d99e49

amyeroberts authored Nov 09, 2022

* TrOCR processor uses an image processor

* Update VisionEncoderDecoder

* Add feature_extractor_class property

f3d99e49

Generate: move generation_*.py src files into generation/*.py (#20096) · f270b960

Joao Gante authored Nov 09, 2022

* move generation_*.py src files into generation/*.py

* populate generation.__init__ with lazy loading

* move imports and references from generation.xxx.object to generation.object

f270b960
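
"Populate generation.__init__ with lazy loading" means the package exposes its public names without importing every submodule up front. The commit's own loader isn't shown here. The sketch below is a generic version built on module-level `__getattr__` (PEP 562). `make_lazy_module` and its `import_structure` argument are hypothetical, and the demo maps stdlib modules so it runs anywhere.

```python
import importlib
import types
from typing import Dict, List


def make_lazy_module(name: str, import_structure: Dict[str, List[str]]) -> types.ModuleType:
    """Return a module whose attributes are imported from their submodule on first access."""
    # Invert {submodule: [public names]} into {public name: submodule}.
    attr_to_module = {attr: mod for mod, attrs in import_structure.items() for attr in attrs}
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal lookup fails, i.e. the name isn't cached yet.
        if attr not in attr_to_module:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(attr_to_module[attr]), attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    module.__all__ = sorted(attr_to_module)
    return module
```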

Attempting to test automatically the `_keys_to_ignore`. (#20042) · bac2d29a

Nicolas Patry authored Nov 09, 2022



* Attempting to test automatically the `_keys_to_ignore`.

* Style.

* First fix pass.

* Moving test on its own.

* Another batch.

* Second round removing BatchNorm

* Fixing layoutlmv{2,3} + support older Python.

* Disable missing warning.

* Removing dodgy additions.

* Big pass.

* mbart.

* More corrections.

* Fixup.

* Updating test_correct_missing_keys

* Add escape hatch for when the head has no extra params, so it doesn't need the missing keys check.

* Fixing test.

* Greener.

* Green ! (except for weird splinter bug).

* Adding a test about `named_parameters` usage.

* Shorten message.

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* After rebase modifications.

* More explicit condition checking.

* Fixing slow tests issues.

* Remove extra pdb.

* Remove print.

* Attempt to make failure consistent + fixing roc_bert.

* Removing the seed (all tests passing with it).
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

bac2d29a
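
The `_keys_to_ignore` commit tests automatically that every key reported missing on load is covered by the model's ignore list. This sketch of that check assumes, as in `PreTrainedModel`, that `_keys_to_ignore_on_load_missing` holds regular-expression patterns matched with `re.search`. The helper name and the example keys are hypothetical, and this is not the test the commit adds.

```python
import re
from typing import Iterable, List


def unexpected_missing_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
    keys_to_ignore_on_load_missing: Iterable[str],
) -> List[str]:
    """Return keys the model expects but the checkpoint lacks, minus ignored patterns."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    patterns = list(keys_to_ignore_on_load_missing)
    # A key is acceptable if any ignore pattern matches it anywhere.
    return [key for key in missing if not any(re.search(pat, key) for pat in patterns)]
```

A typical case is a decoder weight tied to the input embeddings. It is saved only once, so the key is legitimately absent from the checkpoint and should be listed as ignorable.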

Update `CLIPSegModelTester` (#20134) · c4cad8e3

Yih-Dar authored Nov 09, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

c4cad8e3

08 Nov, 2022 3 commits

AutoImageProcessor (#20111) · 4eb918e6

amyeroberts authored Nov 08, 2022

* AutoImageProcessor skeleton

* Update references

* Add mapping in init

* Add model image processors to __init__ for importing

* Add AutoImageProcessor tests

* Fix up

* Image Processor documentation

* Remove pdb

* Update docs/source/en/model_doc/mobilevit.mdx

* Update docs

* Don't add whitespace on json files

* Remove fixtures

* Move checking model config down

* Fix up

* Add check for image processor

* Remove FeatureExtractorMixin in docstrings

* Rename model_tmpfile to config_tmpfile

* Don't make None if not in image processor map

4eb918e6

Add RocBert (#20013) · efa889d2

Weiwe Shi authored Nov 08, 2022



* add roc_bert

* update roc_bert readme

* code style

* change name and delete unused file

* update model file

* delete unused log file

* delete tokenizer fast

* reformat code and change model file path

* add RocBertForPreTraining

* update docs

* delete wrong notes

* fix copies

* fix make repo-consistency error

* fix files are not present in the table of contents error

* change RocBert -> RoCBert

* add doc, add detail test
Co-authored-by: weiweishi <weiweishi@tencent.com>

efa889d2

Add CLIPSeg (#20066) · 25896306

NielsRogge authored Nov 08, 2022



* Add first draft

* Update conversion script

* Improve conversion script

* Improve conversion script some more

* Add conditional embeddings

* Add initial decoder

* Fix activation function of decoder

* Make decoder outputs match original implementation

* Make decoder outputs match original implementation

* Add more copied from statements

* Improve model outputs

* Fix auto tokenizer file

* Fix more tests

* Add test

* Improve README and docs, improve conditional embeddings

* Fix more tests

* Remove print statements

* Remove initial embeddings

* Improve conversion script

* Add interpolation of position embeddings

* Finish addition of interpolation of position embeddings

* Add support for refined checkpoint

* Fix refined checkpoint

* Remove unused parameter

* Improve conversion script

* Add support for training

* Fix conversion script

* Add CLIPSegFeatureExtractor

* Fix processor

* Fix CLIPSegProcessor

* Fix conversion script

* Fix most tests

* Fix equivalence test

* Fix README

* Add model to doc tests

* Use better variable name

* Convert other checkpoint as well

* Update config, add link to paper

* Add docs

* Update organization

* Replace base_model_prefix with clip

* Fix base_model_prefix

* Fix checkpoint of config

* Fix config checkpoint

* Remove file

* Use logits for output

* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

25896306

07 Nov, 2022 2 commits

Skip 2 tests in `VisionTextDualEncoderProcessorTest` (#20098) · 4ab6e9e2

Yih-Dar authored Nov 07, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

4ab6e9e2

Generate: TF contrastive search with XLA support (#20050) · a0f86743

Joao Gante authored Nov 07, 2022

* Add contrastive search

a0f86743

04 Nov, 2022 3 commits

Update defaults and logic to match old FE (#20065) · d68c4602

amyeroberts authored Nov 04, 2022

* Update defaults and logic to match old FE

* Use docker run rest values

d68c4602

Allow passing arguments to model testers for CLIP-like models (#20044) · 2d02178e

Yih-Dar authored Nov 04, 2022

* POC

* For more CLIP-like models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

2d02178e

Fix ESM LM head test (#20045) · 1076d587

Matt authored Nov 04, 2022

* Fix esm lm head test

* make fixup

1076d587

03 Nov, 2022 1 commit

[Whisper Tokenizer] Make more user-friendly (#19921) · 06d48806

Sanchit Gandhi authored Nov 03, 2022



* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function
Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes
Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear
Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens
Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>

06d48806
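
The Whisper tokenizer changes center on prefix tokens (`set_prefix_tokens`) and on forced decoder ids that skip special-token insertion (`get_decoder_prompt_ids`). The sketch below builds a Whisper-style prefix of start-of-transcript, language, task and an optional no-timestamps token. It then turns everything after the start token into `(position, id)` pairs. The vocabulary, ids and both helper functions are hypothetical and do not reproduce the real tokenizer.

```python
from typing import List, Optional, Tuple

# Hypothetical miniature vocabulary; real Whisper checkpoints use different ids.
VOCAB = {
    "<|startoftranscript|>": 0,
    "<|en|>": 1,
    "<|fr|>": 2,
    "<|transcribe|>": 3,
    "<|translate|>": 4,
    "<|notimestamps|>": 5,
}


def prefix_tokens(language: Optional[str], task: Optional[str], predict_timestamps: bool = False) -> List[str]:
    """Hypothetical helper: assemble the prefix in start / language / task / timestamps order."""
    tokens = ["<|startoftranscript|>"]
    if language is not None:
        tokens.append(f"<|{language}|>")
    if task is not None:
        tokens.append(f"<|{task}|>")
    if not predict_timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


def decoder_prompt_ids(language: Optional[str], task: Optional[str], predict_timestamps: bool = False) -> List[Tuple[int, int]]:
    """Hypothetical helper: force every prefix id after the start token at positions 1, 2, ..."""
    ids = [VOCAB[token] for token in prefix_tokens(language, task, predict_timestamps)]
    # Position 0 is the decoder's start token, so it is not forced here.
    return list(enumerate(ids[1:], start=1))
```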

02 Nov, 2022 5 commits

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775) · 9f9ddcc2

Ben Eyal authored Nov 02, 2022

* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped

9f9ddcc2
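
The fix changes `convert_tokens_to_string` so special tokens survive. Ordinary pieces are buffered and decoded together, while special tokens are appended verbatim instead of going through the SentencePiece decoder, which can drop pieces it treats as control symbols. The sketch below is loosely modeled on that approach. `decode_pieces` is a hypothetical stand-in for the SentencePiece decoder, not the `sentencepiece` API.

```python
from typing import Iterable, List

SPIECE_UNDERLINE = "\u2581"  # SentencePiece word-boundary marker


def decode_pieces(pieces: List[str]) -> str:
    # Hypothetical stand-in for the SentencePiece decoder: join the pieces and
    # turn the word-boundary marker into spaces.
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").strip()


def convert_tokens_to_string(tokens: Iterable[str], special_tokens: set) -> str:
    out_string = ""
    current_sub_tokens: List[str] = []
    prev_is_special = False
    for token in tokens:
        if token in special_tokens:
            # Flush the buffered ordinary pieces, then append the special token as-is.
            if not prev_is_special:
                out_string += " "
            out_string += decode_pieces(current_sub_tokens) + token
            prev_is_special = True
            current_sub_tokens = []
        else:
            current_sub_tokens.append(token)
            prev_is_special = False
    out_string += decode_pieces(current_sub_tokens)
    return out_string.strip()
```

For example, `["▁Hello", "▁world", "</s>"]` with `{"</s>"}` as the special set yields `"Hello world</s>"`, with the end-of-sequence token still present.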

Improve model tester (#19984) · f69eb24b

Yih-Dar authored Nov 02, 2022



* part 1

* part 2

* part 3

* fix

* For CANINE

* For ESMFold
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f69eb24b

Update auto processor to check image processor created (#20021) · 9aedce99

amyeroberts authored Nov 02, 2022

9aedce99

Fix gradient checkpoint test in encoder-decoder (#20017) · c6c9db3d

Yih-Dar authored Nov 02, 2022

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

c6c9db3d

Add Image Processors (#19796) · a6b77598

amyeroberts authored Nov 02, 2022



* Add CLIP image processor

* Crop size as dict too

* Update warning

* Actually use logger this time

* Normalize doesn't change dtype of input

* Add perceiver image processor

* Tidy up

* Add DPT image processor

* Add Vilt image processor

* Tidy up

* Add poolformer image processor

* Tidy up

* Add LayoutLM v2 and v3 image processors

* Tidy up

* Add Flava image processor

* Tidy up

* Add deit image processor

* Tidy up

* Add ConvNext image processor

* Tidy up

* Add levit image processor

* Add segformer image processor

* Add in post processing

* Fix up

* Add ImageGPT image processor

* Fixup

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Add VideoMAE image processor

* Tidy up

* Add ImageGPT image processor

* Fixup

* Add ViT image processor

* Tidy up

* Add beit image processor

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Fix up

* Fix flava and remove tree module

* Fix image classification pipeline failing tests

* Update feature extractor in trainer scripts

* Update pad_if_smaller to accept tuple and int size

* Update for image segmentation pipeline

* Update src/transformers/models/perceiver/image_processing_perceiver.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/image_processing_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/beit/image_processing_beit.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PR comments - docstrings; remove accidentally added resize; var names

* Update docstrings

* Add exception if size is not in the right format

* Fix exception check

* Fix up

* Use shortest_edge in tuple in script
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

a6b77598

01 Nov, 2022 2 commits

Generate: contrastive search with full optional outputs (#19963) · 831590f6

Joao Gante authored Nov 01, 2022

* Use beam search functionality; Add extra outputs and test

* Add full tests for contrastive search

* Add error message on unconventional cache format

831590f6
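
Contrastive search, which these entries extend in PyTorch and TensorFlow, is enabled in `generate()` through `penalty_alpha` together with `top_k`. At each step it scores each of the top-k candidates as (1 - α) · p(candidate) - α · (maximum cosine similarity between the candidate's hidden state and the previous hidden states), then keeps the best one. The pure-Python sketch below covers a single selection step. It is not the library's batched implementation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_pick(
    candidate_probs: List[float],
    candidate_hidden: List[Sequence[float]],
    context_hidden: List[Sequence[float]],
    penalty_alpha: float,
) -> int:
    """Return the index of the top-k candidate with the highest contrastive score."""
    best_idx, best_score = -1, -math.inf
    for i, (prob, hidden) in enumerate(zip(candidate_probs, candidate_hidden)):
        # Degeneration penalty: how closely the candidate repeats any earlier state.
        penalty = max(cosine(hidden, ctx) for ctx in context_hidden)
        score = (1 - penalty_alpha) * prob - penalty_alpha * penalty
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

With `penalty_alpha=0` the score reduces to model confidence, which makes this step equivalent to greedy selection among the top-k.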

Add ESMFold (#19977) · 7f9b7b3f

Matt authored Nov 01, 2022



* initial commit

* First draft that gets outputs without crashing!

* Add all the ported openfold dependencies

* testing

* Restructure config files for ESMFold

* Debugging to find output discrepancies

* Mainly style

* Make model runnable without extra deps

* Remove utils and merge them to the modeling file

* Use correct gelu and remove some debug prints

* More cleanup

* Update esm docs

* Update conversion script to support ESMFold properly

* Port some top-level changes from ESMFold repo

* Expand EsmFold docstrings

* Make attention_mask optional (default to all 1s)

* Add inference test for ESMFold

* Use config and not n kwargs

* Add modeling output class

* Remove einops

* Remove chunking in ESM FFN

* Update tests for ESMFold

* Quality

* Repo consistency

* Remove tree dependency from ESMFold

* make fixup

* Add an error in case my structure map function breaks later

* Remove needless code

* Stop auto-casting the LM to float16 so CPU tests pass

* Stop auto-casting the LM to float16 so CPU tests pass

* Final test updates

* Split test file

* Copyright and quality

* Unpin PyTorch to see built doc

* Fix config file to_dict() method

* Add some docstrings to the output

* Skip TF checkpoint tests for ESM until we reupload those

* make fixup

* More docstrings

* Unpin to get even with main

* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

7f9b7b3f

31 Oct, 2022 2 commits

Add support for gradient checkpointing (#19990) · 4c9e0f02

NielsRogge authored Oct 31, 2022

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

4c9e0f02

[Conditional, Deformable DETR] Add postprocessing methods (#19709) · 0b294c23

NielsRogge authored Oct 31, 2022



* Add postprocessing methods

* Update docs

* Add fix

* Add test

* Add test for deformable detr postprocessing

* Add post processing methods for segmentation

* Update code examples

* Add post_process to make the pipeline work

* Apply updates
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

0b294c23

28 Oct, 2022 1 commit

Support segformer fx (#19924) · 347ba38c

donguk.lim authored Oct 28, 2022



* Support segformer fx

* Add fx_compatible attribute to test_modeling_segformer.py

* Update glpn model (fx support)

glpn model was copied from segformer.

* Update utils/fx.py | add semantic-segmentation for SegformerForSemanticSegmentation model

* Fix minor import order(isort)

* Add random input generation for segformer fx
Co-authored-by: noelbird <lduldu00228@gmail.com>

347ba38c

27 Oct, 2022 2 commits

Fix bug in Wav2Vec2's GPU tests (#19803) · ea118ae2

Antonio Carlos Falcão Petri authored Oct 27, 2022

* Fix tests when running on GPU

* Fix tests that require mp.set_start_method

ea118ae2

Some fixes regarding auto mappings and test class names (#19923) · f1e42bc5

Yih-Dar authored Oct 27, 2022

* Add pegasus_x

* ViTMSN

* ESM
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

f1e42bc5

25 Oct, 2022 2 commits

[Past CI] Vilt only supports PT >= v1.10 (#19851) · eedaba68

Lysandre Debut authored Oct 25, 2022

* Support for Vilt in v1.9

* Skip if not higher or equal than 1.10

* Move test :)

* I am bad at python

eedaba68

Add missing lang tokens in M2M100Tokenizer.get_vocab (#18416) · ab108a0e

Guillaume Klein authored Oct 25, 2022

ab108a0e

24 Oct, 2022 1 commit

Update `LEDModelIntegrationTests` expected values (#19841) · 8b2501b4

Yih-Dar authored Oct 24, 2022

* Update expected values

* fix style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

8b2501b4

21 Oct, 2022 4 commits

Run some TF Whisper tests in subprocesses to avoid GPU OOM (#19772) · 34368421

Yih-Dar authored Oct 21, 2022



* Run some TF Whisper tests in subprocesses to avoid GPU OOM
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

34368421

Fix image segmentation pipeline errors, resolve backward compatibility issues (#19768) · cca51aa1

Alara Dirik authored Oct 21, 2022

* Fix panoptic segmentation and pipeline
* Update ImageSegmentationPipeline tests and reenable test_small_model_pt
* Resolve backward compatibility issues

cca51aa1

Add sentencepiece to BertJapaneseTokenizer (#19769) · 31565ff0

Hao Wang authored Oct 21, 2022

* support sentencepiece for bertjapanesetokenizer

* add test vocab file for sentencepiece, bertjapanesetokenizer

* make BasicTokenizer be identical to transformers.models.bert.tokenization_bert.BasicTokenizer

* fix missing \n in comment

* fix init argument missing in tests

* make spm_file be optional, exclude spiece.model from tests/fixtures, and add description comments

* make comment length less than 119

* apply doc style check

31565ff0

PT <-> TF for composite models (#19732) · 84f6bee5

Yih-Dar authored Oct 21, 2022



* First step of PT->TF for composite models

* Update the tests

* For VisionEncoderDecoderModel

* Fix

* Fix

* Add comment

* Fix

* clean up import

* Save memory

* For (TF)EncoderDecoderModel

* For (TF)EncoderDecoderModel
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

84f6bee5

18 Oct, 2022 4 commits

Clean up deprecation warnings (#19654) · a23819ed

David Yang authored Oct 19, 2022

* Clean up deprecation warnings

Notes:
Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them.
Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal

* Add PILImageResampling abstraction for PIL.Image.Resampling

a23819ed

Add table transformer [v2] (#19614) · dd523da5

NielsRogge authored Oct 18, 2022

* First draft

* Add conversion script

* Make conversion work

* Upload checkpoints

* Add final fixes

* Revert changes of conditional and deformable detr

* Fix toctree, add and remove copied from

* Use model type

* Improve docs

* Improve code example

* Update copies

* Add copied from

* Don't update conditional detr

* Don't update deformable detr

dd523da5

Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351) · af150e4a

Antonio Carlos Falcão Petri authored Oct 18, 2022



* [Wav2Vec2] Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode

* [Wav2Vec2] Add user-managed LM's pool tests and usage examples

* Improve styling
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [Wav2Vec2] Fix hyperlink references
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

af150e4a
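
This entry lets `Wav2Vec2ProcessorWithLM.batch_decode` take a pool owned by the caller. The caller can then create the worker pool once, pick its start method, and reuse it across calls instead of paying process start-up on every batch. The sketch below shows that design generically. `decode_one` and this `batch_decode` are hypothetical stand-ins, not the processor's real signature. The fallback path assumes a platform that supports the `"fork"` start method.

```python
import multiprocessing
from multiprocessing.pool import Pool
from typing import List, Optional


def decode_one(frame_ids: List[int]) -> str:
    # Hypothetical stand-in for LM-backed beam search on one utterance.
    return "".join(chr(ord("a") + i) for i in frame_ids)


def batch_decode(batch: List[List[int]], pool: Optional[Pool] = None) -> List[str]:
    if pool is not None:
        # Reuse the caller's pool: no per-call start-up cost, and the caller
        # controls the start method and the number of workers.
        return pool.map(decode_one, batch)
    # Fallback: create a short-lived pool for this call only (needs "fork").
    with multiprocessing.get_context("fork").Pool() as own_pool:
        return own_pool.map(decode_one, batch)
```

Any `Pool`-compatible object works as the argument, including `multiprocessing.pool.ThreadPool`, which is convenient for quick tests.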

Improve DETR models (#19644) · 90071fe4

NielsRogge authored Oct 18, 2022

* Improve DETR models

* Fix Deformable DETR loss and matcher

* Fixup

* Fix integration tests

* Improve variable names

* Apply suggestion

* Fix copies

* Fix DeformableDetrLoss

* Make Conditional DETR copy from Deformable DETR

* Copy from deformable detr's hungarian matcher

* Fix bug

90071fe4