1. 29 Mar, 2024 1 commit
  2. 28 Mar, 2024 5 commits
  3. 27 Mar, 2024 4 commits
    • MixtralSparseMoeBlock: add gate jitter (#29865) · a25037be
      Lorenzo Verardo authored
      Adds optional gate jitter to MixtralSparseMoeBlock: when enabled, the block's
      input is multiplied by random noise before being passed through the MoE layer.
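The jitter itself is simple: during training, each input value is scaled by uniform noise drawn from [1 − jitter, 1 + jitter] before the router sees it. A minimal plain-Python sketch (the function name is illustrative; the real block applies the equivalent scaling to a torch tensor, and only when training with a nonzero `router_jitter_noise`):

```python
import random

def apply_gate_jitter(hidden_states, jitter_noise=0.01, training=True):
    """Multiply each input value by uniform noise in [1 - jitter, 1 + jitter].

    Illustrative sketch of MoE gate jitter: a no-op at inference time or when
    the noise scale is zero, a small random rescaling during training.
    """
    if not training or jitter_noise == 0.0:
        return hidden_states
    return [h * random.uniform(1.0 - jitter_noise, 1.0 + jitter_noise)
            for h in hidden_states]
```

The noise slightly perturbs the gate's view of the input, which regularizes expert selection without changing the experts themselves.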
    • Fix 29807, sinusoidal positional encodings overwritten by post_init() (#29813) · a81cf9ee
      Hovnatan Karapetyan authored
      * Check for requires_grad when initializing weights
      
      * Add unit test
      
      * Move sinusoidal positional encoding generation after post_init()
      
      * Add modules to skip init list
      
      * Move create_sinusoidal_embeddings to _init_weights
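For context, the table the fix protects is the classic fixed sinusoidal encoding: sine on even dimensions, cosine on odd ones. A plain-Python sketch of the generation step (the real code fills a torch embedding weight; moving this into `_init_weights` is what stops `post_init()` from overwriting the table with random values):

```python
import math

def create_sinusoidal_embeddings(n_pos, dim):
    """Fixed sinusoidal position table: sin on even indices, cos on odd.

    Standard formulation; each position/dimension pair gets an angle
    pos / 10000^(2*(j//2)/dim), so the table is deterministic and must not
    be re-initialized by a later random-init pass.
    """
    out = [[0.0] * dim for _ in range(n_pos)]
    for pos in range(n_pos):
        for j in range(dim):
            angle = pos / (10000 ** (2 * (j // 2) / dim))
            out[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return out
```

Because the values are deterministic functions of position, any random re-initialization silently destroys positional information, which is exactly the bug the PR fixes.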
    • Mamba `slow_forward` gradient fix (#29563) · cefb819f
      Anton Vlasjuk authored
      * FIX: Cached slow forward in mamba
      - additionally added mamba cached test
      - added unused test (mamba causal lm forward and backward)
      - fixed typo: "causl" --> "causal"
      
      * formatting
      
      * fix: use real `slow_forward` call instead of torch module's
      
      * add shape assertion for mixer block test
      
      * adjust shape assertion
    • Add Qwen2MoE (#29377) · 1c39974a
      Bo Zheng authored

      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * Update README.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * fixup
      
      * add archive back
      
      * fix integration test
      
      * fixup
      
      ---------
      Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
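Qwen2MoE mixes sparse MoE layers with ordinary dense MLP layers, hence the "fix test when there are sparse and non sparse layers" commit. The routing step inside a sparse block can be sketched as: softmax the gate logits, keep the top-k experts, and renormalize their weights (an illustrative plain-Python sketch, not the library code):

```python
import math

def route_top_k(gate_logits, k=2):
    """Top-k expert routing as used in sparse MoE blocks (illustrative).

    Softmax the per-expert gate logits, keep the k most probable experts,
    and renormalize so the selected routing weights sum to 1.
    """
    m = max(gate_logits)
    exps = [math.exp(l - m) for l in gate_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]
```

Each token's output is then the weighted sum of its selected experts' outputs, so only k of the experts run per token.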
  4. 25 Mar, 2024 1 commit
  5. 22 Mar, 2024 1 commit
  6. 21 Mar, 2024 1 commit
  7. 20 Mar, 2024 5 commits
  8. 19 Mar, 2024 3 commits
    • Clean-up generation tests after moving methods to private (#29582) · 425ba56c
      Raushan Turganbay authored
      * clean-up tests
      
      * refine comments
      
      * fix musicgen tests
      
      * make style
      
      * remove slow decorator from a test
      
      * more clean-up
      
      * fix other failing tests
    • Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) · 56baa033
      StevenBucaille authored

      * Added SuperPoint docs
      
      * Added tests
      
      * Removed commented part
      
      * Commit to create and fix add_superpoint branch with a new branch
      
      * Fixed dummy_pt_objects
      
      * Committed missing files
      
      * Fixed README.md
      
      * Apply suggestions from code review
      
      Fixed small changes
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py
      
      * Removed AutoModelForKeypointDetection and related stuff
      
      * Fixed inconsistencies in image_processing_superpoint.py
      
      * Moved infer_on_model logic simply in test_inference
      
      * Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py
      
      * Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale
      
      * Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed from (w, h) to (h, w) as input for tests
      
      * Removed unnecessary condition
      
      * Moved last_hidden_state to be the first returned
      
      * Moved last_hidden_state to be the first returned (bis)
      
      * Moved last_hidden_state to be the first returned (ter)
      
      * Switched image_width and image_height in tests to match recent changes
      
      * Added config as first SuperPointConvBlock init argument
      
      * Reordered README's after merge
      
      * Added missing first config argument to SuperPointConvBlock instantiations
      
      * Removed formatting error
      
      * Added SuperPoint to README's de, pt-br, ru, te and vi
      
      * Checked out README_fr.md
      
      * Fixed README_fr.md
      
      * Test fix README_fr.md
      
      * Last make fix-copies !
      
      * Updated checkpoint path
      
      * Removed unused SuperPoint doc
      
      * Added missing image
      
      * Update src/transformers/models/superpoint/modeling_superpoint.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Removed unnecessary import
      
      * Update src/transformers/models/superpoint/modeling_superpoint.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Added SuperPoint to _toctree.yml
      
      ---------
      Co-authored-by: steven <steven.bucaillle@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
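SuperPoint turns a dense per-pixel score map into a sparse set of keypoints. A toy version of that extraction step, as a threshold plus a 3x3 local-maximum check (illustrative only; the real model also does non-maximum suppression over a larger window and samples descriptors at the kept locations):

```python
def extract_keypoints(scores, threshold=0.5):
    """Keep pixels whose score exceeds the threshold and is a local maximum
    in its 3x3 neighborhood -- a toy sketch of keypoint extraction."""
    h, w = len(scores), len(scores[0])
    kpts = []
    for y in range(h):
        for x in range(w):
            s = scores[y][x]
            if s <= threshold:
                continue
            neigh = [scores[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))
                     if (j, i) != (y, x)]
            if all(s >= n for n in neigh):
                kpts.append((y, x, s))
    return kpts
```

Because the number of surviving keypoints varies per image, outputs are naturally ragged, which is why the PR spends effort on mask/label handling in the modeling tests.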
    • [`GemmaConverter`] use user_defined_symbols (#29473) · 2f9a3edb
      Arthur authored
      * use user_defined_symbols
      
      * fixup
      
      * nit
      
      * add a very robust test
      
      * make sure all models are tested with the `pretrained_tokenizer_to_test`
      
      * should we make sure we test all of them?
      
      * merge
      
      * remove the id
      
      * fix test
      
      * update
      
      * ousies
      
      * oups
      
      * fixup
      
      * fix copies check
      
      * remove `pretrained_tokenizer_to_test`
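The point of sentencepiece's `user_defined_symbols` is that such strings tokenize atomically instead of being split into subword pieces. A toy model of that behavior, scanning for the symbols longest-match-first and falling back to character pieces elsewhere (illustrative; not the sentencepiece implementation):

```python
def tokenize_with_symbols(text, user_defined_symbols):
    """Split out user-defined symbols as atomic tokens first, then fall back
    to character-level tokens for the remaining text (toy sketch)."""
    symbols = sorted(user_defined_symbols, key=len, reverse=True)  # longest match wins
    tokens, i = [], 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens
```

Registering special tokens this way keeps round-tripping exact, which is what the "very robust test" over all pretrained tokenizers is checking.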
  9. 18 Mar, 2024 1 commit
    • Add MusicGen Melody (#28819) · c43b380e
      Yoach Lacombe authored

      * first modeling code
      
      * make repository
      
      * still WIP
      
      * update model
      
      * add tests
      
      * add latest change
      
      * clean docstrings and copied from
      
      * update docstrings md and readme
      
      * correct chroma function
      
      * correct copied from and remove unrelated test
      
      * add doc to toctree
      
      * correct imports
      
      * add convert script to notdoctested
      
      * Add suggestion from Sanchit
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * correct get_unconditional_inputs docstrings
      
      * modify README according to Sanchit's feedback
      
      * add chroma to audio utils
      
      * clean librosa and torchaudio hard dependencies
      
      * fix FE
      
      * refactor audio decoder -> audio encoder for consistency with previous musicgen
      
      * refactor conditional -> encoder
      
      * modify sampling rate logic
      
      * modify license at the beginning
      
      * refactor all_self_attns->all_attentions
      
      * remove ignore copy from causallm generate
      
      * add copied from for from_sub_models
      
      * fix make copies
      
      * add warning if audio is truncated
      
      * add copied from where relevant
      
      * remove artefact
      
      * fix convert script
      
      * fix torchaudio and FE
      
      * modify chroma method according to feedback-> better naming
      
      * refactor input_values->input_features
      
      * refactor input_values->input_features and fix import fe
      
      * add input_features to docstrings
      
      * correct inputs_embeds logic
      
      * remove dtype conversion
      
      * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
      
      * change warning for chroma length
      
      * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * change way to save wav, using soundfile
      
      * correct docs and change to soundfile
      
      * fix import
      
      * fix init proj layers
      
      * remove line breaks from md
      
      * fix issue with docstrings
      
      * add FE suggestions
      
      * improve `is in` logic and remove useless imports
      
      * remove custom from_pretrained
      
      * simplify docstring code
      
      * add suggestions for modeling tests
      
      * make style
      
      * update converting script with sanity check
      
      * remove encoder attention mask from conditional generation
      
      * replace musicgen melody checkpoints with official orga
      
      * rename ylacombe->facebook in checkpoints
      
      * fix copies
      
      * remove unnecessary warning
      
      * add shape in code docstrings
      
      * add files to slow doc tests
      
      * fix md bug and add md to not_tested
      
      * make fix-copies
      
      * fix hidden states test and batching
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
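MusicGen Melody conditions generation on chroma features, which fold every octave of the same pitch into one of 12 bins. The standard MIDI-based mapping from a frequency to its pitch class looks like this (illustrative only; the actual feature extractor computes chroma from a spectrogram, not per-frequency):

```python
import math

def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to one of 12 chroma bins (0 = C, 9 = A).

    Standard mapping via MIDI note number: 69 + 12*log2(f / 440); all
    octaves of a pitch land in the same bin, which is the point of chroma.
    """
    midi = 69 + 12 * math.log2(freq_hz / ref_a4)
    return int(round(midi)) % 12
```

Octave invariance is why chroma captures the melody while discarding timbre, making it a compact conditioning signal.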
  10. 15 Mar, 2024 2 commits
  11. 14 Mar, 2024 2 commits
  12. 13 Mar, 2024 5 commits
    • Add PvT-v2 Model (#26812) · 1fc505b8
      Nate Cibik authored

      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/end/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Updated index.md
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Fixed issue from rebase
      
      * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
      
      * Changed model name in docs
      
      * Removed duplicate PvtV2Backbone
      
      * Work around type switching issue in tests
      
      * Fix model name in config comments
      
      * Update docs/source/en/model_doc/pvt_v2.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed old code
      
      * Fixed Class names to be more descriptive
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed outdated code
      
      * Moved paper abstract to single line in pvt_v2.md
      
      * Added usage tips to pvt_v2.md
      
      * Simplified module inits by passing layer_idx
      
      * Fixed typing for hidden_act in PvtV2Config
      
      * Removed unusued import
      
      * Add pvt_v2 to docs/source/en/_toctree.yml
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Move function parameters to single line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Update year of copyright to 2024
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Make code more explicit
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated sr_ratio to be more explicit spatial_reduction_ratio
      
      * Removed excess type hints in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Move params to single line in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Removed needless comment in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update copyright date in pvt_v2.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Moved params to single line in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated copyright date in configuration_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Cleaned comments in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Renamed spatial_reduction Conv2D operation
      
      * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py"
      
      This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538.
      
      * Updated conversion script to reflect module name change
      
      * Deprecated reshape_last_stage option in config
      
      * Removed unused imports
      
      * Code formatting
      
      * Fixed outdated decorators on test_inference_fp16
      
      * Added "Copied from" comments in test_modeling_pvt_v2.py
      
      * Fixed import listing
      
      * Updated model name
      
      * Force empty commit for PR refresh
      
      * Fixed linting issue
      
      * Removed # Copied from comments
      
      * Added PVTv2 to README_fr.md
      
      * Ran make fix-copies
      
      * Replace all FoamoftheSea hub references with OpenGVLab
      
      * Fixed out_indices and out_features logic in configuration_pvt_v2.py
      
      * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
      
      * Ran code fixup
      
      * Fixed order of parent classes in PvtV2Config to fix the to_dict method override
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
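PVTv2's linear attention mode shrinks keys and values with adaptive average pooling to a fixed 7x7 grid (the "Set AvgPool to 7" commit), so attention cost grows linearly with input resolution instead of quadratically. The pooling itself can be sketched in plain Python with PyTorch-style bin boundaries (an illustrative helper, not the library code):

```python
def adaptive_avg_pool2d(grid, out_h, out_w):
    """Average-pool a 2D grid to a fixed (out_h, out_w) shape.

    Mirrors AdaptiveAvgPool2d bin boundaries: bin i covers rows
    floor(i*H/out) .. ceil((i+1)*H/out), so K/V length stays constant
    regardless of the input resolution.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for oy in range(out_h):
        y0, y1 = (oy * h) // out_h, ((oy + 1) * h + out_h - 1) // out_h
        row = []
        for ox in range(out_w):
            x0, x1 = (ox * w) // out_w, ((ox + 1) * w + out_w - 1) // out_w
            vals = [grid[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

With K/V pooled to 49 tokens, each query attends to a constant-size summary of the image, which is what makes the mode "linear".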
    • Fix batching tests for new models (Mamba and SegGPT) (#29633) · 5ac264d8
      Raushan Turganbay authored

      * fix batching tests for new models
      
      * Update tests/models/seggpt/test_modeling_seggpt.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Adds pretrained IDs directly in the tests (#29534) · 11bbb505
      Lysandre Debut authored
      * Adds pretrained IDs directly in the tests
      
      * Fix tests
      
      * Fix tests
      
      * Review!
    • [Flash Attention 2] Add flash attention 2 for GPT-J (#28295) · be3fd8a2
      bytebarde authored

      * initial implementation of flash attention for gptj
      
      * modify flash attention and overwrite test_flash_attn_2_generate_padding_right
      
      * update flash attention support list
      
      * remove the copy line in the `CodeGenBlock`
      
      * address copy mechanism
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add GPTJ attention classes
      
      * add expected outputs in the gptj test
      
      * Ensure repo consistency with 'make fix-copies'
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
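Flash Attention 2 changes how attention is computed, not what it computes: the outputs still equal softmax(QKᵀ/√d)·V, which is why the PR can validate the new path against expected outputs. A plain-Python reference of that quantity (the fused kernel produces the same values in blocks, without ever materializing the full score matrix):

```python
import math

def attention(q, k, v):
    """Reference (quadratic-memory) attention: softmax(Q K^T / sqrt(d)) V.

    q, k, v are lists of row vectors; a fused implementation like Flash
    Attention 2 computes these exact outputs more memory-efficiently.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)                      # stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        probs = [x / z for x in w]
        out.append([sum(p * vj[t] for p, vj in zip(probs, v))
                    for t in range(len(v[0]))])
    return out
```

The overwritten padding-right test exists because fused kernels are sensitive to where padding tokens sit in the sequence, even though the math is identical.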
  13. 12 Mar, 2024 2 commits
  14. 11 Mar, 2024 1 commit
  15. 08 Mar, 2024 2 commits
  16. 07 Mar, 2024 3 commits
  17. 05 Mar, 2024 1 commit
    • [`Add Mamba`] Adds support for the `Mamba` models (#28094) · fb1c62e9
      Arthur authored

      * initial-commit
      
      * start cleaning
      
      * small nits
      
      * small nits
      
      * current updates
      
      * add kernels
      
      * small refactoring little step
      
      * add comments
      
      * styling
      
      * nit
      
      * nits
      
      * Style
      
      * Small changes
      
      * Push dummy mamba simple slow
      
      * nit
      
      * Use original names
      
      * Use original names and remove norm
      
      * Updates for inference params
      
      * Style and updates
      
      * nits
      
      * Match logits
      
      * Add a test
      
      * Add expected generated text
      
      * nits doc, imports and styling
      
      * style
      
      * oups
      
      * dont install kernels, invite users to install the required kernels
      
      * let use use the original packages
      
      * styling
      
      * nits
      
      * fix some copies
      
      * update doc
      
      * fix-copies
      
      * styling done
      
      * nits
      
      * fix import check
      
      * run but wrong cuda res
      
      * mamba CUDA works :)
      
      * fix the fast path
      
      * config naming nits
      
      * conversion script is not required at this stage
      
      * finish fixing the fast path: generation make sense now!
      
      * nit
      
      * Let's start working on the CIs
      
      * style
      
      * better style
      
      * more nits
      
      * test nit
      
      * quick fix for now
      
      * nits
      
      * nit
      
      * nit
      
      * nit
      
      * nits
      
      * update test rest
      
      * fixup
      
      * update test
      
      * nit
      
      * some fixes
      
      * nits
      
      * update test values
      
      * fix styling
      
      * nit
      
      * support peft
      
      * integration tests require torch
      
      * also add slow markers
      
      * styling
      
      * chose forward wisely
      
      * nits
      
      * update tests
      
      * fix gradient checkpointing
      
      * fixup
      
      * nit
      
      * fix doc
      
      * check copies
      
      * fix the docstring
      
      * fix some more tests
      
      * style
      
      * fix beam search
      
      * add init scheme
      
      * update
      
      * nit
      
      * fix
      
      * fixup the doc
      
      * fix the doc
      
      * fixup
      
      * tentative update but slow is no longer good
      
      * nit
      
      * should we always use float32?
      
      * nits
      
      * revert wrong changes
      
      * res in float32
      
      * cleanup
      
      * skip fmt for now
      
      * update generation values
      
      * update test values running original model
      
      * fixup
      
      * update tests + rename inference_params to cache_params + make sure training does not use cache_params
      
      * small nits
      
      * more nits
      
      * fix final CIs
      
      * style
      
      * nit doc
      
      * I hope final doc nits
      
      * nit
      
      * 🫠
      
      * final touch!
      
      * fix torch import
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * Apply suggestions from code review
      
      * fix fix and fix
      
      * fix base model prefix!
      
      * nit
      
      * Update src/transformers/models/mamba/__init__.py
      
      * Update docs/source/en/model_doc/mamba.md
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * nit
      
      ---------
      Co-authored-by: Lysandre Debut <hi@lysand.re>
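At its core, Mamba's slow path is a per-step state-space recurrence, and the PR's `inference_params` → `cache_params` rename reflects that the recurrent state is carried across steps only during generation, never during training. A one-dimensional sketch of the recurrence and its cache (illustrative with scalar a/b/c; the real model uses selective, input-dependent parameters and a fused CUDA scan on the fast path):

```python
def ssm_scan(a, b, c, xs, state=None):
    """Minimal 1-D state-space recurrence:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t

    Passing `state` back in mimics cache_params: feeding tokens one at a
    time with the cache must produce the same outputs as a single
    full-sequence scan (which is what training uses, with no cache).
    """
    h = 0.0 if state is None else state
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys, h
```

The equivalence between the cached step-by-step scan and the full scan is exactly the property the new cached-forward tests in the follow-up gradient fix (#29563) assert.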