Commits · 0d04b1e25a79ef18af419881d708fafc665851c7 · chenpangpang / transformers

02 Apr, 2024 2 commits

Add Flash Attention 2 support to Musicgen and Musicgen Melody (#29939) · 0d04b1e2

Yoach Lacombe authored Apr 02, 2024

* add FA2 to o.g Musicgen

* make style

* add FA2 support to Musicgen Melody

* add generation FA2 tests to o.g Musicgen

* make style and fix copies

* add Musicgen to FA2 docs + deprecate list

* add sdpa supports to Musicgen's

* make style and fix copies

* refactor attention implementation arguments

* add Copied from to sdpa tests

* add copied form in sdpa tests melody

* add copied for FA2 generation tests

* add FA2 inference copied from

* make style

0d04b1e2

[docs] Big model loading (#29920) · 096f3046
Steven Liu authored Apr 01, 2024
```
* update

* feedback
```
096f3046

30 Mar, 2024 1 commit

Update model card and link of blog post. (#29928) · 46d63681

Bo Zheng authored Mar 31, 2024



* Update qwen2_moe.md

* update link of blogpost.

* fixup

---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

46d63681

29 Mar, 2024 1 commit
- Super tiny fix 12 typos about "with with" (#29926) · 5ad7f170
  fzyzcjy authored Mar 29, 2024
```
* with with

* style
```
  5ad7f170
28 Mar, 2024 4 commits

Update installs in image classification doc (#29947) · ba56ed08

MariaHei authored Mar 28, 2024

Trainer with PyTorch now requires accelerate to be installed.

Partly resolves huggingface/transformers#29174

ba56ed08

Add beam search visualizer to the doc (#29876) · c9d2e855
Aymeric Roucher authored Mar 28, 2024

c9d2e855
[doc] fix some typos and add `xpu` to the testing documentation (#29894) · 7c19fafe
Fanli Lin authored Mar 28, 2024
```
fix typo
```
7c19fafe

Adding Flash Attention 2 Support for GPT2 (#29226) · 22d159dd

Eduardo Pacheco authored Mar 28, 2024



* First commit to add flash attention 2 for GPT-2

* more improvements

* Make GPT2 pass tests and fixed Decison Transformers copies

* Fixed missing arg

* fix copies

* Added expected speedup

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Added test

* Fixed attn attribute

* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update Decision transformer attentions

* More updates

* Passing tests

* Fix copies

* Fix copies part 2

* Decision transformer updates

* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix copies

* Decision transformer not supporting flash attn

* Addressed comments

* Addressed comments

* Addressed comments

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

22d159dd

27 Mar, 2024 1 commit

Add Qwen2MoE (#29377) · 1c39974a

Bo Zheng authored Mar 27, 2024



* add support for qwen2 MoE models

* update docs

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* fixup

* add archive back

* add support for qwen2 MoE models

* update docs

* update model name & test

* update readme

* update class names & readme & model_doc of Qwen2MoE.

* update architecture name

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* update modeling_qwen2_moe.py

* fix model architecture

* fixup

* fix qwen2_moe tests

* use Qwen2Tokenizer instead of Qwen2MoeTokenizer

* fix style

* fix test when there are sparse and non sparse layers

* fixup

* add archive back

* fix integration test

* fixup

---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

1c39974a

26 Mar, 2024 2 commits
- [docs] Indent ordered list in add_new_model.md (#29796) · b9ceb03d
  Michael authored Mar 26, 2024
  
  b9ceb03d
- Fix header in IFE task guide (#29859) · de81a677
  Merve Noyan authored Mar 26, 2024
```
Update image_feature_extraction.md
```
  de81a677
25 Mar, 2024 1 commit
- remove quotes in code example (#29812) · 7eb3ba82
  Johannes Kolbe authored Mar 25, 2024
```
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
```
  7eb3ba82
24 Mar, 2024 1 commit

model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702) · 76a33a10

gamepad_coder authored Mar 23, 2024

* model_summary.md - Add link to Harvard's Annotated Transformer.

* model_summary.md - slight wording change + capitalize name of the paper

* model_summary.md - moves the Annotated Transformer link in a praenthesis next to the link to the original paper (great idea, stevhliu!)

* model_summary.md - moves the Annotated Transformer link in a praenthesis next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)

76a33a10

23 Mar, 2024 1 commit
- [DOCS] Fix typo for llava next docs (#29829) · dafe3702
  Billy Cao authored Mar 24, 2024
```
Fix typo for llava next docs
```
  dafe3702
21 Mar, 2024 1 commit
- [docs] Remove redundant `-` and `the` from custom_tools.md (#29767) · 5d1a58a6
  Michael authored Mar 21, 2024
```
[docs] Remove redundant  and  from custom_tools.md
```
  5d1a58a6
20 Mar, 2024 2 commits

Add LLaVa-1.6, bis (#29586) · d91fd7f9

NielsRogge authored Mar 20, 2024



* First draft

* Fix tests, add docs

* Improve docstrings

* Fix test

* Address comments

* Address comments

* Remove vocab_size attribute

* Remove batch_size

* Address comment

* Add image processor tests

* Support fx

* Update docstring

* Add support for 34b

* Convert 34b model

* Add integration tests

* Update checkpoints

* Convert vicuna-13b, remove doc tests

* Remove script

* Remove file

* Address comments

* Improve docstrings

* Deprecate vocab_size

* Remove aspect_ratio_setting

* Address comments

* Update READMEs

* Add tips about chat templates

* Fix tests

* Deprecate vocab_size safely

* Update tests

---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>

d91fd7f9

SuperPointModel -> SuperPointForKeypointDetection (#29757) · 3c17c529
amyeroberts authored Mar 20, 2024

3c17c529

19 Mar, 2024 2 commits

Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) · 56baa033

StevenBucaille authored Mar 19, 2024



* Added SuperPoint docs

* Added tests

* Removed commented part

* Commit to create and fix add_superpoint branch with a new branch

* Fixed dummy_pt_objects

* Committed missing files

* Fixed README.md

* Apply suggestions from code review

Fixed small changes
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py

* Removed AutoModelForKeypointDetection and related stuff

* Fixed inconsistencies in image_processing_superpoint.py

* Moved infer_on_model logic simply in test_inference

* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py

* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale

* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed from (w, h) to (h, w) as input for tests

* Removed unnecessary condition

* Moved last_hidden_state to be the first returned

* Moved last_hidden_state to be the first returned (bis)

* Moved last_hidden_state to be the first returned (ter)

* Switched image_width and image_height in tests to match recent changes

* Added config as first SuperPointConvBlock init argument

* Reordered README's after merge

* Added missing first config argument to SuperPointConvBlock instantiations

* Removed formatting error

* Added SuperPoint to README's de, pt-br, ru, te and vi

* Checked out README_fr.md

* Fixed README_fr.md

* Test fix README_fr.md

* Test fix README_fr.md

* Last make fix-copies !

* Updated checkpoint path

* Removed unused SuperPoint doc

* Added missing image

* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed unnecessary import

* Update src/transformers/models/superpoint/modeling_superpoint.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added SuperPoint to _toctree.yml

---------
Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>

56baa033

FEAT / Optim: Add GaLore optimizer (#29588) · f6261d7d

Younes Belkada authored Mar 19, 2024



* add galore v1

* add import

* add tests and doc

* fix doctest

* forward contrib credits from discussions

* forward contrib credits from discussions

* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix failing tests'

* switch to `optim_target_modules` and clarify docs

* more clarification

* enhance lookup logic

* update a test to add peak memory

* add regex, all-linear and single string support

* add layer-wise optimization through DummyOptimizers and LRSchedulers

* forward contrib credits from discussions and original idea

* add a section about DDP not supported in layerwise

* Update src/transformers/trainer.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix self

* check only if layer_wise

* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* oops

* make use of intervals

* clarify comment

* add matching tests

* GaLoRe -> GaLore

* move to `get_scheduler`

* add note on docs

* add a warning

* adapt a bit the docs

* update docstring

* support original API

* Update docs/source/en/trainer.md

* slightly refactor

* Update docs/source/en/trainer.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix args parsing and add tests

* remove warning for regex

* fix type hint

* add note about extra args

* make `is_regex` return optional

---------

Co-authored-by: Maxime <maximegmd @users.noreply.github.com>
Co-authored-by: Wing Lian <winglian @users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

f6261d7d

18 Mar, 2024 2 commits

Update the pipeline tutorial to include `gradio.Interface.from_pipeline` (#29684) · 838b87ab

Abubakar Abid authored Mar 18, 2024



* Update pipeline_tutorial.md to include gradio

* Update pipeline_tutorial.md

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update pipeline_tutorial.md

* Update docs/source/en/pipeline_tutorial.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

838b87ab

Add MusicGen Melody (#28819) · c43b380e

Yoach Lacombe authored Mar 18, 2024



* first modeling code

* make repository

* still WIP

* update model

* add tests

* add latest change

* clean docstrings and copied from

* update docstrings md and readme

* correct chroma function

* correct copied from and remove unreleated test

* add doc to toctree

* correct imports

* add convert script to notdoctested

* Add suggestion from Sanchit
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct get_uncoditional_inputs docstrings

* modify README according to SANCHIT feedback

* add chroma to audio utils

* clean librosa and torchaudio hard dependencies

* fix FE

* refactor audio decoder -> audio encoder for consistency with previous musicgen

* refactor conditional -> encoder

* modify sampling rate logics

* modify license at the beginning

* refactor all_self_attns->all_attentions

* remove ignore copy from causallm generate

* add copied from for from_sub_models

* fix make copies

* add warning if audio is truncated

* add copied from where relevant

* remove artefact

* fix convert script

* fix torchaudio and FE

* modify chroma method according to feedback-> better naming

* refactor input_values->input_features

* refactor input_values->input_features and fix import fe

* add input_features to docstrigs

* correct inputs_embeds logics

* remove dtype conversion

* refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation

* change warning for chroma length

* Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* change way to save wav, using soundfile

* correct docs and change to soundfile

* fix import

* fix init proj layers

* remove line breaks from md

* fix issue with docstrings

* add FE suggestions

* improve is in logics and remove useless imports

* remove custom from_pretrained

* simplify docstring code

* add suggestions for modeling tests

* make style

* update converting script with sanity check

* remove encoder attention mask from conditional generation

* replace musicgen melody checkpoints with official orga

* rename ylacombe->facebook in checkpoints

* fix copies

* remove unecessary warning

* add shape in code docstrings

* add files to slow doc tests

* fix md bug and add md to not_tested

* make fix-copies

* fix hidden states test and batching

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

c43b380e

15 Mar, 2024 3 commits

[docs] Spanish translation of attention.md (#29681) · 00c1d87a

Aaron Jimenez authored Mar 15, 2024

* add attention to es/ and edit es/_toctree.yml

* translate attention.md

* fix transformers

* fix transformers

00c1d87a

[Quantization] Quanto quantizer (#29023) · 28de2f4d

Marc Sun authored Mar 15, 2024



* start integration

* fix

* add and debug tests

* update tests

* make pytorch serialization works

* compatible with device_map and offload

* fix tests

* make style

* add ref

* guard against safetensors

* add float8 and style

* fix is_serializable

* Fix shard_checkpoint compatibility with quanto

* more tests

* docs

* adjust memory

* better

* style

* pass tests

* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add is_safe_serialization instead

* Update src/transformers/quantizers/quantizer_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add QbitsTensor tests

* fix tests

* simplify activation list

* Update docs/source/en/quantization.md
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* better comment

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* find and fix edge case

* Update docs/source/en/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pass weights_only_kwarg instead

* fix shard_checkpoint loading

* simplify update_missing_keys

* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* recursion to get all tensors

* block serialization

* skip serialization tests

* fix

* change by cuda:0 for now

* fix regression

* update device_map

* fix doc

* add noteboon

* update torch_dtype

* update doc

* typo

* typo

* remove comm

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>

28de2f4d

Cohere Model Release (#29622) · 0e4a1c34

Saurabh Dash authored Mar 15, 2024



* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (#5)

* Pr fixes (#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (#7)

* Add modeling tests (#9)

* Smol Fix (#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (#14)

* Update chat templates to use the new API (#15)

---------
Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

0e4a1c34

13 Mar, 2024 6 commits

[docs] Remove broken ChatML format link from chat_templating.md (#29643) · f738ab3b
Aaron Jimenez authored Mar 13, 2024
```
* remove ChatML link from en/

* remove ChatML link in ja/

* remove ChatML link in zh/
```
f738ab3b

Add PvT-v2 Model (#26812) · 1fc505b8

Nate Cibik authored Mar 13, 2024



* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Updated index.md

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Added pvt_v2 to docs/source/end/model_doc

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat. Added additional type support for image size in config

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* Set key and value layers to use separate linear modules. Fixed pruning function

* Set AvgPool to 7

* Fixed issue in init

* PvT-v2 now works in AutoModel

* Successful conversion of pretrained weights for PVT-v2

* Successful conversion of pretrained weights for PVT-v2 models

* Added pytests for pvt-v2, all passed

* Ran fix-copies and fixup. All checks passed

* Added additional ReLU for linear attention mode

* pvt_v2_b2_linear converted and working

* Reverted batch eval changes for PR

* Expanded type support for Pvt-v2 config

* Fixed config docstring. Added channels property

* Fixed model names in tests

* Fixed config backbone compat

* Ran fix-copies

* Fixed PvtV2Backbone tests

* Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py

* Fixed backbone stuff and fixed tests: all passing

* Ran make fixup

* Made modifications for code checks

* Remove ONNX config from configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use explicit image size dict in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make image_size optional in test_modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove _ntuple use in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove reference to fp16_enabled

* Model modules now take config as first argument even when not used

* Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"

* All LayerNorm now instantiates with config.layer_norm_eps

* Added docstring for depth-wise conv layer

* PvtV2Config now only takes Union[int, Tuple[int, int]] for image size

* Refactored PVTv2 in prep for gradient checkpointing

* Gradient checkpointing ready to test

* Removed override of _set_gradient_checkpointing

* Cleaned out old code

* Applied code fixup

* Applied code fixup

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Reverted batch eval changes for PR

* Fixed config docstring. Added channels property

* Fixed config backbone compat

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Ran fix-copies and fixup. All checks passed

* Allowed for batching of eval metrics

* copied models/pvt to adapt to pvt_v2

* First commit of pvt_v2

* PvT-v2 now works in AutoModel

* Fixed config backbone compat

* Ran fix-copies

* Began debug of pvt_v2 tests

* Leave handling of num_labels to base pretrained config class

* Deactivated gradient checkpointing tests until it is fixed

* Removed PvtV2ImageProcessor which duped PvtImageProcessor

* Fixed issue from rebase

* Fixed issue from rebase

* Set tests for gradient checkpointing to skip those using reentrant since it isn't supported

* Fixed issue from rebase

* Fixed issue from rebase

* Changed model name in docs

* Removed duplicate PvtV2Backbone

* Work around type switching issue in tests

* Fix model name in config comments

* Update docs/source/en/model_doc/pvt_v2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed name of variable from 'attn_reduce' to 'sr_type'

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed old code

* Changed from using 'sr_type' to 'linear_attention' for clarity

* Fixed Class names to be more descriptive

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Removed outdated code

* Moved paper abstract to single line in pvt_v2.md

* Added usage tips to pvt_v2.md

* Simplified module inits by passing layer_idx

* Fixed typing for hidden_act in PvtV2Config

* Removed unusued import

* Add pvt_v2 to docs/source/en/_toctree.yml

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Move function parameters to single line
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Update year of copyright to 2024
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/pvt_v2/modeling_pvt_v2.py

Make code more explicit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated sr_ratio to be more explicit spatial_reduction_ratio

* Removed excess type hints in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Move params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed needless comment in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update copyright date in pvt_v2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved params to single line in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated copyright date in configuration_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Cleaned comments in modeling_pvt_v2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Renamed spatial_reduction Conv2D operation

* Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
"

This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538.

* Updated conversion script to reflect module name change

* Deprecated reshape_last_stage option in config

* Removed unused imports

* Code formatting

* Fixed outdated decorators on test_inference_fp16

* Added "Copied from" comments in test_modeling_pvt_v2.py

* Fixed import listing

* Updated model name

* Force empty commit for PR refresh

* Fixed linting issue

* Removed # Copied from comments

* Added PVTv2 to README_fr.md

* Ran make fix-copies

* Replace all FoamoftheSea hub references with OpenGVLab

* Fixed out_indices and out_features logic in configuration_pvt_v2.py

* Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py

* Ran code fixup

* Fixed order of parent classes in PvtV2Config to fix the to_dict method override

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

1fc505b8

[docs] Spanish translate chat_templating.md & yml addition (#29559) · d3801aae

njackman-2344 authored Mar 13, 2024

* torchscript and trainer md es translation

* corrected md es files and even corrected spelling in en md

* made es corrections to trainer.md

* deleted entrenamiento... title on yml

* placed entrenamiento in right place

* translated es chat_templating.md w/ yml addition

* requested es changes to md and yml

* last es changes to md

d3801aae

Fix minor typo: infenrece => inference (#29621) · 62478857
Dries Verachtert authored Mar 13, 2024

62478857

Warn about tool use (#29628) · 38bff8c8

Lysandre Debut authored Mar 13, 2024



* Warn against remote tool use

* Additional disclaimer

* Update docs/source/en/custom_tools.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

38bff8c8

[Flash Attention 2] Add flash attention 2 for GPT-J (#28295) · be3fd8a2

bytebarde authored Mar 13, 2024



* initial implementation of flash attention for gptj

* modify flash attention and overwrite test_flash_attn_2_generate_padding_right

* update flash attention support list

* remove the copy line in the `CodeGenBlock`

* address copy mechanism

* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add GPTJ attention classes

* add expected outputs in the gptj test

* Ensure repo consistency with 'make fix-copies'

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

be3fd8a2

12 Mar, 2024 3 commits
- Fix typo ; Update quantization.md (#29615) · 11163fff
  Furkan Akkurt authored Mar 12, 2024
```
Update quantization.md
```
  11163fff
- Fix minor typo: softare => software (#29602) · 73efe896
  Dries Verachtert authored Mar 12, 2024
  
  73efe896
- Fix Fuyu doc typos (#29601) · 6cc5411d
  Raushan Turganbay authored Mar 12, 2024
```
fix fuyu docs
```
  6cc5411d
11 Mar, 2024 5 commits

Tiny improvement for doc (#29581) · 73a27345

fzyzcjy authored Mar 12, 2024



* Update add_new_model.md

* Update docs/source/en/add_new_model.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

73a27345

Fixed broken link (#29558) · b45c0f55

Amrit Gupta authored Mar 11, 2024

Fixed broken link for Resources -> Token Classification -> Finetuning BERT for named-entity

b45c0f55

Make torch xla available on GPU (#29334) · 873d9bb3

Yitong Huang authored Mar 11, 2024



* add USE_TORCH_XLA env

* rename torch_tpu to torch_xla

* better is_torch_xla_available; fix some fsdp and performance issues

* fix format

* fix bug when pjrt_device is cpu

* fix bug

* fix the deprecation handling

---------
Co-authored-by: anw90 <ang868@gmail.com>
Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>

873d9bb3

[`Docs`] fixed minor typo (#29555) · d80c9a34
j-gc authored Mar 11, 2024

d80c9a34
[`Mamba doc`] Post merge updates (#29472) · 4f27ee93
Arthur authored Mar 11, 2024
```
* post merge update

* nit

* oups
```
4f27ee93

07 Mar, 2024 1 commit
- v4.39 deprecations 🧼 (#29492) · ffe60fdc
  Joao Gante authored Mar 07, 2024
  
  ffe60fdc
06 Mar, 2024 1 commit
- Generate: get generation mode from the generation config instance 🧼 (#29441) · 700d48fb
  Joao Gante authored Mar 06, 2024
  
  700d48fb