Commits · 4c8149d643576c23d4df559d4931ccf08fa7aee4 · chenpangpang / transformers

09 Jul, 2024 4 commits
- Fix `_init_weights` for `ResNetPreTrainedModel` (#31851) · 4c8149d6
  Yih-Dar authored Jul 09, 2024
```
* init

* test

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Generate: Add new decoding strategy "DoLa" in `.generate()` (#29619) · d094d8d9
  Yung-Sung Chuang authored Jul 09, 2024
```
Co-authored-by: Joao Gante <joao@huggingface.co>
```
- Test loading generation config with safetensor weights (#31550) · 4c2538b8
  Joao Gante authored Jul 09, 2024
```
fix test
```
- FX symbolic_trace: do not test decoder_inputs_embeds (#31840) · 0abf5e8e
  fxmarty authored Jul 09, 2024
```
only test input_embeds, not decoder_input_embeds
```
08 Jul, 2024 4 commits

Avoid failure `TFBlipModelTest::test_pipeline_image_to_text` (#31827) · 4879ac2b
Yih-Dar authored Jul 08, 2024
```
* fix

* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

transformers.fx.symbolic_trace supports inputs_embeds (#31574) · ba743700

fxmarty authored Jul 08, 2024



* symbolic trace supports inputs_embeds

* fix test?

* Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Add FA2 and `sdpa` support for SigLIP (#31499) · a177821b

Pavel Iakubovskii authored Jul 08, 2024

* Rebase to main

* Fix attention implementation autoset for text and vision configs

* Fixup

* Minor fixes

* Fix copies

* Fix attention_mask for FA2

* Add equivalence tests for siglip

* Remove right padding test

* Uncomment flaky

* Fix import

* Add to docs

* Fix test message

* Add sdpa

* Add sdpa equivalence test

* Add siglip sdpa to docs

* Fix typing for attention output

* Add sdpa tests

* Fix signature of FA2

* Autoset attn_implementation in config

* Rename bsz -> batch_size

* Move back autoset attn method

* Mark as flaky

* Correct attention mask padding

* [run-slow] siglip

* Add FA2 and sdpa docs

* Style fix

* Remove flaky for FA2 test

* Change attention implementation set

* Change attn_implementation propagation

* Fix typos

* Add modality to assert message

* Add more sdpa backends in test

* [run slow] siglip

* Add math sdpa backend for all options

* [run slow] siglip

Add ZoeDepth (#30136) · 06fd7972

NielsRogge authored Jul 08, 2024



* First draft

* Add docs

* Clean up code

* Convert model

* Add image processor

* Convert Zoe_K

* More improvements

* Improve variable names and docstrings

* Improve variable names

* Improve variable names

* Replace nn.sequential

* More improvements

* Convert ZoeD_NK

* Fix most tests

* Verify pixel values

* Verify pixel values

* Add squeeze

* Update beit to support arbitrary window sizes

* Improve image processor

* Improve docstring

* Improve beit

* Improve model outputs

* Add figure

* Fix beit

* Update checkpoint

* Fix repo id

* Add _keys_to_ignore_on_load_unexpected

* More improvements

* Address comments

* Address comments

* Address comments

* Address comments

* Rename variable name

* Add backbone_hidden_size

* Vectorize

* Vectorize more

* Address comments

* Clarify docstring

* Remove backbone_hidden_size

* Fix image processor

* Remove print statements

* Remove print statement

* Add integration test

* Address comments

* Address comments

* Address comments

* Address comments

* Add requires_backends

* Clean up

* Simplify conversion script

* Simplify more

* Simplify more

* Simplify more

* Clean up

* Make sure beit is loaded correctly

* Address comment

* Address bin_configurations

* Use bin_configurations

* Convert models, add integration tests

* Fix doc test

* Address comments

* Unify regressor classes

* Clarify arguments

* Improve resize_image

* Add num_relative_features

* Address comment

* [run-slow]beit,data2vec,zoedepth

* [run-slow]beit,data2vec,zoedepth

* Address comments

* Address comment

* Address comment

* Replace nn.TransformerEncoderLayer and nn.TransformerEncoder

* Replace nn.MultiheadAttention

* Add attributes for patch transformer to config

* Add tests for ensure_multiple_of

* Update organization

* Add tests

* [run-slow] beit data2vec

* Update ruff

* [run-slow] beit data2vec

* Add comment

* Improve docstrings, add test

* Fix interpolate_pos_encoding

* Fix slow tests

* Add docstring

* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/zoedepth/image_processing_zoedepth.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Improve tests and docstrings

* Use run_common_tests

* Improve docstrings

* Improve docstrings

* Improve tests

* Improve tests

* Remove print statements

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

05 Jul, 2024 6 commits

Fix galore lr display with schedulers (#31710) · a01b033c

Anton Vlasjuk authored Jul 05, 2024

* fix galore lr display with lr schedulers

* style

* add some tests to check for displayed lrs

* copy-paste err for warmup steps

* standardize the default lr to be only in the optimizer

* trying out my luck with the reads

Allow FP16 or other precision inference for Pipelines (#31342) · ac262604

Billy Cao authored Jul 06, 2024



* cast image features to model.dtype where needed to support FP16 or other precision in pipelines

* Update src/transformers/pipelines/image_feature_extraction.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use .to instead

* Add FP16 pipeline support for zeroshot audio classification

* Remove unused torch imports

* Add docs on FP16 pipeline

* Remove unused import

* Add FP16 tests to pipeline mixin

* Add fp16 placeholder for mask_generation pipeline test

* Add FP16 tests for all pipelines

* Fix formatting

* Remove torch_dtype arg from is_pipeline_test_to_skip*

* Fix format

* trigger ci

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Add training support for SigLIP (#31495) · 1d3eaa6f

Billy Cao authored Jul 05, 2024

* Add siglip loss function

* Update docs

* Enable training tests
[experimental] enable GC training tests as it has worked for my own data

* Remove test_training* overrides to enable training tests
[run_slow] siglip

* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip

* Skip GC training tests for SiglipForImageClassification

* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel

* Remove copied from to fix CI

Code agent: allow function persistence between steps (#31769) · 15560252
Aymeric Roucher authored Jul 05, 2024
```
* Code agent: allow function persistence between steps
```

Fix gemma tests (#31794) · eef0507f

Yih-Dar authored Jul 05, 2024



* skip 3 7b tests

* fix

* fix

* fix

* [run-slow] gemma

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix serialization for offloaded model (#31727) · 8c5c180d
Marc Sun authored Jul 05, 2024
```
* Fix serialization

* style

* add test
```

03 Jul, 2024 4 commits

Fix RT-DETR weights initialization (#31724) · 048f599f

Pavel Iakubovskii authored Jul 03, 2024

* Fix init for rt-detr heads

* Fixup

* Add separate prior_prob value to config for initialization

* Add bbox init

* Change to 1 / num_labels init

* Adjust weights init test

* Fix style for test

Fix RT-DETR cache for generate_anchors (#31671) · b9752161

Pavel Iakubovskii authored Jul 03, 2024

* Fix cache and type conversion

* Add test

* Fixup

* nit

* [run slow] rt_detr

* Fix test

* Fixup

* [run slow] rt_detr

* Update src/transformers/models/rt_detr/modeling_rt_detr.py

Gemma 2: Update slow tests (#31759) · ddfaf119
Joao Gante authored Jul 03, 2024
```
gemma 2 slow tests
```

fix assisted decoding (#31401) · 7f91f168

jiqing-feng authored Jul 03, 2024

* fix assisted decoding

* check None

* fix typo

* fix _prepare_special_tokens

* fix style

* fix lint

* add tests for assisted decoding

* fix style

* fix tests check

02 Jul, 2024 4 commits

Make tool JSON schemas consistent (#31756) · cd0935dd
Matt authored Jul 02, 2024
```
Make the order of array items consistent using sorted()
```
🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747) · 82486e59
Joao Gante authored Jul 02, 2024
```
* rely on the tokenizer default kwargs

* fix a few tests
```

[whisper] static kv cache (#31166) · a9701953

Sanchit Gandhi authored Jul 02, 2024



* make work with cache abstraction

* correct for static cache

* hacks for compile

* make fast

* fix

* fix pos ids

* generate

* fix sdpa

* fix sdpa cache pos

* fix fa2

* clean fa2

* integrate cache into generate

* make style

* copies

* more copies

* update eager

* update sdpa

* update fa2

* simplify

* use cache pos

* always compute cross-cache for debug

* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

* fix fix

* fix fix fix

* more fix

* try encoder-decoder cache (too messy)

* revert encoder-decoder cache

* check cross-attn cache

* use enc-dec dataclass

* use richer enc-dec dataclass

* clean-up

* revert static cache changes

* small fixes

* revert to cpu flag

* fix copies

* add static slow test

* past k/v docstring

* more docstrings

* cache_position docstrings

* add to docs

* add enc-dec cache to docs

* make style

* fix after rebase

* fix beam

* style

* fix generation strategies

* fix most decoder-only tests

* style

* skip test

* more clean up

* small docstrings

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add todo

* only crop self-attn

* check cache in mixin

* style

* fix re-compile after rebase

* move `is_updated` logic to enc-dec wrapper

* revert back

* revert cache back

* finalise design

* fix

* fix fix

* style

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* deprecate

* updates

* final updates

* style

* style

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Move some test files (`tests/test_xxx_utils.py`) to `tests/utils` (#31730) · 93cd94b7

Yih-Dar authored Jul 02, 2024



* move

* move

* move

* move

* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

28 Jun, 2024 2 commits

add gather_use_object arguments (#31514) · cb298978

Sangbum Daniel Choi authored Jun 28, 2024



* add gather_use_object arguments

* fix name and pass the CI test for Seq2SeqTrainer

* make style

* make it to functools

* fix typo

* add accelerate version:

* adding warning

* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* make style

* Update src/transformers/training_args.py

* check function move to initial part

* add test for eval_use_gather_object

---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

Fix return_dict in encodec (#31646) · 82a1fc72

Jacky Lee authored Jun 28, 2024

* fix: use return_dict parameter

* fix: type checks

* fix: unused imports

* update: one-line if else

* remove: recursive check

27 Jun, 2024 4 commits

Fix post gemma merge (#31660) · 75a63198
Arthur authored Jun 27, 2024
```
* nit

* toctree issue

* protect gemma2 tests as well

* sdpa supported
```

Add gemma 2 (#31659) · 0cf60f13

Arthur authored Jun 27, 2024



* initial commit

* Add doc

* protect?

* fixup stuffs

* update tests

* fix build documentation

* mmmmmmm config attributes

* style

* nit

* update

* nit

* Fix docs

* protect some stuff

---------
Co-authored-by: Lysandre <lysandre@huggingface.co>

change anchor_image_size None for compatibility (#31640) · be50a033
Sangbum Daniel Choi authored Jun 27, 2024
```
* change anchor_image_size None for compatibility

* make fix-copies
```
[QoL] Allow dtype str for torch_dtype arg of from_pretrained (#31590) · 3a028101
Billy Cao authored Jun 27, 2024
```
* Allow dtype str for torch_dtype in from_pretrained

* Update docstring

* Add tests for str torch_dtype
```

26 Jun, 2024 7 commits

Skip tests properly (#31308) · 1de7dc74

amyeroberts authored Jun 26, 2024

* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]

Fix dtype casting in swinv2 and swin2sr to allow non-FP32 inference (#31589) · 1f9f57ab

Billy Cao authored Jun 27, 2024



* Fix dtype casting in modeling_swin2sr to allow non-FP32 inference

* Fix formatting

* Fix for swinv2 too

* Update src/transformers/models/swin2sr/modeling_swin2sr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/swinv2/modeling_swinv2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add FP16 tests for swin2sr and swinv2

* [run_slow] swin2sr, swinv2

* [run_slow] swin2sr, swinv2

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Fix paligemma detection inference (#31587) · 492ee17e
Pablo Montalvo authored Jun 26, 2024
```
* fix extended attention mask

* add slow test for detection instance

* [run-slow]paligemma
```

Add LLaVa NeXT Video (#31252) · e71f2863

Raushan Turganbay authored Jun 26, 2024



* squash into single commit

* run diff once more

* docstring

* tests

* minor changes and ready to go

* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [run-slow] llava-next-video

* [run-slow] llava-next-video

* [run-slow] llava_next_video

* fix two tests

* fix slow tests

* remove logit checks due to numeric errors

* run test once more

* [run-slow] llava_next_video

* final try to pass the test

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* style

* fix

* style

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix RT-DETR inference with float16 and bfloat16 (#31639) · b1ec7454

Pavel Iakubovskii authored Jun 26, 2024



* [run_slow] rt_detr

* Fix positional embeddings and anchors dtypes

* [run slow] rt_detr

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixup

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161) · 3f93fd06

Younes Belkada authored Jun 26, 2024



* fix llama fsdp

* fixup

* adding FSDP tests for CPU offloading

* fixes

* fix tests

* fix tests

* add it for mixtral

* propagate the changes on other models

* Update src/transformers/models/phi/modeling_phi.py

* Delete utils/testing_scripts/fsdp_cpu_offloading.py

Remove script - FSDP + CPU offloading is tested in the test suite

* Delete utils/testing_scripts/dummy_fsdp_config.yml

* Update + add cache_positions docstring

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

[`GPT-NeoX`] Add SDPA support (#31031) · b07770c5

Anton Vlasjuk authored Jun 26, 2024

* starting support for sdpa in `gptneox` models

* small comment on tests

* fix dropout

* documentation and style

* clarify concrete paths for reference

* generalise attn projections and rope application

added head mask check to sdpa mask creation

handle sdpa memory backend bug via own version flag

* update docs and style

* move dtype casting outside of general attn_projection_and_rope function

fix flash_attn_2 stuff

* more generic attn warning if output_attns or head_mask

* simplify head mask check by moving head mask creation to a later point

* remove copied llama artifact

* remove padding_mask from attention function signature

* removing unnecessary comments, only "save" attn implementation once

* [run_slow] gpt_neox

25 Jun, 2024 4 commits

Add ViTImageProcessorFast to tests (#31424) · 0f67ba1d
amyeroberts authored Jun 25, 2024
```
* Add ViTImageProcessor to tests

* Correct data format

* Review comments
```

Add video modality for InstructBLIP (#30182) · fc689d75

Raushan Turganbay authored Jun 25, 2024

* squash in single commit

* add docs

* dummy obj

* more changes in diff converter

* tiny fix

* make docs happy

* skip test

* repo consistency tests

* update docstring

* style

* fix tests

* change diff imports

* [run-slow] instructblipvideo

* [run-slow] instructblipvideo

* fix tests and remove logit check

* [run-slow] instructblipvideo

fix output data type of image classification (#31444) · a958c4a8

jiqing-feng authored Jun 25, 2024



* fix output data type of image classification

* add tests for low-precision pipeline

* add bf16 pipeline tests

* fix bf16 tests

* Update tests/pipelines/test_pipelines_image_classification.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix import

* fix import torch

* fix style

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Siglip: add `_no_split_module` (#31566) · 7e86cb6c
Raushan Turganbay authored Jun 25, 2024
```
* device-map siglip

* move split modules to PretrainedSigLip
```

24 Jun, 2024 1 commit

Fix bug about add_special_tokens and so on (#31496) · 0e23e60a

Hiroshi Matsuda authored Jun 24, 2024

* fix bug about add_special_tokens and so on

* improve add_special_tokens and padding behavior

* add a test case for add_special_tokens and padding
