Commits · f91c16d270e5e3ff32fdb32ccf286d05c03dfa66 · chenpangpang / transformers


02 Jul, 2024 7 commits

Fix documentation for Gemma2. (#31682) · f91c16d2

Jörg Bornschein authored Jul 02, 2024



* Fix documentation for Gemma2. 

Model sizes and Blog post URL are wrong in the documentation.

* Update docs/source/en/model_doc/gemma2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

f91c16d2

Make tool JSON schemas consistent (#31756) · cd0935dd
Matt authored Jul 02, 2024
```
Make the order of array items consistent using sorted()
```
cd0935dd
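
A minimal sketch of why `sorted()` helps here, assuming the schema builder collects type names in a set before emitting a JSON array. The helper `union_to_schema` and the `_TYPE_NAMES` mapping are hypothetical and are not taken from the commit itself.

```python
import json
from typing import Union, get_args

# Hypothetical mapping from Python types to JSON schema type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def union_to_schema(hint) -> dict:
    """Build a {"type": [...]} fragment for a Union type hint (illustrative only)."""
    names = {_TYPE_NAMES[arg] for arg in get_args(hint)}
    # Iteration order of a set of strings can differ between interpreter runs,
    # so sort the names before turning them into a list.
    return {"type": sorted(names)}


schema_a = union_to_schema(Union[int, str])
schema_b = union_to_schema(Union[str, int])
print(json.dumps(schema_a))
print(schema_a == schema_b)
```

With the items sorted, the same hint yields the same array regardless of declaration order, which keeps schema comparisons in tests stable.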
🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs (#31747) · 82486e59
Joao Gante authored Jul 02, 2024
```
* rely on the tokenizer default kwargs

* fix a few tests
```
82486e59

[whisper] static kv cache (#31166) · a9701953

Sanchit Gandhi authored Jul 02, 2024



* make work with cache abstraction

* correct for static cache

* hacks for compile

* make fast

* fix

* fix pos ids

* generate

* fix sdpa

* fix sdpa cache pos

* fix fa2

* clean fa2

* integrate cache into generate

* make style

* copies

* more copies

* update eager

* update sdpa

* update fa2

* simplify

* use cache pos

* always compute cross-cache for debug

* avoid recompiles
Co-authored-by: Arthur Zucker <arthur@huggingface.co>

* fix fix

* fix fix fix

* more fix

* try encoder-decoder cache (too messy)

* revert encoder-decoder cache

* check cross-attn cache

* use enc-dec dataclass

* use richer enc-dec dataclass

* clean-up

* revert static cache changes

* small fixes

* revert to cpu flag

* fix copies

* add static slow test

* past k/v docstring

* more docstrings

* cache_position docstrings

* add to docs

* add enc-dec cache to docs

* make style

* fix after rebase

* fix beam

* style

* fix generation strategies

* fix most decoder-only tests

* style

* skip test

* more clean up

* small docstrings

* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* add todo

* only crop self-attn

* check cache in mixin

* style

* fix re-compile after rebase

* move `is_updated` logic to enc-dec wrapper

* revert back

* revert cache back

* finalise design

* fix

* fix fix

* style

* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* deprecate

* updates

* final updates

* style

* style

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

a9701953

Fix mistral ONNX export (#31696) · 57d7594a
fxmarty authored Jul 02, 2024
```
* use bitwise or

* why is the CI not triggered?
```
57d7594a

Move some test files (`tests/test_xxx_utils.py`) to `tests/utils` (#31730) · 93cd94b7

Yih-Dar authored Jul 02, 2024



* move

* move

* move

* move

* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

93cd94b7

remove incorrect urls pointing to the llava repository (#31107) · cf85e86e

Krisztián Boros authored Jul 02, 2024

* remove incorrect urls pointing to the llava repository

* remove incorrect urls pointing to the llava repository; removing entire comments

* remove incorrect urls pointing to the llava repository; removing entire comments; ran fix-copies

* ran fixup

cf85e86e

01 Jul, 2024 1 commit
dependencies: `keras-nlp<0.14` pin (#31684) · 3345ae73
Joao Gante authored Jul 01, 2024
```
* keras nlp pin

* this should use the new docker images:dev

* dev-ci
```
3345ae73

28 Jun, 2024 6 commits

Add French version of run scripts tutorial (#31483) · e6550295

Jade Choghari authored Jun 28, 2024



* Add French translation of run scripts tutorial

* Update docs/source/fr/run_scripts_fr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/fr/run_scripts_fr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/fr/run_scripts_fr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/fr/run_scripts_fr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/fr/run_scripts_fr.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Jade Choghari <chogharijade@icloud.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

e6550295

Gemma capping is a must for big models (#31698) · bbf1e618
Arthur authored Jun 28, 2024
```
* softcapping

* soft cap before the mask

* style

* ...

* super nit
```
bbf1e618
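
The "soft cap before the mask" step can be illustrated with a small pure-Python sketch. It assumes tanh-based soft-capping of attention scores and an additive mask that uses `-inf` for hidden positions. The function names and the cap value of 50.0 are illustrative choices, not taken from the commit.

```python
import math

NEG_INF = float("-inf")


def soft_cap(score: float, cap: float) -> float:
    """Squash a score smoothly into the open interval (-cap, cap)."""
    return cap * math.tanh(score / cap)


def cap_then_mask(scores, mask, cap):
    # Cap the raw scores first, then add the mask: masked entries stay at -inf.
    return [soft_cap(s, cap) + m for s, m in zip(scores, mask)]


def mask_then_cap(scores, mask, cap):
    # Wrong order: tanh(-inf) is -1, so a masked entry becomes the finite value -cap.
    return [soft_cap(s + m, cap) for s, m in zip(scores, mask)]


scores = [120.0, 3.0, -80.0]
mask = [0.0, 0.0, NEG_INF]  # the last position is masked out
cap = 50.0

capped = cap_then_mask(scores, mask, cap)
leaky = mask_then_cap(scores, mask, cap)
print(capped)
print(leaky)
```

In the second ordering the masked position ends up with a finite score, so it would receive non-zero weight after a softmax. That may be why the ordering matters.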

add gather_use_object arguments (#31514) · cb298978

Sangbum Daniel Choi authored Jun 28, 2024



* add gather_use_object arguments

* fix name and pass the CI test for Seq2SeqTrainer

* make style

* make it to functools

* fix typo

* add accelerate version:

* adding warning

* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* make style

* Update src/transformers/training_args.py

* check function move to initial part

* add test for eval_use_gather_object

---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

cb298978

Fix return_dict in encodec (#31646) · 82a1fc72

Jacky Lee authored Jun 28, 2024

* fix: use return_dict parameter

* fix: type checks

* fix: unused imports

* update: one-line if else

* remove: recursive check

82a1fc72

Fix Gemma2 4d attention mask (#31674) · 5e89b335

hoshi-hiyouga authored Jun 28, 2024



Update modeling_gemma2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

5e89b335

don't zero out the attention_mask when using sliding window with flash attention (#31670) · 0142aab7
Wing Lian authored Jun 27, 2024
```
* don't zero out the attention_mask when using sliding window with flash attention

* chore: lint
```
0142aab7

27 Jun, 2024 12 commits

[HybridCache] Fix `get_seq_length` method (#31661) · 1c68f2ca
Sanchit Gandhi authored Jun 27, 2024
```
* fix gemma2

* handle in generate
```
1c68f2ca
[docs] Llama3 (#31662) · 464aa746
Steven Liu authored Jun 27, 2024
```
quick usage to top
```
464aa746
Fix float out of range in owlvit and owlv2 when using FP16 or lower precision (#31657) · e44b878c
Billy Cao authored Jun 28, 2024

e44b878c
Fix post gemma merge (#31660) · 75a63198
Arthur authored Jun 27, 2024
```
* nit

* toctree issue

* protect gemma2 tests as well

* sdpa supported
```
75a63198
v4.43.0.dev0 · 727eea4a
Lysandre authored Jun 27, 2024

727eea4a

Add gemma 2 (#31659) · 0cf60f13

Arthur authored Jun 27, 2024



* initial commit

* Add doc

* protect?

* fixup stuffs

* update tests

* fix build documentation

* mmmmmmm config attributes

* style

* nit

* update

* nit

* Fix docs

* protect some stuff

---------
Co-authored-by: Lysandre <lysandre@huggingface.co>

0cf60f13

Remove deprecated config attribute in VLMs (#31655) · 4aa17d00
Raushan Turganbay authored Jun 27, 2024
```
remove
```
4aa17d00
change anchor_image_size None for compatibility (#31640) · be50a033
Sangbum Daniel Choi authored Jun 27, 2024
```
* change anchor_image_size None for compatibility

* make fix-copies
```
be50a033
[QoL] Allow dtype str for torch_dtype arg of from_pretrained (#31590) · 3a028101
Billy Cao authored Jun 27, 2024
```
* Allow dtype str for torch_dtype in from_pretrained

* Update docstring

* Add tests for str torch_dtype
```
3a028101

[`Llama`] Conversion: fix and simplify the script! (#31591) · 11138ca0

Arthur authored Jun 27, 2024



* fix and simplify the script!

* add co-author

---------
Co-authored-by: crackalamoo <crackalamoo@users.noreply.github.com>

11138ca0

Fix ONNX exports for Optimum compatible models (#31311) · c9f191a0

Merve Noyan authored Jun 27, 2024



* fixed models

* format with bumped ruff version on my local

* fix copies

* add tracing checks

* format

* Update src/transformers/utils/generic.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* format

* style fix

* Update modeling_mobilevit.py

* add docstring and change name

* Update __init__.py

* Update __init__.py

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

c9f191a0

Generation: past kv can be None (#31051) · dc76e9fa
Raushan Turganbay authored Jun 27, 2024
```
* fix

* better
```
dc76e9fa

26 Jun, 2024 12 commits

Skip tests properly (#31308) · 1de7dc74

amyeroberts authored Jun 26, 2024

* Skip tests properly

* [test_all]

* Add 'reason' as kwarg for skipTest

* [test_all] Fix up

* [test_all]

1de7dc74
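
As a hedged illustration of what "skipping properly" can mean with the standard `unittest` module: a test that returns early is reported as a pass, while `self.skipTest(reason=...)` records an explicit skip. The test class and the `feature_available` flag below are hypothetical and are not taken from the repository.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_silently_passes(self):
        feature_available = False  # stand-in for a real capability check
        if not feature_available:
            # Anti-pattern: returning early is counted as a passing test.
            return
        self.fail("unreachable in this sketch")

    def test_skipped_with_reason(self):
        feature_available = False  # stand-in for a real capability check
        if not feature_available:
            # Raises unittest.SkipTest, so the runner reports a skip with a reason.
            self.skipTest(reason="feature is not available in this sketch")
        self.fail("unreachable in this sketch")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped), len(result.failures))
```

Both tests run, but only the second shows up in `result.skipped`, so the report reflects that nothing was actually verified for it.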

Fix dtype casting in swinv2 and swin2sr to allow non-FP32 inference (#31589) · 1f9f57ab

Billy Cao authored Jun 27, 2024



* Fix dtype casting in modeling_swin2sr to allow non-FP32 inference

* Fix formatting

* Fix for swinv2 too

* Update src/transformers/models/swin2sr/modeling_swin2sr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/swinv2/modeling_swinv2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add FP16 tests for swin2sr and swinv2

* [run_slow] swin2sr, swinv2

* [run_slow] swin2sr, swinv2

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

1f9f57ab

Generate: fix assisted generation with `past_key_values` passed as kwargs (#31644) · a3fb96a4
Joao Gante authored Jun 26, 2024

a3fb96a4
Fix paligemma detection inference (#31587) · 492ee17e
Pablo Montalvo authored Jun 26, 2024
```
* fix extended attention mask

* add slow test for detection instance

* [run-slow]paligemma
```
492ee17e

Add LLaVa NeXT Video (#31252) · e71f2863

Raushan Turganbay authored Jun 26, 2024



* squash into single commit

* run diff once more

* docstring

* tests

* minor changes and ready to go

* Update src/transformers/models/llava_next_video/processing_llava_next_video.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vipllava/test_modeling_vipllava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [run-slow] llava-next-video

* [run-slow] llava-next-video

* [run-slow] llava_next_video

* fix two tests

* fix slow tests

* remove logit checks due to numeric errors

* run test once more

* [run-slow] llava_next_video

* final try to pass the test

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* [run-slow] llava_next_video

* style

* fix

* style

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

e71f2863

Fix RT-DETR inference with float16 and bfloat16 (#31639) · b1ec7454

Pavel Iakubovskii authored Jun 26, 2024



* [run_slow] rt_detr

* Fix positional embeddings and anchors dtypes

* [run slow] rt_detr

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixup

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

b1ec7454

Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP (#31161) · 3f93fd06

Younes Belkada authored Jun 26, 2024



* fix llama fsdp

* fixup

* adding FSDP tests for CPU offloading

* fixes

* fix tests

* fix tests

* add it for mixtral

* propagate the changes on other models

* Update src/transformers/models/phi/modeling_phi.py

* Delete utils/testing_scripts/fsdp_cpu_offloading.py

Remove script - FSDP + CPU offloading is tested in the test suite

* Delete utils/testing_scripts/dummy_fsdp_config.yml

* Update + add cache_positions docstring

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

3f93fd06

Update RT-DETR code snippet (#31631) · ac52084b
Pavel Iakubovskii authored Jun 26, 2024
```
Update code snippet
```
ac52084b
Fix llama gguf converter (#31575) · 915cce39
Marc Sun authored Jun 26, 2024

915cce39

[`GPT-NeoX`] Add SDPA support (#31031) · b07770c5

Anton Vlasjuk authored Jun 26, 2024

* starting support for sdpa in `gptneox` models

* small comment on tests

* fix dropout

* documentation and style

* clarify concrete paths for reference

* generalise attn projections and rope application

added head mask check to sdpa mask creation

handle sdpa memory backend bug via own version flag

* update docs and style

* move dtype casting outside of general attn_projection_and_rope function

fix flash_attn_2 stuff

* more generic attn warning if output_attns or head_mask

* simplify head mask check by moving head mask creation to a later point

* remove copied llama artifact

* remove padding_mask from attention function signature

* removing unnecessary comments, only "save" attn implementation once

* [run_slow] gpt_neox

b07770c5

Removed unnecessary `self.projection` call in `VivitTubeletEmbeddings` (#31632) · 1218e439
Vladimir Iashin authored Jun 26, 2024
```
removes unnecessary second projection call
```
1218e439
docs: move translations to `i18n` (#31584) · 2daf2c3e
Saurav Maheshkar authored Jun 26, 2024
```
docs: move translations to i18n
```
2daf2c3e

25 Jun, 2024 2 commits
Add ViTImageProcessorFast to tests (#31424) · 0f67ba1d
amyeroberts authored Jun 25, 2024
```
* Add ViTImageProcessor to tests

* Correct data format

* Review comments
```
0f67ba1d
Improve error message for mismatched copies in code blocks (#31535) · aab08297
Pablo Montalvo authored Jun 25, 2024
```
improve error message for mismatched code blocks
```
aab08297