- 11 Jul, 2024 14 commits
-
-
Apoorv Khandelwal authored
* Change `Trainer.get_optimizer_cls_and_kwargs` to `self.`
* Make `get_optimizer_cls_and_kwargs` an instance method
* Fixing typo
* Revert `get_optimizer_cls_and_kwargs` to staticmethod
* Restore newline to trainer.py EOF
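The back-and-forth above (instance method vs. staticmethod) rests on one Python detail, sketched here with hypothetical trainer classes rather than the real `Trainer`: a `@staticmethod` is still resolved through `self`, so subclass overrides take effect without changing the call sites.

```python
# Minimal sketch (hypothetical classes, not the real Trainer): a @staticmethod
# looked up through `self` still dispatches to a subclass override.
class BaseTrainer:
    @staticmethod
    def get_optimizer_cls_and_kwargs(lr: float):
        return "AdamW", {"lr": lr}

    def describe_optimizer(self, lr: float):
        # Lookup through the instance picks up the subclass override.
        return self.get_optimizer_cls_and_kwargs(lr)


class CustomTrainer(BaseTrainer):
    @staticmethod
    def get_optimizer_cls_and_kwargs(lr: float):
        return "Adafactor", {"lr": lr / 10}


print(CustomTrainer().describe_optimizer(1e-3))  # ('Adafactor', {'lr': 0.0001})
```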
-
t11s authored
fix(SigLip): remove spurious exclusion of first vision output token in classifier
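For context, the "spurious exclusion" is CLS-style slicing applied to a model that has no class token; a rough before/after sketch of the pooling (illustrative tensors, not the actual SigLIP classifier code):

```python
# SigLIP has no [CLS] token, so the classifier head should pool over every
# patch embedding instead of skipping the first one as ViT-style heads do.
import torch

patch_embeddings = torch.randn(2, 196, 768)         # (batch, patches, hidden)
pooled_buggy = patch_embeddings[:, 1:, :].mean(1)    # silently drops a real patch
pooled_fixed = patch_embeddings.mean(1)              # pools over all 196 patches
```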
-
Joao Gante authored
fix sliding cache
-
Arthur authored
* dumb commit
* nit
* update
* something like this
* unpack in modeling utils
* safe import
* oups
* update
* nits
* diff convert gemma
* update
* start propagating
* update other modeling code as well
* update for sliding window models
* nits
* more init cleanups
* styling
* fixup
* noice
* pass fixup
* typo typing_extension -> typing_extensions
* torch.nn.functionnal -> torch.nn.functional
* add to import structure
* unpack
* simplify a bit more for this first version
* nut
* update
* update
* nit
* ease the import of `Unpack`
* remove useless `use_sliding_window`
* no qua please
* protect import?
* style
* [run-slow]
* [run slow] llama,gemma,mistral,mixtral
* remove extra kwargs
* fix llama
* address review comments
* apply diff_model_converter to modeling_gemma.py
* remove cache_position 1
* remove cache_position 2
* some cleaning
* refactor gemma2 as well
* apply review comments
* rename file to modeling_flash_attention_utils.py
* siglip refactor
* remove dead code
* is the hub down?
* still down?
* fix siglip
* fix gemma2
* fatal: Could not read from remote repository.
* fix typo in softcap implem
* flaky
* Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
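Two of the bullets above ("typo typing_extension -> typing_extensions" and "ease the import of `Unpack`") concern the same compatibility detail; a hedged sketch of such a guard, which may differ from the exact code in transformers:

```python
# `Unpack` only exists in the stdlib from Python 3.11 onward, so older
# interpreters fall back to typing_extensions.
import sys

if sys.version_info >= (3, 11):
    from typing import Unpack
else:
    from typing_extensions import Unpack
```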
-
fxmarty authored
* fix tests
* [test_all] check
* address review comments
-
Omar Salman authored
* Add warning message for and parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
-
Sangbum Daniel Choi authored
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* check function move to initial part
* add test for eval_use_gather_object
* fix minor
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
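A hedged usage sketch based on the "add test for eval_use_gather_object" bullet above: the flag asks the evaluation loop to gather arbitrary (non-tensor) objects across processes, and per the same commit it requires a sufficiently recent accelerate.

```python
# Usage sketch; the argument name follows the commit message, but semantics
# and version requirements should be checked against the release notes.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    eval_use_gather_object=True,
)
```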
-
Sai-Suraj-27 authored
Fixed the first argument name in a few classmethods.
-
Isotr0py authored
* add missing methods for FuyuForCausalLM
* fix a typo
* format code
* add missing tie_weights
* format code
-
Arthur authored
* Support softcapping
* strictly greater than
* update
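A standalone sketch of tanh-based logit soft-capping as used by Gemma-2-style models (not the library's implementation): values are squashed smoothly into (-cap, cap) rather than hard-clipped.

```python
import torch

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bounds logits to the open interval (-cap, cap).
    return cap * torch.tanh(logits / cap)

print(softcap(torch.tensor([-500.0, 0.0, 500.0]), 50.0))
# ~ tensor([-50., 0., 50.]) -- large logits saturate near +/- cap
```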
-
Arthur authored
* preserve the order
* oups
* oups
* nit
* trick
* fix issues
-
Raushan Turganbay authored
* accept kwargs in processors
* return unused kwargs
* fix tests
* typo
* update the other way
-
turboderp authored
* HybridCache: Flip order of alternating global-attn/sliding-attn layers
* HybridCache: Read sliding_window argument from cache_kwargs
* Gemma2Model: Flip order of alternating global-attn/sliding-attn layers
* Code formatting
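Purely illustrative sketch (hypothetical helper names, not the HybridCache code) of the two ideas in the bullets above: an alternating sliding-window/global layer pattern, and taking the window size from `cache_kwargs` instead of a hard-coded default.

```python
def uses_sliding_window(layer_idx: int) -> bool:
    # Every other layer uses sliding-window attention; which parity counts
    # as "sliding" is exactly what the fix flips, so this choice is arbitrary.
    return layer_idx % 2 == 1

def effective_window(cache_kwargs: dict, default: int = 4096) -> int:
    # Prefer the caller-provided window size over the default.
    return cache_kwargs.get("sliding_window", default)

print([uses_sliding_window(i) for i in range(6)])   # [False, True, False, ...]
print(effective_window({"sliding_window": 1024}))   # 1024
```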
-
Raushan Turganbay authored
* update docs
* one more change
-
- 10 Jul, 2024 9 commits
-
-
haikuoxin authored
fix bug: https://github.com/huggingface/transformers/issues/31852
-
Yih-Dar authored
* fix
* [test_all] check before merge
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
NielsRogge authored
* Add resources * Address comments
-
Marc Sun authored
Save sharded checkpoint in Trainer
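The Trainer change itself is internal, but a minimal sketch of what a sharded checkpoint looks like through the public `save_pretrained` API may help (the tiny test checkpoint name is only an example):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")
# With a small enough max_shard_size, the output directory contains several
# weight shards plus an index JSON mapping parameter names to shards.
model.save_pretrained("out", max_shard_size="200KB")
```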
-
Sai-Suraj-27 authored
Removed duplicate field definitions in classes.
-
Yih-Dar authored
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)" This reverts commit b45dd5de . * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check --------- Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
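A hedged sketch of the kind of `_init_weights` a convolutional backbone such as ResNet typically uses (illustrative; the exact initialization in transformers may differ):

```python
import torch.nn as nn

def _init_weights(module: nn.Module) -> None:
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)

# Apply recursively to every submodule of a small example network.
nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).apply(_init_weights)
```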
-
Noah Young authored
fix data split file type checks
-
yukionfire authored
-
Raushan Turganbay authored
* add conversion for interleave llava
* remove debug lines
* remove unused imports
* Update src/transformers/models/llava/convert_llava_weights_to_hf.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* small changes + docs
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 09 Jul, 2024 15 commits
-
-
Yun Dai authored
* add warning when using with FSDP full shard
* fix style
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add hybrid shard warn
* fix style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
dependabot[bot] authored
Bump certifi in /examples/research_projects/visual_bert
Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.7.22 to 2024.7.4.
- [Commits](https://github.com/certifi/python-certifi/compare/2023.07.22...2024.07.04)
---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Mauricio Villegas authored
Update modeling_utils.py
Add return type annotation to `PreTrainedModel.from_pretrained`
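One common way to express such an annotation, shown as a sketch rather than the exact signature the commit added: bind a TypeVar to the class so that `SomeModel.from_pretrained(...)` is typed as returning `SomeModel`.

```python
from typing import Any, Type, TypeVar

T = TypeVar("T", bound="PreTrainedModel")

class PreTrainedModel:  # stand-in for the real class, for illustration only
    @classmethod
    def from_pretrained(cls: Type[T], name_or_path: str, **kwargs: Any) -> T:
        ...
```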
-
dependabot[bot] authored
Bump zipp in /examples/research_projects/decision_transformer
Bumps [zipp](https://github.com/jaraco/zipp) from 3.7.0 to 3.19.1.
- [Release notes](https://github.com/jaraco/zipp/releases)
- [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst)
- [Commits](https://github.com/jaraco/zipp/compare/v3.7.0...v3.19.1)
---
updated-dependencies:
- dependency-name: zipp
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Merve Noyan authored
---------
Co-authored-by: Merve Noyan <mervenoyan@Merve-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Yih-Dar authored
* init
* test
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yung-Sung Chuang authored
Co-authored-by: Joao Gante <joao@huggingface.co>
-
chenk authored
Signed-off-by: chenk <hen.keinan@gmail.com>
-
Joao Gante authored
fix test
-
kallewoof authored
-
hatti authored
remove duplicate words
-
NielsRogge authored
Add model
-
fxmarty authored
only test input_embeds, not decoder_input_embeds
-
Raushan Turganbay authored
* deprecate `vocab_size` in other two VLMs
* Update src/transformers/models/fuyu/configuration_fuyu.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* deprecate until 4.44
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
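A hedged sketch (hypothetical config class, not the actual Fuyu code) of how a deprecated config attribute is usually kept alive until a removal version such as the 4.44 mentioned above:

```python
import warnings

class ExampleVLMConfig:
    def __init__(self, text_vocab_size: int = 32000):
        self._vocab_size = text_vocab_size

    @property
    def vocab_size(self) -> int:
        # Emit a deprecation warning but keep the old attribute working.
        warnings.warn(
            "`vocab_size` is deprecated and will be removed in v4.44; "
            "read it from the text config instead.",
            FutureWarning,
        )
        return self._vocab_size
```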
-
- 08 Jul, 2024 2 commits
-
-
Joao Gante authored
* enable strict signature
* this should not have been deleted
* recurrent_gemma too
-
André Storhaug authored
* Fix wrong accelerator device setup when using MPS
* More robust TrainingArguments MPS handling
* Update training_args.py
* Cleanup
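A hedged sketch of the kind of robustness the fix is after (not the actual TrainingArguments logic): check both that PyTorch was built with MPS support and that the backend is currently usable before selecting the device.

```python
import torch

def pick_device() -> torch.device:
    # Prefer Apple-silicon MPS only when it is both built and available.
    if torch.backends.mps.is_built() and torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

print(pick_device())
```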
-