- 16 Jul, 2024 1 commit
-
-
Penut Chen authored
* Fix the incorrect permutation of gguf
* rename num_kv_heads
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* add typing to num_kv_heads
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* rename variables
* refactor permute function name
* update the expected text of the llama3 q4 test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
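For context, the GGUF conversion has to undo the interleaved rotary-embedding layout of the source checkpoint, and under grouped-query attention the k_proj/v_proj weights have fewer heads than q_proj, so the permutation must use num_kv_heads rather than the number of query heads. A minimal sketch of such a permute function (names and shapes are assumptions, not the exact transformers code):

```python
import torch

def reverse_permute(weights: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Undo the interleaved rotary layout used by the source checkpoint.
    # For k_proj under grouped-query attention, n_heads must be num_kv_heads,
    # not the (larger) number of query heads -- that was the bug.
    return weights.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)
```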
-
- 15 Jul, 2024 7 commits
-
-
Joao Gante authored
nits
-
Joao Gante authored
-
Yih-Dar authored
* [test_all] hub * remove delete * remove delete * remove delete * remove delete * remove delete * remove delete * [test_all] * [test_all] * [test_all] * [test_all] * [test_all] * [test_all] --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* hello * hello * hello * hello * hello * hello * hello * notify * trigger * use new channel --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Sai-Suraj-27 authored
* Fixed the arguments in create_repo() function call.
* Formatted the code properly using ruff.
* Formatted the code more clearly.
-
Joao Gante authored
handle logits_warper update in models with custom generate fn
-
Sai-Suraj-27 authored
Removed a wrong keyword argument in sigmoid_focal_loss() function call.
-
- 14 Jul, 2024 4 commits
-
-
Joao Gante authored
-
Joao Gante authored
v4_42 deprecations
-
Joao Gante authored
* tmp commit
* shorter
* nit
* explicit kwargs
* propagate changes
* mass propagation with a few manual touches (let's see how CI behaves)
* fix cacheless case
* Update src/transformers/generation/utils.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* make fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
fxmarty authored
use torch.compiler.is_compiling() when possible
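A minimal sketch of the guard this implies, assuming the public `torch.compiler.is_compiling()` API is only present on newer PyTorch releases and the private `torch._dynamo` helper is the fallback:

```python
import torch

def is_torchdynamo_compiling() -> bool:
    # Prefer the public API when it exists (recent PyTorch releases),
    # falling back to the private _dynamo helper on older versions.
    if hasattr(torch, "compiler") and hasattr(torch.compiler, "is_compiling"):
        return torch.compiler.is_compiling()
    try:
        return torch._dynamo.is_compiling()
    except AttributeError:
        return False
```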
-
- 12 Jul, 2024 2 commits
-
-
Aviv Shamsian authored
* fix prompt strip to support tensors and np arrays
* framework agnostic
* change logic check before converting prompt into list
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding _convert_to_list to tokenization_whisper_fast
* adding tests for prompt decoding
* adding comment
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* adding comment
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* revert minor
* make style formatting
* style formatting after update
* Update src/transformers/models/whisper/tokenization_whisper_fast.py
  Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* fixing _strip_prompt to handle _decode_with_timestamps
* fix copies
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
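A rough sketch of what a framework-agnostic `_convert_to_list` helper can look like; this illustrates the approach (torch and TF tensors both expose `.numpy()`), not the exact code merged here:

```python
import numpy as np

def _convert_to_list(token_ids):
    # Torch and TensorFlow tensors both expose .numpy(); convert those first,
    # then flatten any numpy array down to a plain Python list.
    if hasattr(token_ids, "numpy"):
        token_ids = token_ids.numpy()
    if isinstance(token_ids, np.ndarray):
        token_ids = token_ids.tolist()
    return token_ids
```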
-
Joao Gante authored
* pin * dev-ci * dev-ci * dev-ci * test pushed image
-
- 11 Jul, 2024 16 commits
-
-
jiqing-feng authored
* fix qa pipeline * fix tensor to numpy
-
Naman Garg authored
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* initialized Structure
* Updated variable names
* Added Config class, basic HF setup, convert_to_hf
* Fixed Convert function, added hiera to HF files, Initialized test files
* better naming for x in forward pass
* Moved utils to hiera
* Change hiera -> hiera_model
* Fixed integration into transformers
* Fix: Convert Checkpoint
* added documentation for hiera
* added documentation for hiera
* added Docstrings to models, Transformers based changes
* make style and quality
* make style and quality
* Integration & Block tests running
* Fixed bugs
* Removed timm dependency
* added HieraBlock
* fixed: Model name
* added tests for HieraModel, HieraBlock
* fixed imports
* fixed quality & copies
* Fixes
* Update docs/source/en/model_doc/hiera.md Fix name
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/hiera.md
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/configuration_hiera.py
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixed formatting
* Code quality & Import differences
* quality and repo-consistency fix
* fixed no torch error
* Docstring fix
* Docstring fix
* doc string fix
* fixed example usage
* Resolved issues in modeling_hiera
* Removed Hiera MAE
* Added test and resolved bug
* fixed doc string
* First commit
* Finished conversion script and model forward working
* Resolved all issues
* nits
* Improving tests
* Nits
* More nits
* Improving HieraForMaskedImageModeling
* More improvements and nits
* Fixed docstrings of outputs
* More fixes
* More improvements
* Updated conversion script
* Fixed docstrings
* Improved tests
* Fixed attention outputs test
* All tests green
* Removed unnecessary file
* contribution attribution
* Resolved a few issues
* Resolved Comments
* Updated model repo id and fixed bugs
* Removed loss print
* Make tests green
* Updated docstrings
* Fix style
* Fixed num_heads in config
* Removed unnecessary video checkpoint related code in the conversion script
* Fix style
* Changed atol in conversion script
* HieraConfig
* Fix copies
* Fixed typo
* Resolved a few issues
* make
* converted conv_nd -> nn.Module
* Removed video complexities
* Removed video complexities
* fix style
* Addressing comments
* Update src/transformers/models/hiera/modeling_hiera.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/hiera/modeling_hiera.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix style
* Fixed tests
* Fixed typo
* Fixed interpolate test
* Made torch fx compatible
* Made sure image processor is correct
* Addressed comments
* Noise directly as torch
* Remove unnecessary attr
* Added return_dict
* Update src/transformers/models/hiera/__init__.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Updated checkpoints
* [run_slow] hiera
* Fixed device mismatch
* [run_slow] hiera
* Fixed GPU tests
* [run_slow] hiera
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
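A hedged usage sketch of the newly added model; the checkpoint id below is an assumption for illustration, so check the Hub for the actual Hiera repos:

```python
import torch
from transformers import AutoImageProcessor, HieraModel

# "facebook/hiera-base-224-hf" is an assumed checkpoint name, not confirmed by this log.
processor = AutoImageProcessor.from_pretrained("facebook/hiera-base-224-hf")
model = HieraModel.from_pretrained("facebook/hiera-base-224-hf")

pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)
print(outputs.last_hidden_state.shape)
```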
-
Apoorv Khandelwal authored
* Change `Trainer.get_optimizer_cls_and_kwargs` to `self.`
* Make `get_optimizer_cls_and_kwargs` an instance method
* Fixing typo
* Revert `get_optimizer_cls_and_kwargs` to staticmethod
* restore newline to trainer.py eof
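Since the method ends up staying a staticmethod, it remains callable directly on the class; a minimal sketch (the `TrainingArguments` values here are arbitrary):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="out", optim="adamw_torch", learning_rate=5e-5)
# Still a staticmethod after this PR, so no Trainer instance is needed:
optimizer_cls, optimizer_kwargs = Trainer.get_optimizer_cls_and_kwargs(args)
print(optimizer_cls, optimizer_kwargs)
```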
-
t11s authored
fix(SigLip): remove spurious exclusion of first vision output token in classifier
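The bug is easiest to see with a toy tensor: SigLIP's vision tower has no CLS token, so dropping the first output token before pooling throws away a real patch. An illustrative sketch (shapes assumed, not the actual modeling code):

```python
import torch

hidden_states = torch.randn(1, 196, 768)  # (batch, patch_tokens, hidden); no CLS token
pooled_buggy = hidden_states[:, 1:].mean(dim=1)  # silently drops patch 0
pooled_fixed = hidden_states.mean(dim=1)         # pool over all vision tokens
```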
-
Joao Gante authored
fix sliding cache
-
Arthur authored
* dumb commit
* nit
* update
* something like this
* unpack in modeling utils
* safe import
* oups
* update
* nits
* diff convert gemma
* update
* start propagating
* update other modeling code as well
* update for sliding window models
* nits
* more init cleanups
* styling
* fixup
* noice
* pass fixup
* typo typing_extension -> typing_extensions
* torch.nn.functionnal -> torch.nn.functional
* add to import structure
* unpack
* simplify a bit more for this first version
* nut
* update
* update
* nit
* ease the import of `Unpack`
* remove useless `use_sliding_window`
* no qua please
* protect import?
* style
* [run-slow]
* [run slow] llama,gemma,mistral,mixtral
* remove extra kwargs
* fix llama
* address review comments
* apply diff_model_converter to modeling_gemma.py
* remove cache_position 1
* remove cache_position 2
* some cleaning
* refactor gemma2 as well
* apply review comments
* rename file to modeling_flash_attention_utils.py
* siglip refactor
* remove dead code
* is the hub down?
* still down?
* fix siglip
* fix gemma2
* fatal: Could not read from remote repository.
* fix typo in softcap implem
* flaky
* Failed: Timeout >120.0s
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
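The `Unpack` part of this refactor is about typing `**kwargs` precisely. A small sketch of the pattern with hypothetical kwarg names (the real flash-attention kwargs live in modeling_flash_attention_utils.py):

```python
from typing_extensions import TypedDict, Unpack  # the import this commit eases

class AttentionKwargs(TypedDict, total=False):
    # Hypothetical kwarg names, for illustration only.
    sliding_window: int
    softcap: float

def attention_forward(hidden_states, **kwargs: Unpack[AttentionKwargs]) -> None:
    # Type checkers now know exactly which keyword arguments are legal here.
    window = kwargs.get("sliding_window")
    ...
```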
-
fxmarty authored
* fix tests * [test_all] check * address review comments
-
Omar Salman authored
* Add warning message for `gamma` and `beta` parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
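For context, `_fix_key` silently renames legacy TensorFlow-style parameter names when loading state dicts, which is what the new warning surfaces. Roughly (a sketch of the idea, not the verbatim transformers code):

```python
def _fix_key(key: str) -> str:
    # Legacy TF-style names are remapped to their PyTorch equivalents;
    # the PR adds a warning so the rename no longer happens silently.
    if "beta" in key:
        return key.replace("beta", "bias")
    if "gamma" in key:
        return key.replace("gamma", "weight")
    return key
```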
-
Sangbum Daniel Choi authored
* add gather_use_object arguments
* fix name and pass the CI test for Seq2SeqTrainer
* make style
* make it to functools
* fix typo
* add accelerate version:
* adding warning
* Update src/transformers/trainer.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* make style
* Update src/transformers/training_args.py
* check function move to initial part
* add test for eval_use_gather_object
* fix minor
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
-
Sai-Suraj-27 authored
Fixed the first argument name in a few classmethods.
-
Isotr0py authored
* add missing methods for FuyuForCausalLM
* fix a typo
* format code
* add missing tie_weights
* format code
-
Arthur authored
* Support softcapping
* strictly greater than
* update
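Softcapping squashes logits smoothly instead of hard-clipping them. A minimal sketch of the usual formulation (the function and parameter names are illustrative, not the transformers implementation):

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # tanh keeps the result strictly inside (-cap, cap) while staying differentiable.
    return cap * torch.tanh(logits / cap)
```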
-
Arthur authored
* preserve the order
* oups
* oups
* nit
* trick
* fix issues
-
Raushan Turganbay authored
* accept kwargs in processors
* return unused kwargs
* fix tests
* typo
* update the other way
-
turboderp authored
* HybridCache: Flip order of alternating global-attn/sliding-attn layers
* HybridCache: Read sliding_window argument from cache_kwargs
* Gemma2Model: Flip order of alternating global-attn/sliding-attn layers
* Code formatting
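The fix concerns which parity in the layer stack gets sliding-window attention versus global attention. A toy sketch of the alternation pattern (the parity chosen below is an assumption; matching it to the checkpoint is exactly what this commit corrects):

```python
num_layers = 8
# One boolean per decoder layer: True = sliding-window attention, False = global.
is_sliding = [layer_idx % 2 == 0 for layer_idx in range(num_layers)]
print(is_sliding)  # [True, False, True, False, ...]
```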
-
Raushan Turganbay authored
* update docs * one more change
-
- 10 Jul, 2024 9 commits
-
-
haikuoxin authored
fix bug: https://github.com/huggingface/transformers/issues/31852
-
Yih-Dar authored
* fix * [test_all] check before merge --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
NielsRogge authored
* Add resources * Address comments
-
Marc Sun authored
Save sharded checkpoint in Trainer
-
Sai-Suraj-27 authored
Removed duplicate field definitions in classes.
-
Yih-Dar authored
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)" This reverts commit b45dd5de . * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check --------- Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
Noah Young authored
fix data split file type checks
-
yukionfire authored
-
Raushan Turganbay authored
* add conversion for interleave llava
* remove debug lines
* remove unused imports
* Update src/transformers/models/llava/convert_llava_weights_to_hf.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* small changes + docs
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 09 Jul, 2024 1 commit
-
-
Yun Dai authored
* add warning when using with FSDP full shard
* fix style
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add hybrid shard warn
* fix style
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-