Commits · aee11fe427b2f2fd66c3ef3cd91757ec00420ac9 · chenpangpang / transformers

16 Feb, 2024 2 commits

Fix max_length criteria when using inputs_embeds (#28994) · aee11fe4

Raushan Turganbay authored Feb 16, 2024



* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

aee11fe4

Update all references to canonical models (#29001) · f497f564
Lysandre Debut authored Feb 16, 2024
```
* Script & Manual edition

* Update
```
f497f564

15 Feb, 2024 3 commits

Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043) · 4156f517
amyeroberts authored Feb 15, 2024
```
* Patch to skip currently failing tests

* Whoops - wrong place
```
4156f517

DeformableDetrModel support fp16 (#29013) · 5b6fa230

Donggeun Yu authored Feb 15, 2024



* Update ms_deform_attn_cuda.cu

* Update ms_deform_attn_cuda.cuh

* Update modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update modeling_deformable_detr.py

* python utils/check_copies.py --fix_and_overwrite

* Fix dtype mismatch error

* Update test_modeling_deformable_detr.py

* Update test_modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* Update modeling_deformable_detr.py

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

5b6fa230

Fix static generation when compiling! (#28937) · f3788b09

Arthur authored Feb 15, 2024



* wow I was scared!

* fix everything

* nits

* make it BC?

* add todo

* nits

* is_tracing should still be used to pass tracing tests

* nits

* some nits to make sure generation works with static cache uncompiled

* fix sdpa

* fix FA2 for both static and dynamic in a better way?

* style

* fix-copies

* fix fix copies

* fix sequential beam search

* style

* use `keys_to_ignore`

* nit

* correct dtype inference when init

* :( the fix for FA2 is still not optimal; to investigate!

* styling

* nits

* nit

* this might work better

* add comment

* Update src/transformers/models/llama/modeling_llama.py

* "position_ids" -> "cache_position"

* style

* nit

* Remove changes that should not be propagated just yet

* Apply suggestions from code review

* Styling

* make sure we raise an error for static cache with FA2 enabled

* move  to the bottom of the signature

* style

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

* nit in the name

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

f3788b09

14 Feb, 2024 9 commits

FIX [`Trainer` / tags]: Fix trainer + tags when users do not pass `"tags"` to... · 7a0fccc6

Younes Belkada authored Feb 14, 2024

FIX [`Trainer` / tags]: Fix trainer + tags when users do not pass `"tags"` to `trainer.push_to_hub()` (#29009)

* fix trainer tags

* add test

7a0fccc6
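Here is a minimal sketch of the guard this fix is about. It assumes the model carries its own list of tags and that the caller may pass `tags` as `None`, a single string, or a list. `merge_tags` is a hypothetical helper for illustration, not the actual `Trainer.push_to_hub()` code.

```python
from typing import List, Optional, Union


def merge_tags(model_tags: Optional[List[str]],
               user_tags: Optional[Union[str, List[str]]]) -> List[str]:
    """Combine model-level tags with user-supplied tags (hypothetical helper)."""
    merged: List[str] = list(model_tags or [])
    if user_tags is None:
        # The case addressed by the fix: no "tags" passed, keep the model's tags.
        return merged
    if isinstance(user_tags, str):
        user_tags = [user_tags]
    for tag in user_tags:
        if tag not in merged:  # avoid duplicates, preserve order
            merged.append(tag)
    return merged


print(merge_tags(["trl", "sft"], None))
print(merge_tags(["trl"], "my-run"))
```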

Backbone kwargs in config (#28784) · 0199a484

amyeroberts authored Feb 14, 2024



* Enable instantiating model with pretrained backbone weights

* Clarify pretrained import

* Use load_backbone instead

* Add backbone_kwargs to config

* Pass kwargs to constructors

* Fix up

* Input verification

* Add tests

* Tidy up

* Update tests/utils/test_backbone_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

0199a484
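The sketch below outlines the pattern described above: kwargs are stored on the config, verified, and then forwarded to the backbone constructor. `ToyBackboneConfig`, `load_toy_backbone` and the registry dict are hypothetical stand-ins, not the real `load_backbone` API. The "not both" rule is an assumed example of the input verification.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ToyBackboneConfig:
    backbone: Optional[str] = None                  # name used to look up a constructor
    backbone_config: Optional[Dict[str, Any]] = None
    backbone_kwargs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed input verification: kwargs only make sense when the backbone
        # is built from a name, not from an explicit config.
        if self.backbone_kwargs and self.backbone_config is not None:
            raise ValueError("Specify either backbone_config or backbone_kwargs, not both.")


def load_toy_backbone(config: ToyBackboneConfig,
                      registry: Dict[str, Callable[..., Any]]) -> Any:
    """Look up the constructor by name and pass backbone_kwargs through to it."""
    if config.backbone is None:
        raise ValueError("No backbone name set.")
    constructor = registry[config.backbone]
    return constructor(**config.backbone_kwargs)
```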

Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948) · 725f4ad1

JB (Don) authored Feb 15, 2024

* Add tie_weights() to LM heads and set bias in set_output_embeddings()

The biases were not tied correctly in some LM heads; this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin

* Adding _tie_weights() to MPNet and Vilt

* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot be initialized on the meta device

* Rename the test to save_load to match the convention

725f4ad1

Fix flaky test vision encoder-decoder generate (#28923) · 354775bc
Raushan Turganbay authored Feb 14, 2024

354775bc

Introduce AcceleratorConfig dataclass (#28664) · 0507e69d

Zach Mueller authored Feb 14, 2024



* Introduce acceleratorconfig dataclass

* Extra second warn

* Move import

* Try moving import under is_accelerate_available

* Quality

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Clean

* Remove to_kwargs

* Change version

* Improve tests by including dispatch and split batches

* Improve reliability

* Update tests/trainer/test_trainer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixup tests and review nits

* Make tests pass

* protect import

* Protect import

* Empty-Commit

* Make training_args.to_dict handle the AcceleratorConfig

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

0507e69d
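The last bullet says `to_dict` now handles the nested config. One way that can work is sketched below with stdlib dataclasses. `ToyAcceleratorConfig` and `ToyTrainingArgs` are hypothetical. The two field names are only borrowed from the "dispatch and split batches" bullet, and the real class may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class ToyAcceleratorConfig:
    # Hypothetical fields, named after the "dispatch and split batches" bullet.
    split_batches: bool = False
    dispatch_batches: Optional[bool] = None


@dataclass
class ToyTrainingArgs:
    output_dir: str = "out"
    # Users may pass a dataclass, a plain dict (e.g. parsed from JSON), or nothing.
    accelerator_config: Union[ToyAcceleratorConfig, Dict[str, Any], None] = None

    def __post_init__(self) -> None:
        # Normalize every accepted input form to the dataclass.
        if self.accelerator_config is None:
            self.accelerator_config = ToyAcceleratorConfig()
        elif isinstance(self.accelerator_config, dict):
            self.accelerator_config = ToyAcceleratorConfig(**self.accelerator_config)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses, so the result contains
        # only plain dicts and scalars that json.dumps can serialize.
        return asdict(self)


args = ToyTrainingArgs(accelerator_config={"split_batches": True})
print(args.to_dict())
```

`dataclasses.asdict` recurses into nested dataclasses. Once `__post_init__` has normalized the input, `to_dict` therefore needs no special case.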

Set the dataset format used by `test_trainer` to float32 (#28920) · 69ca640d
Huazhong Ji authored Feb 14, 2024
```
Co-authored-by: unit_test <test@unit.com>
```
69ca640d

AQLM quantizer support (#28928) · 1ecf5f7c

Andrei Panferov authored Feb 14, 2024



* aqlm init

* calibration and dtypes

* docs

* Readme update

* is_aqlm_available

* Simpler link in docs

* Test TODO real reference

* init _import_structure fix

* AqlmConfig autodoc

* integration aqlm

* integrations in tests

* docstring fix

* legacy typing

* Less typings

* More kernels information

* Performance -> Accuracy

* correct tests

* removed multi-gpu test

* Update docs/source/en/quantization.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Brought back multi-gpu tests

* Update src/transformers/integrations/aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/aqlm_integration/test_aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------
Co-authored-by: Andrei Panferov <blacksamorez@yandex-team.ru>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

1ecf5f7c

Add SiglipForImageClassification and CLIPForImageClassification (#28952) · 63ffd56d
NielsRogge authored Feb 14, 2024
```
* First draft

* Add CLIPForImageClassification

* Remove scripts

* Fix doctests
```
63ffd56d

Add `StableLM` (#28810) · de6029a0

Jonathan Tow authored Feb 14, 2024

* Add `StableLM`

* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`

* fix: re-add changes to address comments

* fix(readme): add links to paper

* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref

* fix(tests): re-add `@slow` decorator to integration tests

* fix(tests): import slow...

* fix(readme_hd): remove whitespace edit

* fix(tokenizer): auto tokenizer tuple

* skip doctests for `modeling_stablelm`

de6029a0

13 Feb, 2024 4 commits

[`DETR`] Update the processing to adapt masks & bboxes to reflect padding (#28363) · bd4b83e1

amyeroberts authored Feb 13, 2024

* Update the processing so bbox coords are adjusted for padding

* Just pad masks

* Tidy up, add tests

* Better tests

* Fix yolos and mark as slow for pycocotools

* Fix yolos - return_tensors

* Clarify padding and normalization behaviour

bd4b83e1
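The bounding-box adjustment is illustrated by the small sketch below. It assumes boxes are normalized `(center_x, center_y, width, height)` and that padding is added only on the bottom and right, so no pixel moves. `rescale_boxes_for_padding` is a hypothetical name, not the image processor's function.

```python
from typing import Tuple

# (center_x, center_y, width, height), normalized by the image size
Box = Tuple[float, float, float, float]


def rescale_boxes_for_padding(box: Box, image_size: Tuple[int, int],
                              padded_size: Tuple[int, int]) -> Box:
    """Re-normalize a box after bottom/right padding.

    Sizes are (height, width). Absolute pixel coordinates do not change;
    only the reference size used for normalization grows.
    """
    height, width = image_size
    padded_height, padded_width = padded_size
    sx = width / padded_width
    sy = height / padded_height
    cx, cy, w, h = box
    return (cx * sx, cy * sy, w * sx, h * sy)


# A 100x200 (h x w) image padded to 200x200: only the vertical terms shrink.
print(rescale_boxes_for_padding((0.5, 0.5, 0.2, 0.4), (100, 200), (200, 200)))
```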

Static Cache: load models with MQA or GQA (#28975) · 3e70a207
Joao Gante authored Feb 13, 2024

3e70a207

Add sudachi_projection option to BertJapaneseTokenizer (#28503) · da20209d

Hiroshi Matsuda authored Feb 13, 2024



* add sudachi_projection option

* Upgrade sudachipy>=0.6.8

* add a test case for sudachi_projection

* Compatible with older versions of SudachiPy

* make fixup

* make style

* error message for unidic download

* revert jumanpp test cases

* format options for sudachi_projection
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format options for sudachi_split_mode and sudachi_dict_type

* comment

* add tests for full_tokenizer kwargs

* pass projection arg directly

* require_sudachi_projection

* make style

* revert upgrade sudachipy

* check is_sudachi_projection_available()

* revert dependency_version_table and bugfix

* style format

* simply raise ImportError
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* simply raise ImportError

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

da20209d

[`NllbTokenizer`] refactor with added tokens decoder (#27717) · b4456753

Arthur authored Feb 13, 2024



* refactor with addedtokens decoder

* style

* get rid of lang code to id

* style

* keep some things for BC

* update tests

* add the mask token at the end of the vocab

* nits

* nits

* fix final tests

* style

* nits

* Update src/transformers/models/nllb/tokenization_nllb_fast.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits

* style?

* Update src/transformers/convert_slow_tokenizer.py

* make it a tad bit more custom

* ruff please stop
Co-authored-by: avidale <dale.david@mail.ru>

* Update
Co-authored-by: avidale <dale.david@mail.ru>

* Update
Co-authored-by: avidale <dale.david@mail.ru>

* oupts

* ouft

* nits

* test

* fix the remaining failing tests

* style

* fix failing test

* fix other test

* temp dir + test the raw init

* update test

* style

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

b4456753

12 Feb, 2024 3 commits

[Docs] Add language identifiers to fenced code blocks (#28955) · fe3df9d5
Klaus Hipp authored Feb 12, 2024
```
Add language identifiers to code blocks
```
fe3df9d5

Tests: tag `test_save_load_fast_init_from_base` as flaky (#28930) · e30bbb26
Joao Gante authored Feb 12, 2024

e30bbb26

[Nougat] Fix pipeline (#28242) · f278ef20
NielsRogge authored Feb 12, 2024
```
* Fix pipeline

* Remove print statements

* Address comments

* Address issue

* Remove unused imports
```
f278ef20

08 Feb, 2024 2 commits

Support batched input for decoder start ids (#28887) · d6286646

Raushan Turganbay authored Feb 08, 2024



* support batched input for decoder start ids

* Fix typos
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* minor changes

* fix: decoder_start_id as list

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

d6286646
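The sketch below shows what accepting a batched start id can mean. It assumes the option takes either a single int, broadcast to every row, or a list with exactly one id per row, as the `decoder_start_id as list` bullet suggests. `build_decoder_start` is hypothetical and works on plain lists instead of tensors.

```python
from typing import List, Union


def build_decoder_start(decoder_start_token_id: Union[int, List[int]],
                        batch_size: int) -> List[List[int]]:
    """Create one length-1 decoder prompt per batch row (hypothetical helper)."""
    if isinstance(decoder_start_token_id, int):
        # Single id: broadcast the same start token to every row.
        return [[decoder_start_token_id] for _ in range(batch_size)]
    if len(decoder_start_token_id) != batch_size:
        raise ValueError(
            f"Expected {batch_size} decoder start ids, got {len(decoder_start_token_id)}."
        )
    # One start id per row, e.g. a different language token for each example.
    return [[token_id] for token_id in decoder_start_token_id]


print(build_decoder_start([5, 7], batch_size=2))
```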

[`Core generation`] Adds support for static KV cache (#27931) · 115ac94d

Arthur authored Feb 08, 2024

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

115ac94d
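Below is the idea behind a static KV cache, reduced to a toy. Storage is allocated once with a fixed maximum length and written in place at `cache_position` indices (the name used in #28937 above), instead of growing by concatenation. `ToyStaticCache` is hypothetical and stores one float per position rather than per-head tensors.

```python
from typing import List, Optional, Sequence


class ToyStaticCache:
    """Fixed-size key/value buffer for a single layer and head (illustrative only).

    The buffers are allocated once with `max_cache_len` slots, so their size
    never changes during generation. New entries are written in place at the
    positions given by `cache_position`.
    """

    def __init__(self, max_cache_len: int) -> None:
        self.max_cache_len = max_cache_len
        self.keys: List[Optional[float]] = [None] * max_cache_len
        self.values: List[Optional[float]] = [None] * max_cache_len

    def update(self, cache_position: Sequence[int],
               keys: Sequence[float], values: Sequence[float]) -> None:
        if not (len(cache_position) == len(keys) == len(values)):
            raise ValueError("cache_position, keys and values must have the same length")
        for pos, k, v in zip(cache_position, keys, values):
            if not 0 <= pos < self.max_cache_len:
                raise IndexError(f"cache_position {pos} exceeds max_cache_len")
            self.keys[pos] = k
            self.values[pos] = v

    def attention_mask(self) -> List[bool]:
        # Slots that were never written must be masked out of attention.
        return [k is not None for k in self.keys]


cache = ToyStaticCache(max_cache_len=4)
cache.update([0, 1], [0.1, 0.2], [1.0, 2.0])  # prefill two tokens
cache.update([2], [0.3], [3.0])               # one decoding step
print(cache.attention_mask())
```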

07 Feb, 2024 1 commit

⚠️ Raise `Exception` when trying to generate 0 tokens ⚠️ (#28621) · abf8f54a

Daniel Korat authored Feb 07, 2024



* change warning to exception

* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* validate `max_new_tokens` > 0 in `GenerationConfig`

* fix truncation test parameterization in `TextGenerationPipelineTests`

---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

abf8f54a
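Here is a minimal sketch of the validation added to `GenerationConfig`. It assumes a `validate()` method that raises on a non-positive `max_new_tokens`. `ToyGenerationConfig` is hypothetical, and raising `ValueError` specifically is an assumption, since the title only says `Exception`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerationConfig:
    max_new_tokens: Optional[int] = None

    def validate(self) -> None:
        # Per the commit, this case used to produce only a warning;
        # requesting zero (or fewer) new tokens is now an error.
        if self.max_new_tokens is not None and self.max_new_tokens <= 0:
            raise ValueError(
                f"`max_new_tokens` must be greater than 0, but is {self.max_new_tokens}."
            )
```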

06 Feb, 2024 4 commits

Hotfix - make `torchaudio` get the correct version in `torch_and_flax_job` (#28899) · 40658be4
Yih-Dar authored Feb 06, 2024
```
* check

* check

* check

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
40658be4

Revert "[WIP] Hard error when ignoring tensors." (#28898) · 76b4f666
Yih-Dar authored Feb 06, 2024
```
Revert "[WIP] Hard error when ignoring tensors. (#27484)"

This reverts commit 2da28c4b.
```
76b4f666

Fix `FastSpeech2ConformerModelTest` and skip it on CPU (#28888) · 6529a5b5
Yih-Dar authored Feb 06, 2024
```
* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
6529a5b5

Adds LlamaForQuestionAnswering class in modeling_llama.py along with AutoModel Support (#28777) · 2e7c942c

nakranivaibhav authored Feb 06, 2024

* This is a test commit

* testing commit

* final commit with some changes

* Removed copy statement

* Fixed formatting issues

* Fixed error: added past_key_values to the forward method

* Fixed a trailing whitespace. Damn the formatting rules are strict

* Added the copy statement

2e7c942c

05 Feb, 2024 3 commits

Image Feature Extraction pipeline (#28216) · ba3264b4

amyeroberts authored Feb 05, 2024



* Draft pipeline

* Fixup

* Fix docstrings

* Update doctest

* Update pipeline_model_mapping

* Update docstring

* Update tests

* Update src/transformers/pipelines/image_feature_extraction.py
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Fix docstrings - review comments

* Remove pipeline mapping for composite vision models

* Add to pipeline tests

* Remove for flava (multimodal)

* safe pil import

* Add requirements for pipeline run

* Account for super slow efficientnet

* Review comments

* Fix tests

* Swap order of kwargs

* Use build_pipeline_init_args

* Add back FE pipeline for Vilt

* Include image_processor_kwargs in docstring

* Mark test as flaky

* Update TODO

* Update tests/pipelines/test_pipelines_image_feature_extraction.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add license header

---------
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

ba3264b4

Correct wav2vec2-bert inputs_to_logits_ratio (#28821) · 7addc934

Yoach Lacombe authored Feb 05, 2024

* Correct wav2vec2-bert inputs_to_logits_ratio

* correct ratio

* correct ratio, clean asr pipeline

* refactor on one line

7addc934

[WIP] Hard error when ignoring tensors. (#27484) · 2da28c4b

Nicolas Patry authored Feb 05, 2024



* [WIP] Hard error when ignoring tensors.

* Better selection/error when saving a checkpoint.

- Find all names we should normally drop (those are in the transformers
  config)
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
  but we try to find them all anyway.)
- For all identical names:
  - If they are in the config, just ignore them; everything is fine.
  - If they are not, warn about them.
- For all remaining tensors which are shared yet neither identical NOR
  disjoint, raise a hard error.

* Adding a failing test on `main` that passes here.

* We don't need to keep the subfolder logic in this test.

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

2da28c4b
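The checkpoint-saving procedure listed above can be sketched by modeling each tensor as a `(storage_id, start, stop)` element range. Everything here is a hypothetical illustration of the described steps, including `classify_shared`, the tuple model and the return shape; it is not the library's implementation. Note that this commit was later reverted by #28898.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical tensor model: the storage it lives in and the element
# range [start, stop) of that storage it covers.
View = Tuple[int, int, int]


def classify_shared(
    tensors: Dict[str, View], tied_in_config: Set[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (to_clone, to_ignore, to_warn); raise on partial overlaps."""
    by_storage: Dict[int, List[Tuple[int, int, str]]] = defaultdict(list)
    for name, (storage, start, stop) in tensors.items():
        by_storage[storage].append((start, stop, name))

    to_clone: List[str] = []
    to_ignore: List[str] = []
    to_warn: List[str] = []
    for views in by_storage.values():
        if len(views) < 2:
            continue  # storage is not shared, nothing to do
        views.sort()
        # After sorting by start, any partial overlap shows up between neighbors:
        # shared yet neither identical nor disjoint -> hard error.
        for (s1, e1, n1), (s2, e2, n2) in zip(views, views[1:]):
            if s2 < e1 and (s1, e1) != (s2, e2):
                raise RuntimeError(
                    f"{n1} and {n2} share memory but are neither identical nor disjoint"
                )
        # Group identical views; different groups are now known to be disjoint.
        groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
        for start, stop, name in views:
            groups[(start, stop)].append(name)
        for names in groups.values():
            # Prefer keeping a name that is not declared as tied in the config.
            owner, *aliases = sorted(names, key=lambda n: (n in tied_in_config, n))
            if len(groups) > 1:
                # Disjoint from the other ranges: cloning removes the sharing safely.
                to_clone.append(owner)
            for alias in aliases:
                # Identical aliases are expected if declared in the config, else warned about.
                if alias in tied_in_config:
                    to_ignore.append(alias)
                else:
                    to_warn.append(alias)
    return to_clone, to_ignore, to_warn
```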

02 Feb, 2024 4 commits

Mark `test_encoder_decoder_model_generate` for `vision_encoder_decoder` as flaky (#28842) · 3d2900e8
amyeroberts authored Feb 02, 2024
```
Mark test as flaky
```
3d2900e8

fix / skip (for now) some tests before switching to torch 2.2 (#28838) · a7cb92aa

Yih-Dar authored Feb 02, 2024



* fix / skip some tests before we can switch to torch 2.2

* style

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

a7cb92aa

Add missing None check for hf_quantizer (#28804) · ec29d25d

Juri Ganitkevitch authored Feb 02, 2024



* Add missing None check for hf_quantizer

* Add test, fix logic.

* make style

* Switch test model to Mistral

* Comment

* Update tests/test_modeling_utils.py

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

ec29d25d

[Docs] Fix spelling and grammar mistakes (#28825) · 721ee783

Klaus Hipp authored Feb 02, 2024

* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts

721ee783

01 Feb, 2024 2 commits

Fix symbolic_trace with kv cache (#28724) · 709dc432
fxmarty authored Feb 01, 2024
```
* fix symbolic_trace with kv cache

* comment & better test
```
709dc432

Adding [T5/MT5/UMT5]ForTokenClassification (#28443) · 0d26abdd

JB (Don) authored Feb 01, 2024

* Adding [T5/MT5/UMT5]ForTokenClassification

* Add auto mappings for T5ForTokenClassification and variants

* Adding ForTokenClassification to the list of models

* Adding attention_mask param to the T5ForTokenClassification test

* Remove outdated comment in test

* Adding EncoderOnly and Token Classification tests for MT5 and UMT5

* Fix typo in umt5 string

* Add tests for all the existing MT5 models

* Fix wrong comment in dependency_versions_table

* Reverting change to common test for _keys_to_ignore_on_load_missing

The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.

* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model

* Add fix-copies to MT5ModelTest

0d26abdd

31 Jan, 2024 3 commits

DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect... · beb2a096
Joao Gante authored Jan 31, 2024
```
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
```
beb2a096
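The standard-library example below shows why an `arange` dtype matters. Once integer positions are stored in half precision, values above 2048 can no longer be represented exactly, so adjacent positions collide. This assumes the incorrect initialization came from a low-precision default dtype, which the commit title hints at but does not spell out.

```python
import struct
from typing import List


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def positions(n: int, low_precision: bool) -> List[float]:
    # Integer positions 0..n-1, optionally rounded to float16.
    return [to_float16(float(i)) if low_precision else float(i) for i in range(n)]


# float16 has an 11-bit significand: above 2048 consecutive integers collide.
exact = positions(4096, low_precision=False)
half = positions(4096, low_precision=True)
print(len(set(exact)), len(set(half)))
```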

Flax mistral (#26943) · f7076cd3

Kian Sierra McGettigan authored Jan 31, 2024

* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed rf after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split casual_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2

f7076cd3
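The last few bullets switch between `tile` and `repeat` when expanding key/value heads for grouped-query attention. The two give different head orderings, shown here on plain lists. Assume the Llama-style convention in which query head `h` reads KV head `h // n_rep`. Under that convention, only the `repeat` ordering matches the grouping. The helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repeat_kv(kv_heads: Sequence[T], n_rep: int) -> List[T]:
    """Each KV head repeated n_rep times in a row: query head h gets KV head h // n_rep."""
    return [head for head in kv_heads for _ in range(n_rep)]


def tile_kv(kv_heads: Sequence[T], n_rep: int) -> List[T]:
    """Whole sequence tiled n_rep times: query head h gets KV head h % len(kv_heads)."""
    return list(kv_heads) * n_rep


# 2 KV heads shared by 4 query heads (n_rep = 2).
print(repeat_kv(["k0", "k1"], 2))  # grouped ordering
print(tile_kv(["k0", "k1"], 2))    # interleaved ordering
```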

[Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8

Patrick von Platen authored Jan 31, 2024



* up

* Fix more

* Correct more

* Fix more tests

* fix fast tests

* Fix more

* fix more

* push all files

* finish all

* make style

* Fix timestamp wrap

* make style

* make style

* up

* up

* up

* Fix lang detection behavior

* Fix lang detection behavior

* Add lang detection test

* Fix lang detection behavior

* make style

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* better error message

* make style tests

* add warning

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

65a926e8