Commits · eb8e7a005fcc1f7e7896ea9cbf3e2aaf7d260cca · chenpangpang / transformers

01 Feb, 2024 2 commits

Make `is_torch_bf16_available_on_device` more strict (#28796) · eb8e7a00
Yih-Dar authored Feb 01, 2024
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
eb8e7a00

Adding [T5/MT5/UMT5]ForTokenClassification (#28443) · 0d26abdd

JB (Don) authored Feb 01, 2024

* Adding [T5/MT5/UMT5]ForTokenClassification

* Add auto mappings for T5ForTokenClassification and variants

* Adding ForTokenClassification to the list of models

* Adding attention_mask param to the T5ForTokenClassification test

* Remove outdated comment in test

* Adding EncoderOnly and Token Classification tests for MT5 and UMT5

* Fix typo in umt5 string

* Add tests for all the existing MT5 models

* Fix wrong comment in dependency_versions_table

* Reverting change to common test for _keys_to_ignore_on_load_missing

The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.

* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model

* Add fix-copies to MT5ModelTest

0d26abdd

31 Jan, 2024 13 commits

[docs] Correct the statement in the docstirng of compute_transition_scores in... · 7b2bd1fb
Shichao Song authored Feb 01, 2024
```
[docs] Correct the statement in the docstirng of compute_transition_scores in generation/utils.py (#28786)
```
7b2bd1fb

Split daily CI using 2 level matrix (#28773) · 47358661

Yih-Dar authored Jan 31, 2024



* update / add new workflow files

* Add comment

* Use env.NUM_SLICES

* use scripts

* use scripts

* use scripts

* Fix

* using one script

* Fix

* remove unused file

* update

* fail-fast: false

* remove unused file

* fix

* fix

* use matrix

* inputs

* style

* update

* fix

* fix

* no model name

* add doc

* allow args

* style

* pass argument

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

47358661

Add artifact name in job step to maintain job / artifact correspondence (#28682) · 95346e9d
Yih-Dar authored Jan 31, 2024
```
* avoid using job name

* apply to other files

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
95346e9d

DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect... · beb2a096
Joao Gante authored Jan 31, 2024
```
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
```
beb2a096

Flax mistral (#26943) · f7076cd3

Kian Sierra McGettigan authored Jan 31, 2024

* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion consideres bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test  for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed rf after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialziation

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to allign # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split casual_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2

f7076cd3

Wrap Keras methods to support BatchEncoding (#28734) · 7a496100

Matt authored Jan 31, 2024

* Shim the Keras methods to support BatchEncoding

* Extract everything to a convert_batch_encoding function

* Convert BatchFeature too (thanks Amy)

* tf.keras -> keras

7a496100

canonical repos moves (#28795) · 721e2d94

Julien Chaumond authored Jan 31, 2024



* canonical repos moves

* Style

---------
Co-authored-by: Lysandre <lysandre@huggingface.co>

721e2d94

Resolve DeepSpeed cannot resume training with PeftModel (#28746) · bebeeee0

Hieu Lam authored Jan 31, 2024

* fix: resolve deepspeed resume peft model issues

* chore: update something

* chore: update model instance pass into is peft model checks

* chore: remove hard code value to tests

* fix: format code

bebeeee0

[Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8

Patrick von Platen authored Jan 31, 2024



* up

* Fix more

* Correct more

* Fix more tests

* fix fast tests

* Fix more

* fix more

* push all files

* finish all

* make style

* Fix timestamp wrap

* make style

* make style

* up

* up

* up

* Fix lang detection behavior

* Fix lang detection behavior

* Add lang detection test

* Fix lang detection behavior

* make style

* Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* better error message

* make style tests

* add warning

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

65a926e8

[`HFQuantizer`] Remove `check_packages_compatibility` logic (#28789) · f9f1f2ac
Younes Belkada authored Jan 31, 2024
```
remove `check_packages_compatibility` logic
```
f9f1f2ac

don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad

tom-p-reichel authored Jan 30, 2024

* test that tied output embeddings aren't initialized on load

* don't initialize the output embeddings if we're going to tie them to the input embeddings

ae0c27ad

Prevent MLflow exception from disrupting training (#28779) · a937425e

Alessio Serra authored Jan 31, 2024



Modified MLflow logging metrics from synchronous to asynchronous
Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>

a937425e

[`bnb`] Fix bnb slow tests (#28788) · d703eaae
Younes Belkada authored Jan 31, 2024
```
fix bnb slow tests
```
d703eaae

30 Jan, 2024 11 commits

Pin Torch to <2.2.0 (#28785) · 74c9cfea

Matt authored Jan 30, 2024



* Pin torch to <2.2.0

* Pin torchvision and torchaudio as well

* Playing around with versions to see if this helps

* twiddle something to restart the CI

* twiddle it back

* Try changing the natten version

* make fixup

* Revert "Try changing the natten version"

This reverts commit de0d6592c35dc39ae8b5a616c27285db28262d06.

* make fixup

* fix fix fix

* fix fix fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

74c9cfea

Add tf_keras imports to prepare for Keras 3 (#28588) · 415e9a09

Matt authored Jan 30, 2024

* Port core files + ESM (because ESM code is odd)

* Search-replace in modelling code

* Fix up transfo_xl as well

* Fix other core files + tests (still need to add correct import to tests)

* Fix cookiecutter

* make fixup, fix imports in some more core files

* Auto-add imports to tests

* Cleanup, add imports to sagemaker tests

* Use correct exception for importing tf_keras

* Fixes in modeling_tf_utils

* make fixup

* Correct version parsing code

* Ensure the pipeline tests correctly revert to float32 after each test

* Ensure the pipeline tests correctly revert to float32 after each test

* More tf.keras -> keras

* Add dtype cast

* Better imports of tf_keras

* Add a cast for tf.assign, just in case

* Fix callback imports

415e9a09

Task-specific pipeline init args (#28439) · 1d489b3e

amyeroberts authored Jan 30, 2024

* Abstract out pipeline init args

* Address PR comments

* Reword

* BC PIPELINE_INIT_ARGS

* Remove old arguments

* Small fix

1d489b3e

[`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` (#28661) · 2fa1c808

amyeroberts authored Jan 30, 2024

* Enable instantiating model with pretrained backbone weights

* Remove doc updates until changes made in modeling code

* Use load_backbone instead

* Add use_timm_backbone to the model configs

* Add missing imports and arguments

* Update docstrings

* Make sure test is properly configured

* Include recent DPT updates

2fa1c808

Further pin pytest version (in a temporary way) (#28780) · c24c5245
Yih-Dar authored Jan 30, 2024
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
c24c5245

Fix transformers.utils.fx compatibility with torch<2.0 (#28774) · 6f7d5db5
fxmarty authored Jan 30, 2024
```
guard sdpa on torch>=2.0
```
6f7d5db5

Use Conv1d for TDNN (#25728) · 5c8d941d

Thien Tran authored Jan 30, 2024

* use conv for tdnn

* run make fixup

* update TDNN

* add PEFT LoRA check

* propagate tdnn warnings to others

* add missing imports

* update TDNN in wav2vec2_bert

* add missing imports

5c8d941d

[`HfQuantizer`] Move it to "Developper guides" (#28768) · 866253f8
Younes Belkada authored Jan 30, 2024
```
Update _toctree.yml
```
866253f8

`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610) · d78e78a0

Poedator authored Jan 30, 2024



* squashed earlier commits for easier rebase

* rm rebase leftovers

* 4bit save enabled @quantizers

* TMP gptq test use exllama

* fix AwqConfigTest::test_wrong_backend for A100

* quantizers AWQ fixes

* _load_pretrained_model low_cpu_mem_usage branch

* quantizers style

* remove require_low_cpu_mem_usage attr

* rm dtype arg from process_model_before_weight_loading

* rm config_origin from Q-config

* rm inspect from q_config

* fixed docstrings in QuantizationConfigParser

* logger.warning fix

* mv is_loaded_in_4(8)bit to BnbHFQuantizer

* is_accelerate_available error msg fix in quantizer

* split is_model_trainable in bnb quantizer class

* rm llm_int8_skip_modules as separate var in Q

* Q rm todo

* fwd ref to HFQuantizer in type hint

* rm note re optimum.gptq.GPTQQuantizer

* quantization_config in __init__ simplified

* replaced NonImplemented with  create_quantized_param

* rm load_in_4/8_bit deprecation warning

* QuantizationConfigParser refactoring

* awq-related minor changes

* awq-related changes

* awq config.modules_to_not_convert

* raise error if no q-method in q-config in args

* minor cleanup

* awq quantizer docstring

* combine common parts in bnb process_model_before_weight_loading

* revert test_gptq

* .process_model_ cleanup

* restore dict config warning

* removed typevars in quantizers.py

* cleanup post-rebase 16 jan

* QuantizationConfigParser classmethod refactor

* rework of handling of unexpected aux elements of bnb weights

* moved q-related stuff from save_pretrained to quantizers

* refactor v1

* more changes

* fix some tests

* remove it from main init

* ooops

* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix awq issues

* fix

* fix

* fix

* fix

* fix

* fix

* add docs

* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/hf_quantizer.md

* address comments

* fix

* fixup

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address final comment

* update

* Update src/transformers/quantizers/base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* add kwargs update

* fixup

* add `optimum_quantizer` attribute

* oops

* rm unneeded file

* fix doctests

---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

d78e78a0

Move CLIP _no_split_modules to CLIPPreTrainedModel (#27841) · 1f5590d3
Zhan Ling authored Jan 30, 2024
```
Add _no_split_modules to CLIPModel
```
1f5590d3

Don't allow passing `load_in_8bit` and `load_in_4bit` at the same time (#28266) · a989c6c6

Omar Sanseviero authored Jan 30, 2024



* Update quantization_config.py

* Style

* Protect from setting directly

* add tests

* Update tests/quantization/bnb/test_4bit.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

a989c6c6
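
The commit above rejects `load_in_8bit` and `load_in_4bit` being enabled together, both at construction time and when a flag is set directly afterwards. A minimal sketch of that idea follows; the class name `QuantFlags`, the helper `_check_exclusive`, and the error text are hypothetical and do not reproduce the code in `quantization_config.py`.

```python
class QuantFlags:
    """Hypothetical config with two mutually exclusive boolean flags."""

    def __init__(self, load_in_8bit=False, load_in_4bit=False):
        # Reject the invalid combination at construction time.
        self._check_exclusive(load_in_8bit, load_in_4bit)
        self._load_in_8bit = load_in_8bit
        self._load_in_4bit = load_in_4bit

    @staticmethod
    def _check_exclusive(in_8bit, in_4bit):
        if in_8bit and in_4bit:
            raise ValueError("load_in_8bit and load_in_4bit cannot both be True")

    @property
    def load_in_8bit(self):
        return self._load_in_8bit

    @load_in_8bit.setter
    def load_in_8bit(self, value):
        # Setting the attribute directly goes through the same check.
        self._check_exclusive(value, self._load_in_4bit)
        self._load_in_8bit = value

    @property
    def load_in_4bit(self):
        return self._load_in_4bit

    @load_in_4bit.setter
    def load_in_4bit(self, value):
        self._check_exclusive(self._load_in_8bit, value)
        self._load_in_4bit = value


cfg = QuantFlags(load_in_4bit=True)
try:
    cfg.load_in_8bit = True  # would enable both flags
    direct_set_rejected = False
except ValueError:
    direct_set_rejected = True
```

Routing both the constructor and the setters through one check keeps the rule in a single place, so the object cannot reach the invalid state by either path.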

29 Jan, 2024 13 commits

Add French translation: french README.md (#28696) · cd2eb8cb

ThibaultLengagne authored Jan 29, 2024



* doc: french README
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>

* doc: Add Depth Anything
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>

* doc: Add french link in other docs
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>

* doc: Add missing links in fr docs

* doc: fix several mistakes in translation
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>

---------
Signed-off-by: ThibaultLengagne <thibaultl@padok.fr>
Co-authored-by: Sarapuce <alexandreh@padok.fr>

cd2eb8cb

Support saving only PEFT adapter in checkpoints when using PEFT + FSDP (#28297) · a055d09e

Ajay Patel authored Jan 29, 2024

* Update trainer.py

* Revert "Update trainer.py"

This reverts commit 0557e2cc9effa3a41304322032239a3874b948a7.

* Make trainer.py use adapter_only=True when using FSDP + PEFT

* Support load_best_model with adapter_only=True

* Ruff format

* Inspect function args for save_ load_ fsdp utility functions and only pass adapter_only=True if they support it

a055d09e
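
The last bullet of the commit above describes inspecting a function's arguments and passing `adapter_only=True` only when the function supports it. A minimal sketch of that pattern, using the standard-library `inspect` module, might look like the following; `call_with_supported_kwargs`, `save_old`, and `save_new` are hypothetical names for illustration and are not the helpers used in `trainer.py`.

```python
import inspect


def call_with_supported_kwargs(func, *args, **optional_kwargs):
    """Call func, forwarding only the optional kwargs its signature accepts."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts any keyword argument.
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    supported = {
        name: value
        for name, value in optional_kwargs.items()
        if accepts_var_kwargs or name in params
    }
    return func(*args, **supported)


# Hypothetical stand-ins for an older and a newer version of a save utility.
def save_old(model, output_dir):
    return {"output_dir": output_dir}


def save_new(model, output_dir, adapter_only=False):
    return {"output_dir": output_dir, "adapter_only": adapter_only}


old_result = call_with_supported_kwargs(save_old, "model", "ckpt", adapter_only=True)
new_result = call_with_supported_kwargs(save_new, "model", "ckpt", adapter_only=True)
```

Checking the signature at call time lets the caller work with both older and newer versions of a dependency without pinning a minimum version.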

[Whisper] Make tokenizer normalization public (#28136) · da3c79b2
Sanchit Gandhi authored Jan 29, 2024
```
* [Whisper] Make tokenizer normalization public

* add to docs
```
da3c79b2

Fix typo of `Block`. (#28727) · e694e985
xkszltl authored Jan 29, 2024

e694e985

Mark test_constrained_beam_search_generate as flaky (#28757) · 9e8f35fa
amyeroberts authored Jan 29, 2024
```
* Make test_constrained_beam_search_generate as flaky

* Update tests/generation/test_utils.py
```
9e8f35fa

Pin pytest version <8.0.0 (#28758) · 0f8d015a
amyeroberts authored Jan 29, 2024
```
* Pin pytest version <8.0.0

* Update setup.py

* make deps_table_update
```
0f8d015a

small doc update for CamemBERT (#28644) · 26aa03a2
Julien Chaumond authored Jan 29, 2024

26aa03a2

Enable Gradient Checkpointing in Deformable DETR (#28686) · 0548af54

Nate Cibik authored Jan 29, 2024

* Enabled gradient checkpointing in Deformable DETR

* Enabled gradient checkpointing in Deformable DETR encoder

* Removed # Copied from headers in modeling_deta.py to break dependence on Deformable DETR code

0548af54

PatchtTST and PatchTSMixer fixes (#28083) · f72c7c22

Wesley Gifford authored Jan 29, 2024

* 🐛 fix .max bug

* remove prediction_length from regression output dimensions

* fix parameter names, fix output names, update tests

* ensure shape for PatchTST

* ensure output shape for PatchTSMixer

* update model, batch, and expected for regression distribution test

* update test expected
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* standardize on patch_length
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make arguments more explicit
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* adjust prepared inputs
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

---------
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

f72c7c22

[Docs] Fix Typo in English & Japanese CLIP Model Documentation (TMBD -> TMDB) (#28751) · 3a08cc48
Vinyzu authored Jan 29, 2024
```
* [Docs] Fix Typo in English CLIP model_doc

* [Docs] Fix Typo in Japanese CLIP model_doc
```
3a08cc48

Fix input data file extension in examples (#28741) · 39fa4009
Klaus Hipp authored Jan 29, 2024

39fa4009

Fix `DepthEstimationPipeline`'s docstring (#28733) · 5649c0cb
Yih-Dar authored Jan 29, 2024
```
* fix

* fix

* Fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
5649c0cb

Add serialization logic to pytree types (#27871) · 243e186e
Angela Yi authored Jan 29, 2024
```
* Add serialized type name to pytrees

* Modify context

* add serde test
```
243e186e

28 Jan, 2024 1 commit

[`Siglip`] protect from imports if sentencepiece not installed (#28737) · f1cc6157
amyeroberts authored Jan 28, 2024
```
[Siglip] protect from imports if sentencepiece not installed
```
f1cc6157