1. 05 Feb, 2024 6 commits
  2. 02 Feb, 2024 9 commits
  3. 01 Feb, 2024 8 commits
  4. 31 Jan, 2024 13 commits
    • [docs] Correct the statement in the docstring of compute_transition_scores in generation/utils.py (#28786) · 7b2bd1fb
      Shichao Song authored
      
    • Split daily CI using 2 level matrix (#28773) · 47358661
      Yih-Dar authored
      
      
      * update / add new workflow files
      
      * Add comment
      
      * Use env.NUM_SLICES
      
      * use scripts
      
      * use scripts
      
      * use scripts
      
      * Fix
      
      * using one script
      
      * Fix
      
      * remove unused file
      
      * update
      
      * fail-fast: false
      
      * remove unused file
      
      * fix
      
      * fix
      
      * use matrix
      
      * inputs
      
      * style
      
      * update
      
      * fix
      
      * fix
      
      * no model name
      
      * add doc
      
      * allow args
      
      * style
      
      * pass argument
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
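To make the "2 level matrix" above concrete: one script splits the long list of per-model test folders into NUM_SLICES chunks, so the first matrix level fans out over slice indices and the second over the models inside each slice. A minimal sketch under assumed paths and defaults (the folder listing and the NUM_SLICES default below are illustrative, not the actual CI script):

```python
import os

# Mirror the env.NUM_SLICES variable mentioned in the commit; the default is an assumption.
num_slices = int(os.environ.get("NUM_SLICES", "2"))

# Hypothetical source of truth: one folder per model under tests/models.
model_dirs = sorted(entry.name for entry in os.scandir("tests/models") if entry.is_dir())

# Level 1 of the matrix: slice indices 0..num_slices-1.
# Level 2: the model folders assigned to that slice.
slices = [model_dirs[i::num_slices] for i in range(num_slices)]

for index, chunk in enumerate(slices):
    print(f"slice {index}: {len(chunk)} model folders")
```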
    • Add artifact name in job step to maintain job / artifact correspondence (#28682) · 95346e9d
      Yih-Dar authored
      
      
      * avoid using job name
      
      * apply to other files
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) · beb2a096
      Joao Gante authored
      
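A hedged sketch of the pattern the commit title describes (the exact call sites and dtype choices in #28760 may differ): an arange whose dtype is left implicit follows whatever default floating-point dtype is in effect, and DeepSpeed ZeRO-3 can switch that default to fp16/bf16 during initialization, so large indices lose precision. Pinning an integer dtype and casting explicitly keeps the values exact.

```python
import torch

max_positions = 4096

# Fragile: a float-valued arange inherits the ambient default dtype, which DeepSpeed
# ZeRO-3 initialization may have set to float16/bfloat16.
positions_implicit = torch.arange(0.0, max_positions)

# Safer: build the range in int64 and cast explicitly, independent of the default dtype.
positions_explicit = torch.arange(max_positions, dtype=torch.int64).float()
```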
    • Flax mistral (#26943) · f7076cd3
      Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
      * make sure conversion considers bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
      * applied fix-copies, ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
      * skip cache test for FlaxMistralForSequenceClassification since it doesn't score previous input_ids if pad_token_id is in input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed rf after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
      * added embedding weight initialization
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
      * skipped docstring test to align # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
      * split causal_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat
      
      * applied make style
      
      * empty for retry on Wav2Vec2
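Several bullets above ("changed repeat_kv to tile", "changed tile to repeat", "fixed to key value groups") concern how grouped key/value heads are expanded to match the query heads. A minimal jax.numpy sketch of that expansion, assuming a (batch, seq_len, num_kv_heads, head_dim) layout that may not match the model's actual axis order:

```python
import jax.numpy as jnp


def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
    # Expand (batch, seq_len, num_kv_heads, head_dim) to
    # (batch, seq_len, num_kv_heads * n_rep, head_dim) so each query head
    # lines up with its shared key/value head.
    if n_rep == 1:
        return hidden_states
    return jnp.repeat(hidden_states, n_rep, axis=2)


# Example: 2 KV heads shared across 8 query heads -> repeat each KV head 4 times.
kv = jnp.zeros((1, 16, 2, 64))
print(repeat_kv(kv, 4).shape)  # (1, 16, 8, 64)
```

jnp.repeat keeps the copies of each key/value head adjacent (h0, h0, ..., h1, h1, ...), while jnp.tile would interleave them (h0, h1, h0, h1, ...); the repeat/tile back-and-forth in the commits above is about matching the grouping convention of the query heads.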
    • Wrap Keras methods to support BatchEncoding (#28734) · 7a496100
      Matt authored
      * Shim the Keras methods to support BatchEncoding
      
      * Extract everything to a convert_batch_encoding function
      
      * Convert BatchFeature too (thanks Amy)
      
      * tf.keras -> keras
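The commits above mention extracting the shim into a convert_batch_encoding function. A hedged sketch of what such a shim can look like: BatchEncoding and BatchFeature are dict-like wrappers that Keras input validation does not accept as-is, so they are unwrapped into plain dicts before fit/predict sees them. The Mapping check below is a simplification of the real type checks, which target those two classes specifically.

```python
from collections.abc import Mapping


def convert_batch_encoding(*args, **kwargs):
    # If the first positional argument (usually `x`) is a dict-like tokenizer or
    # feature-extractor output, unwrap it into a plain dict; leave everything else alone.
    if args and isinstance(args[0], Mapping):
        args = (dict(args[0]),) + args[1:]
    if isinstance(kwargs.get("x"), Mapping):
        kwargs["x"] = dict(kwargs["x"])
    return args, kwargs
```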
    • canonical repos moves (#28795) · 721e2d94
      Julien Chaumond authored
      
      
      * canonical repos moves
      
      * Style
      
      ---------
      Co-authored-by: Lysandre <lysandre@huggingface.co>
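"Canonical repos moves" refers to the legacy flat checkpoint names on the Hub moving under organization namespaces, with the old names kept as redirects. A hedged illustration using the familiar BERT example (the specific mapping is assumed here, not taken from the PR diff):

```python
from transformers import AutoModel

# Namespaced, canonical repo id.
model = AutoModel.from_pretrained("google-bert/bert-base-uncased")

# Legacy flat id; still resolves via a redirect to the canonical repo.
legacy = AutoModel.from_pretrained("bert-base-uncased")
```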
    • Resolve DeepSpeed cannot resume training with PeftModel (#28746) · bebeeee0
      Hieu Lam authored
      * fix: resolve deepspeed resume peft model issues
      
      * chore: update something
      
      * chore: pass the model instance into the is-PEFT-model checks
      
      * chore: remove hard-coded value from tests
      
      * fix: format code
    • [Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8
      Patrick von Platen authored
      
      
      * up
      
      * Fix more
      
      * Correct more
      
      * Fix more tests
      
      * fix fast tests
      
      * Fix more
      
      * fix more
      
      * push all files
      
      * finish all
      
      * make style
      
      * Fix timestamp wrap
      
      * make style
      
      * make style
      
      * up
      
      * up
      
      * up
      
      * Fix lang detection behavior
      
      * Fix lang detection behavior
      
      * Add lang detection test
      
      * Fix lang detection behavior
      
      * make style
      
      * Update src/transformers/models/whisper/generation_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * better error message
      
      * make style tests
      
      * add warning
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    • [`HFQuantizer`] Remove `check_packages_compatibility` logic (#28789) · f9f1f2ac
      Younes Belkada authored
      remove `check_packages_compatibility` logic
    • don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad
      tom-p-reichel authored
      * test that tied output embeddings aren't initialized on load
      
      * don't initialize the output embeddings if we're going to tie them to the input embeddings
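A toy PyTorch illustration of the change described above (simplified, not the actual transformers loading code): when the output embeddings will be tied to the input embeddings, their storage is replaced by the input embedding matrix anyway, so running the usual random initialization on them first is wasted work at load time.

```python
import torch.nn as nn


class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int, tie_word_embeddings: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        if tie_word_embeddings:
            # Reuse the input embedding matrix; any prior init of lm_head.weight is discarded.
            self.lm_head.weight = self.embed.weight

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))
```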
    • Prevent MLflow exception from disrupting training (#28779) · a937425e
      Alessio Serra authored
      
      
      Modified MLflow logging metrics from synchronous to asynchronous
      Co-authored-by: codiceSpaghetti <alessio.ser@hotmail.it>
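A minimal sketch of the synchronous-to-asynchronous switch described above, assuming an MLflow version (2.8 or newer) whose log_metrics accepts a synchronous flag; the Trainer callback code from the PR itself is not reproduced here.

```python
import mlflow

metrics = {"loss": 0.42, "learning_rate": 5e-5}

with mlflow.start_run():
    # Metrics are queued and flushed in the background, so a transient tracking-server
    # failure no longer raises in the middle of a training step.
    mlflow.log_metrics(metrics, step=100, synchronous=False)
```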
    • [`bnb`] Fix bnb slow tests (#28788) · d703eaae
      Younes Belkada authored
      fix bnb slow tests
  5. 30 Jan, 2024 4 commits
    • Pin Torch to <2.2.0 (#28785) · 74c9cfea
      Matt authored
      
      
      * Pin torch to <2.2.0
      
      * Pin torchvision and torchaudio as well
      
      * Playing around with versions to see if this helps
      
      * twiddle something to restart the CI
      
      * twiddle it back
      
      * Try changing the natten version
      
      * make fixup
      
      * Revert "Try changing the natten version"
      
      This reverts commit de0d6592c35dc39ae8b5a616c27285db28262d06.
      
      * make fixup
      
      * fix fix fix
      
      * fix fix fix
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
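The pin itself lives in packaging metadata (pip-style constraints such as torch<2.2.0, with matching torchvision and torchaudio pins); an equivalent runtime guard, shown only for illustration:

```python
import torch
from packaging import version

# Fail fast if the environment violates the torch < 2.2.0 constraint.
assert version.parse(torch.__version__) < version.parse("2.2.0"), (
    f"Expected torch < 2.2.0, found {torch.__version__}"
)
```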
    • Add tf_keras imports to prepare for Keras 3 (#28588) · 415e9a09
      Matt authored
      * Port core files + ESM (because ESM code is odd)
      
      * Search-replace in modelling code
      
      * Fix up transfo_xl as well
      
      * Fix other core files + tests (still need to add correct import to tests)
      
      * Fix cookiecutter
      
      * make fixup, fix imports in some more core files
      
      * Auto-add imports to tests
      
      * Cleanup, add imports to sagemaker tests
      
      * Use correct exception for importing tf_keras
      
      * Fixes in modeling_tf_utils
      
      * make fixup
      
      * Correct version parsing code
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * Ensure the pipeline tests correctly revert to float32 after each test
      
      * More tf.keras -> keras
      
      * Add dtype cast
      
      * Better imports of tf_keras
      
      * Add a cast for tf.assign, just in case
      
      * Fix callback imports
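A hedged sketch of the import pattern these commits describe: prefer the standalone tf_keras package (which keeps the Keras 2 API) and fall back to tf.keras only while it is still Keras 2. The exact helper, exception type, and error message in transformers differ; this is an illustrative assumption.

```python
try:
    import tf_keras as keras  # Keras 2 compatibility package
except ImportError:
    from tensorflow import keras  # on older TensorFlow, tf.keras is still Keras 2

    if int(keras.__version__.split(".")[0]) >= 3:
        raise ImportError(
            "TensorFlow is bundling Keras 3; install the backwards-compatible "
            "tf-keras package to keep using the Keras 2 APIs."
        )
```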
    • Task-specific pipeline init args (#28439) · 1d489b3e
      amyeroberts authored
      * Abstract out pipeline init args
      
      * Address PR comments
      
      * Reword
      
      * BC PIPELINE_INIT_ARGS
      
      * Remove old arguments
      
      * Small fix
    • [`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` (#28661) · 2fa1c808
      amyeroberts authored
      * Enable instantiating model with pretrained backbone weights
      
      * Remove doc updates until changes made in modeling code
      
      * Use load_backbone instead
      
      * Add use_timm_backbone to the model configs
      
      * Add missing imports and arguments
      
      * Update docstrings
      
      * Make sure test is properly configured
      
      * Include recent DPT updates