"vscode:/vscode.git/clone" did not exist on "03bc6ece1be11a535fb2c263d0fbb710a2f7a067"
- 04 Apr, 2024 1 commit
-
-
byi8220 authored
* Defaulted IdeficsProcessor padding to 'longest', removed manual padding
* make fixup
* Defaulted processor call to padding=False
* Add padding to processor call in IdeficsModelIntegrationTest as well
* redefaulted padding=longest again
* fixup/doc
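A hedged usage sketch of the new default; the checkpoint name and the exact call signature are assumptions for illustration only.

```python
# Hypothetical sketch: under the new default, a batch of prompts is padded
# to the longest prompt unless padding is overridden at call time.
from transformers import IdeficsProcessor

processor = IdeficsProcessor.from_pretrained("HuggingFaceM4/idefics-9b")  # assumed checkpoint

prompts = [["A short prompt"], ["A noticeably longer prompt than the first one"]]
inputs = processor(prompts)  # padding="longest" by default
inputs = processor(prompts, padding="max_length", max_length=64)  # explicit override
```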
-
- 03 Apr, 2024 5 commits
-
-
Raushan Turganbay authored
* fix vipllava generation * consistent llava code * revert llava tests changes
-
Ondřej Cífka authored
* Fix is_scores_logprobs in WhisperNoSpeechDetection * Add test_whisper_longform_no_speech_detection * Fix typo
-
Ondřej Cífka authored
* Fix generate_with_fallback **kwargs * Change pop to get * Delete keys from kwargs to prevent overriding generation_config * Revert to passing kwargs by reference, but make a (shallow) copy * dict -> copy.copy * Add test_whisper_longform_multi_batch_beam
-
Ren Xuancheng authored
qwen2: fixed tokens starting with # in slow tokenizer; add tests
Co-authored-by: jklj077 <17811943+jklj077@users.noreply.github.com>
-
Yih-Dar authored
update
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 02 Apr, 2024 6 commits
-
-
Nicolas Patry authored
* Hard error when ignoring tensors. (#27484)
* [WIP] Hard error when ignoring tensors.
* Better selection/error when saving a checkpoint:
  - Find all names we should normally drop (those are in the transformers config).
  - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving).
  - Clone those disjoint tensors, getting rid of the issue.
  - Find all identical names (those should be declared in the config, but we try to find them all anyway).
  - For all identical names: if they are in the config, just ignore them, everything is fine; if they are not, warn about them.
  - For all remaining tensors which are shared yet neither identical nor disjoint, raise a hard error.
* Adding a failing test on `main` that passes here.
* We don't need to keep the subfolder logic in this test.
* Apply suggestions from code review
* Add small tests.
* Dead variable.
* Fixup.
* Fixing tied_weights_keys on generic models.
* Fixup + T5 encoder/decoder tying (with different layers)
* Code quality.
* Dynamic member.
* trigger
* Fixing encoder name for other types of encoder/decoder combos.
* Fix scoping.
* Update .github/workflows/self-scheduled.yml
* Fixing the tied_weights after the call.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
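The disjoint/identical classification above hinges on detecting tensors that share storage. A minimal sketch of that detection step, assuming a plain PyTorch state dict; this is illustrative, not the actual `transformers` saving code.

```python
# Group state-dict entries by underlying storage pointer; names that share
# a storage are "shared" tensors in the sense the commit describes.
from collections import defaultdict

import torch

def find_shared_tensors(state_dict):
    """Group parameter names that share the same underlying storage."""
    groups = defaultdict(list)
    for name, tensor in state_dict.items():
        groups[tensor.untyped_storage().data_ptr()].append(name)
    # Only storages referenced by more than one name count as shared.
    return [names for names in groups.values() if len(names) > 1]

lin = torch.nn.Linear(4, 4)
tied = {"a.weight": lin.weight, "b.weight": lin.weight, "c.bias": lin.bias}
print(find_shared_tensors(tied))  # [['a.weight', 'b.weight']]
```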
-
Minsub Lee (Matt) authored
* Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode * Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode * Exclude pad_token filtering since it is used as CTC-blank token * Add small test for skip_special_tokens * Update decoding test for added new token
-
Yoach Lacombe authored
* add FA2 to o.g Musicgen * make style * add FA2 support to Musicgen Melody * add generation FA2 tests to o.g Musicgen * make style and fix copies * add Musicgen to FA2 docs + deprecate list * add sdpa supports to Musicgen's * make style and fix copies * refactor attention implementation arguments * add Copied from to sdpa tests * add copied form in sdpa tests melody * add copied for FA2 generation tests * add FA2 inference copied from * make style
-
théo gigant authored
* fix issue with logit processor in beam search in Flax
* adding FlaxNoRepeatNGramLogitsProcessor class + unit test
* style correction and code verification
* add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests
* fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams
* replace non-jit compatible masking of ngrams that are not yet generated with jittable version
* Revert "fix issue with logit processor in beam search in Flax" (reverts commit 09b70d7e4dc32d0cc4db61af09a835a9cd238b50)
* add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor
* change the method of casting to boolean of banned tokens indices
* fix code style
* remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop
* remove useless loop iterations
* set some variables that were calculated and used multiple times
* fix format
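For reference, the rule the new processor enforces is easiest to see outside JAX. A framework-agnostic sketch (function and variable names are illustrative, not the library API):

```python
# Given the tokens generated so far, return the set of next tokens that
# would complete an n-gram already present in the sequence.
def banned_next_tokens(generated, ngram_size):
    if len(generated) < ngram_size - 1:
        return set()
    prefix = tuple(generated[-(ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        if tuple(generated[i:i + ngram_size - 1]) == prefix:
            banned.add(generated[i + ngram_size - 1])
    return banned

print(banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3))  # {3}: "1 2 3" already occurred
```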
-
Hovnatan Karapetyan authored
* Fix sinusoidal_embeddings in FlaubertModel * Fix for Informer * Fix for XLM * Move sinusoidal emb for XLM * Move sinusoidal emb for Flaubert * Small cleanup * Add comments on tests code copied from * Add with Distilbert->
-
Arthur authored
* fix bug and add tests * nit * other way to get the cur len instead of attention mask * more places where this might have been broken * nit * oups * inputs_embeds vs input_embeds * test generated outputs * style * nit * fix * skip failing biogpt
-
- 01 Apr, 2024 4 commits
-
-
Joao Gante authored
-
Fanli Lin authored
[tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` (#29975) bug fix
-
Arthur authored
* fix copies * nit * style * Update utils/check_copies.py
-
Yoach Lacombe authored
* fix FA2 tests * refactor inference test name
-
- 31 Mar, 2024 1 commit
-
-
Zach Mueller authored
* Start rework
* Fix failing test
* Include max
* Update src/transformers/trainer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 29 Mar, 2024 1 commit
-
-
Yih-Dar authored
* fix * revert for qwen2 * revert for qwen2 * update * update Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 28 Mar, 2024 7 commits
-
-
Arthur authored
* fix * fix test * style * nit * rather rely on convert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py
-
Yu Chin Fabian Lim authored
* add gradient_accumulation_kwargs to AcceleratorConfig
* add suggestions from @muellerzr to docstrings, new behavior and tests
* Documentation suggestions from @muellerzr
* addressed @muellerzr's comments regarding tests and test utils
* moved accelerate version to top of file
* @muellerzr's variable fix
* address @amyeroberts: fix tests and docstrings
* address @amyeroberts' additional suggestions

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
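A hedged usage sketch: the new field is forwarded to Accelerate's `GradientAccumulationPlugin`, so the inner keys (e.g. `sync_each_batch`) are assumptions about that plugin rather than guarantees about `transformers` itself.

```python
# Sketch under the assumptions above: tune gradient-accumulation behavior
# through the accelerator_config dict on TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=4,
    accelerator_config={
        # Forwarded to accelerate's GradientAccumulationPlugin (assumed key).
        "gradient_accumulation_kwargs": {"sync_each_batch": True},
    },
)
```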
-
Arthur authored
[`TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453)
* nit
* update test and fix test
* fixup
-
Joao Gante authored
* add hard rope scaling test * make fixup * quick rope scaling tests * add copy statements
-
Christopher Keibel authored
* add functions to get number of params which require grad, get optimizer group for parameters, and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
* fix test indentation

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
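A hedged sketch of what these helpers compute, written as free functions; the actual additions live on `Trainer` and their names may differ.

```python
# Illustrative re-implementations, not the merged Trainer methods.
def num_trainable_parameters(model):
    """Count parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def learning_rates(optimizer):
    """Return the current learning rate of each optimizer param group."""
    if optimizer is None:
        raise ValueError("optimizer is None; create it before querying learning rates")
    return [group["lr"] for group in optimizer.param_groups]
```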
-
Joao Gante authored
* replace torch.testing.assert_allclose by torch.testing.assert_close * missing atol rtol
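The replacement is mechanical; a minimal before/after, with illustrative tolerance values:

```python
import torch

actual = torch.tensor([1.0, 2.0])
expected = torch.tensor([1.0, 2.0 + 1e-7])

# Before: torch.testing.assert_allclose(actual, expected)  # deprecated
torch.testing.assert_close(actual, expected)  # new API, stricter defaults
torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-5)  # explicit tolerances
```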
-
Eduardo Pacheco authored
* First commit to add flash attention 2 for GPT-2
* more improvements
* Make GPT2 pass tests and fixed Decision Transformers copies
* Fixed missing arg
* fix copies
* Added expected speedup
* Update src/transformers/models/gpt2/modeling_gpt2.py
* Added test
* Fixed attn attribute
* Update docs/source/en/model_doc/gpt2.md
* Update Decision transformer attentions
* More updates
* Passing tests
* Fix copies
* Fix copies part 2
* Decision transformer updates
* Decision transformer not supporting flash attn
* Addressed comments

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
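A hedged usage sketch, following the library's standard `attn_implementation` switch; half precision and a CUDA device are FlashAttention-2 requirements and are assumed here.

```python
import torch
from transformers import AutoModelForCausalLM

# Requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```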
-
- 27 Mar, 2024 6 commits
-
-
Lorenzo Verardo authored
This commit adds optional gate jitter to the input of MixtralSparseMoeBlock before it is passed through the MoE layer.
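A minimal sketch of the idea, assuming a `jitter_noise` knob like the config option this change implies: during training, the block's input is scaled by uniform multiplicative noise before routing.

```python
import torch

def apply_gate_jitter(hidden_states: torch.Tensor, jitter_noise: float, training: bool) -> torch.Tensor:
    """Multiply the MoE input by noise in [1 - eps, 1 + eps] during training."""
    if training and jitter_noise > 0:
        hidden_states = hidden_states * torch.empty_like(hidden_states).uniform_(
            1.0 - jitter_noise, 1.0 + jitter_noise
        )
    return hidden_states
```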
-
Raushan Turganbay authored
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
* Update tests/generation/test_utils.py
* Update src/transformers/generation/__init__.py
* camel case everywhere
* call stopping criteria list for candidate ids
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
* lets fix this typo in docs
* update PR

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
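An illustrative sketch of the criterion's core check, not necessarily the merged class: stop once every sequence in the batch ends with the EOS token.

```python
import torch
from transformers import StoppingCriteria

class SimpleEosCriteria(StoppingCriteria):
    """Hypothetical variant: true when all sequences end in eos_token_id."""

    def __init__(self, eos_token_id: int):
        self.eos_token_id = eos_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return bool((input_ids[:, -1] == self.eos_token_id).all())
```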
-
Lysandre Debut authored
* Automatic safetensors conversion when lacking these files (#29390) * Automatic safetensors conversion when lacking these files * Remove debug * Thread name * Typo * Ensure that raises do not affect the main thread * Catch all errors
-
Hovnatan Karapetyan authored
* Check for requires_grad when initing weights * Add unit test * Move sinusoidal positional encoding generation after post_init() * Add modules to skip init list * Move create_sinusoidal_embeddings to _init_weights
-
Anton Vlasjuk authored
* FIX: Cached slow forward in mamba - additionally added mamba cached test - added unused test (mamba causal lm forward and backward) - fixed typo: "causl" --> "causal" * formatting * fix: use real `slow_forward` call instead of torch module's * add shape assertion for mixer block test * adjust shape assertion
-
Bo Zheng authored
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* Update README.md
* add archive back
* fix integration test
* fixup

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
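A hedged usage sketch; `Qwen/Qwen1.5-MoE-A2.7B` is used as an assumed checkpoint name for the new architecture.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```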
-
- 26 Mar, 2024 4 commits
-
-
Yanyi Liu authored
* Add cosine_with_min_lr scheduler * Update error message for missing min_lr or min_lr_rate
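A hedged usage sketch: selecting the new scheduler through `TrainingArguments`. That `lr_scheduler_kwargs` accepts `min_lr` or `min_lr_rate` follows from the commit's error-message wording; treat the exact keys as an assumption.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},  # or {"min_lr_rate": 0.1}
)
```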
-
Zhihao Lin authored
* update * add ut * update
-
yunxiangtang authored
* replace the 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff Co-authored-by: wanqiancheng <13541261013@163.com>
-
Jonathan Flynn authored
* add warnings if training args differ from checkpoint args stored in trainer_state.json * run formatting and styling * add a test * format and styling Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
-
- 25 Mar, 2024 2 commits
-
-
Yuki Watanabe authored
* Populate torch_dtype from model to pipeline
* use property
* lint
* Remove default handling

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
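A hedged sketch of the behavior: the pipeline's `torch_dtype` now reflects the wrapped model's dtype (the exact property semantics are an assumption).

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2", torch_dtype=torch.float16)
# The dtype is now readable from the pipeline itself, not only from pipe.model.
assert pipe.torch_dtype == torch.float16
```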
-
Lysandre Debut authored
* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 22 Mar, 2024 1 commit
-
-
fxmarty authored
* correct llava mask * fix vipllava as well * mask out embedding for padding tokens * add test * fix style * add setter * fix test on suggestion
-
- 21 Mar, 2024 2 commits
-
-
Raushan Turganbay authored
* change in-place -> out-of-place
* add tests
* add more tests
* naming consistency
* fix doctest
* forgot min-length processors
* empty
* Revert "fix doctest" (reverts commit 4772768457f9bc057f1d4d9d67ea94eb7224eb8d)
* revert change in docstring
* Update tests/generation/test_logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
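The shape of the change, as a minimal sketch with an illustrative processor body (not a specific class from the library):

```python
import torch

def process_in_place(scores: torch.FloatTensor, banned_ids: list) -> torch.FloatTensor:
    scores[:, banned_ids] = -float("inf")  # mutates the caller's tensor
    return scores

def process_out_of_place(scores: torch.FloatTensor, banned_ids: list) -> torch.FloatTensor:
    scores_processed = scores.clone()  # leave the input untouched
    scores_processed[:, banned_ids] = -float("inf")
    return scores_processed
```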
-
Raushan Turganbay authored
* prepend "bos" to blip generation * minor changes * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/instructblip/modeling_instructblip.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add generation tester mixin --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-