Commits · eebce4470cd2b4cfda6572ff7995f10509a9f693 · chenpangpang / transformers

13 Jul, 2023 10 commits

Add accelerate version in transformers-cli env (#24806) · eebce447
amyeroberts authored Jul 13, 2023
```
* Add accelerate version in transformers-cli env

* Add accelerate config
```
eebce447

Llama/GPTNeoX: add RoPE scaling (#24653) · 34d94094

Joao Gante authored Jul 13, 2023



* add rope_scaling

* tmp commit

* add gptneox

* add tests

* GPTNeoX can now handle long inputs, so the pipeline test was wrong

* Update src/transformers/models/open_llama/configuration_open_llama.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove ntk

* remove redundant validation

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

34d94094
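
The commit message above does not say which scaling strategies were kept. As a rough illustration only, here is a minimal sketch of linear position scaling for rotary embeddings, where positions are divided by a factor before the rotation angles are computed. The function name `rope_angles` and its parameters are hypothetical and are not taken from the PR.

```python
def rope_angles(position, head_dim, base=10000.0, scaling_factor=1.0):
    """Return the rotary angles for one position (illustrative helper only).

    Assumes linear scaling: the position index is divided by
    `scaling_factor`, so a longer sequence is mapped back into the
    position range the model was trained on.
    """
    # One inverse frequency per pair of dimensions in the attention head.
    inv_freq = [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]
    # Linear scaling compresses the position before computing the angles.
    scaled_position = position / scaling_factor
    return [scaled_position * freq for freq in inv_freq]


# With a factor of 2, position 4096 gets the same angles as position 2048
# does without scaling.
print(rope_angles(4096, 8, scaling_factor=2.0) == rope_angles(2048, 8))
```

Under this assumption, a model trained on a given context length can address positions up to `scaling_factor` times longer, at the cost of a finer position resolution.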

Deprecate models (#24787) · 9342c8fb

Sylvain Gugger authored Jul 13, 2023



* Deprecate some models

* Fix imports

* Fix inits too

* Remove tests

* Add deprecated banner to documentation

* Remove from init

* Fix auto classes

* Style

* Remote upgrade strategy 1

* Remove site package cache

* Revert this part

* Fix typo...

* Update utils

* Update docs/source/en/model_doc/bort.md
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* With all files saved

---------
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

9342c8fb

Skip torchscript tests for `MusicgenForConditionalGeneration` (#24782) · 717dadc6
Yih-Dar authored Jul 13, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
717dadc6
Fix MobileVitV2 doctest checkpoint (#24805) · e367a977
amyeroberts authored Jul 13, 2023
```
* Fix doctest checkpoint

* Add import torch for mobilevit
```
e367a977
Upgrade jax/jaxlib/flax pin versions (#24791) · e5381899
Yih-Dar authored Jul 13, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
e5381899

[DOC] Clarify relationship between load_best_model_at_end and save_total_limit (#24614) · 6ba4d5de

Bram Vanroy authored Jul 13, 2023



* Update training_args.py

Clarify the relationship between `load_best_model_at_end` and `save_total_limit`.

* fix: faulty quotes

* make quality

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* DOCS: add explicit `True`

* DOCS: make style/quality

---------
Co-authored-by: Bram Vanroy <Bram.Vanroy@UGent.be>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

6ba4d5de

[fix] Change the condition of ValueError in... · 21946a8c

SeongBeomLEE authored Jul 13, 2023

[fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" (#24769)

* fix: half inference error

norm_factor is still torch.float32 after using model.half

So I changed it to register_buffer so I can change it to torch.float16 after using model.half

* fix: Added a variable "persistent=False"

* run make style

* [fix] Change the condition of ValueError
convert_checkpoint_from_transformers_to_megatron

* [fix] error wording
layers -> attention heads

21946a8c

Removing unnecessary `device=device` in modeling_llama.py (#24696) · 1f6f32c2
Liyang90 authored Jul 13, 2023
```
* Update modeling_llama.py

Removing unnecessary `device=device`

* fix in all occurrences of _make_causal_mask
```
1f6f32c2
Revert "Unpin protobuf in docker file (for daily CI)" (#24800) · 906afa1d
Yih-Dar authored Jul 13, 2023
```
Revert "Unpin protobuf in docker file (for daily CI) (#24761)"

This reverts commit 45025d92.
```
906afa1d

12 Jul, 2023 9 commits

Rm duplicate pad_across_processes (#24780) · f1732e13
Zach Mueller authored Jul 12, 2023
```
Rm duplicate
```
f1732e13
Remove WWT from README (#24672) · cfc8a053
Lysandre Debut authored Jul 12, 2023

cfc8a053

gpt-bigcode: avoid `zero_` to support Core ML (#24755) · 395e566a

Pedro Cuenca authored Jul 12, 2023

gpt-bigcode: avoid `zero_` to support Core ML.

In-place `zero_` is not supported by the Core ML conversion process.
This PR replaces it with `zeros_like` so conversion can proceed.

The change only affects a workaround for a PyTorch bug on the `cpu`
device.

395e566a

Fix pad across processes dim in trainer and not being able to set the timeout (#24775) · 02842855

Zach Mueller authored Jul 12, 2023



* dim, and rm copy

* Don't rm copy for now

* Oops

* pad index

* Should be a working test

* Trickle down ddp timeout

* Put fix back in now that testing locally is done

* Better comment specifying timeout
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

02842855

Update default values of bos/eos token ids in `CLIPTextConfig` (#24773) · 4f85aaa6
Yih-Dar authored Jul 12, 2023
```
* fix

* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
4f85aaa6

Replacement of 20 asserts with exceptions (#24757) · fc9e387d

Bauke Brenninkmeijer authored Jul 12, 2023



* initial replacements of asserts with errors/exceptions

* replace assert with exception in generation, align and bart

* reset formatting change

* reset another formatting issue

* Apply suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* don't touch this file

* change to 'is not False'

* fix type

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

fc9e387d
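
The kind of change this commit describes can be sketched as follows. This is a generic illustration, not code from the PR: the function `per_head_dim` and its arguments are hypothetical. The point is that an `assert` is stripped when Python runs with `-O`, while an explicit exception always fires and carries a clear message.

```python
def per_head_dim(hidden_size: int, num_heads: int) -> int:
    """Return the size of each attention head (hypothetical example)."""
    # Before: assert hidden_size % num_heads == 0, "hidden_size must be divisible"
    # After: an explicit check that survives `python -O` and names the bad values.
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by num_heads ({num_heads})"
        )
    return hidden_size // num_heads


print(per_head_dim(768, 12))
```

Callers can then catch `ValueError` specifically instead of a bare `AssertionError`.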

Docs: Update logit processors __call__ docs (#24729) · 430a04a7

Joao Gante authored Jul 12, 2023

* tmp commit

* __call__ docs

* kwargs documented; shorter input_ids doc

* nit

* Update src/transformers/generation/logits_process.py

430a04a7

Add MobileVitV2 to doctests (#24771) · 6e2f0696
amyeroberts authored Jul 12, 2023
```
* Add to doctests

* Alphabetical order
```
6e2f0696
Fix eval_accumulation_steps leading to incorrect metrics (#24756) · 7edc33ac
Zach Mueller authored Jul 12, 2023
```
Fix eval steps
```
7edc33ac

11 Jul, 2023 15 commits

Unpin protobuf in docker file (for daily CI) (#24761) · 45025d92
Yih-Dar authored Jul 11, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
45025d92
Allow existing configs to be registered (#24760) · 6aadb8d0
Sylvain Gugger authored Jul 11, 2023

6aadb8d0

🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function (#24759) · 4c0e251d

Gaurav Kumbhat authored Jul 11, 2023

* 🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step fn
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>

* Update src/transformers/trainer_seq2seq.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------
Signed-off-by: gkumbhat <kumbhat.gaurav@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

4c0e251d

Fix lr scheduler not being reset on reruns (#24758) · 253d43d4

Zach Mueller authored Jul 11, 2023

* Try this

* Solved!

* Rm extraneous

* Rm extraneous

* self

* Args'

* Check for if we created the lr scheduler

* Move comment

* Clean

253d43d4

Skip some slow tests for doctesting in PRs (Circle)CI (#24753) · 1be0145d
Yih-Dar authored Jul 11, 2023
```
* fix

* fix

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
1be0145d
[InstructBLIP] Fix bos token of LLaMa checkpoints (#24492) · bb13a928
NielsRogge authored Jul 11, 2023
```
* Add fix

* Fix doctest
```
bb13a928

Fix non-deterministic Megatron-LM checkpoint name (#24674) · aac4c799

janEbert authored Jul 11, 2023

Fix non-deterministic checkpoint name

`os.listdir`'s order is not deterministic, which is a problem when
querying the first listed file as in the code (`os.listdir(...)[0]`).

This can return a checkpoint name such as `distrib_optim.pt`, which does
not include desired information such as the saved arguments originally
given to Megatron-LM.

aac4c799
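
The problem described above can be sketched with the standard library alone. This is a minimal illustration, not the fix from the PR: the helper `pick_checkpoint_file` is hypothetical, and the file name `model_optim_rng.pt` in the usage note is an assumption about what a directory might contain.

```python
import os


def pick_checkpoint_file(directory: str, suffix: str = ".pt") -> str:
    """Pick a checkpoint file reproducibly (hypothetical helper)."""
    # os.listdir returns entries in arbitrary order, so sort before indexing.
    candidates = sorted(
        name for name in os.listdir(directory) if name.endswith(suffix)
    )
    if not candidates:
        raise FileNotFoundError(f"no '{suffix}' files found in {directory!r}")
    # Prefer files other than the optimizer shard named in the commit message,
    # since that file does not carry the saved arguments.
    preferred = [name for name in candidates if name != "distrib_optim.pt"]
    return (preferred or candidates)[0]
```

For a directory holding `distrib_optim.pt` and `model_optim_rng.pt`, this returns `model_optim_rng.pt` on every run, whereas `os.listdir(...)[0]` could return either.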

Skip keys not in the state dict when finding mismatched weights (#24749) · 33aafc26
Sylvain Gugger authored Jul 11, 2023

33aafc26
add gradient checkpointing for distilbert (#24719) · 3d869726
Zehan Li authored Jul 11, 2023
```
* add gradient checkpointing for distilbert

* reformatted
```
3d869726
Docs: add `kwargs` type to fix formatting (#24733) · 2642d8d0
Joao Gante authored Jul 11, 2023

2642d8d0

fix: Text splitting in the BasicTokenizer (#22280) · 5739726f

Connor Henderson authored Jul 11, 2023

* fix: Apostrophe splitting in the BasicTokenizer for CLIPTokenizer

* account for apostrophe at start of new word

* remove _run_split_on_punc, use re.findall instead

* remove debugging, make style and quality

* use pattern and punc splitting, repo-consistency will fail

* remove commented out debugging

* adds bool args to BasicTokenizer, remove pattern

* do_split_on_punc default True

* clean stray comments and line breaks

* rebase, repo-consistency

* update to just do punctuation split

* add unicode normalizing back

* remove redundant line

5739726f

Fix typo in LocalAgent (#24736) · 2489e380
Justin Martin authored Jul 11, 2023

2489e380

Add ViViT (#22518) · 8a5e8a9c

Jegor Kitškerkin authored Jul 11, 2023



* Add model

* Add ability to get classification head weights

* Add docs

* Add imports to __init__.py

* Run style

* Fix imports and add mdx doc

* Run style

* Fix copyright

* Fix config docstring

* Remove imports of ViViTLayer and load_tf_weights_in_vivit

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Remove ViViTForPreTraining from vivit.mdx

* Change ViViT -> Vivit everywhere

* Add model_doc to _toctree.yml

* Replace tuples with lists in arguments of VivitConfig

* Rename patch_size to tubelet_size in TubeletEmbeddings

* Fix checkpoint names

* Add tests

* Remove unused num_frames

* Fix imports for VivitImageProcessor

* Minor fixes

* Decrease number of frames in VivitModelTester from 32 to 16

* Decrease number of frames in VivitModelTester from 16 to 8

* Add initialization for pos embeddings

* Rename Vivit -> ViViT in some places

* Fix docstring and formatting

* Rename TubeletEmbeddings -> VivitTubeletEmbeddings

* Remove load_tf_weights_in_vivit

* Change checkpoint name

* Remove Vivit _TOKENIZER_FOR_DOC

* Fix

* Fix VivitTubeletEmbeddings and pass config object as parameter

* Use image_size and num_frames instead of video_size

* Change conversion script and fix differences with the orig implementation

* Fix docstrings

* Add attention head pruning

* Run style and fixup

* Fix tests

* Add ViViT to video_classification.mdx

* Save processor in conversion script

* Fix

* Add image processor test

* Run fixup and style

* Run fix-copies

* Update tests/models/vivit/test_modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vivit/test_modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use PyAV instead of decord

* Add unittest.skip

* Run style

* Remove unneeded test

* Update docs/source/en/model_doc/vivit.mdx
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/configuration_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/modeling_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add model

* Add docs

* Run style

* Fix imports and add mdx doc

* Remove FeatureExtractor and replace with ImageProcessor everywhere

* Change ViViT -> Vivit everywhere

* Rename Vivit -> ViViT in some places

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Run make style

* Remove inputs save

* Fix image processor

* Fix

* Run `make style`

* Decrease parameters of VivitModelTester

* Decrease tubelet size

* Rename vivit.mdx

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vivit/image_processing_vivit.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix default values in image_processing_vivit.py

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

8a5e8a9c

[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour... · b15343de

Arthur authored Jul 11, 2023


[Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valid for beginning of words (#24622)

* patch `_tokenize` function

* more tests

* properly fix

* fixup

* Update src/transformers/models/t5/tokenization_t5.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix without ifs

* update

* protect import

* add python processing

* is first needed

* add doc and update with legacy

* update

* fix T5 SPM converter

* styling

* fix T5 warning

* add is_seqio_available

* remove is_first

* revert some changes

* more tests and update

* update llama test battery

* fixup

* refactor T5 spm common tests

* draft the llama tests

* update

* update test

* nits

* refine

* name nit

* fix t5 tests

* fix T5

* update

* revert convert slow to fast changes that fail lots of tests

* legacy support

* fixup

* nits is first not defined

* don't use legacy behaviour for switch transformers

* style

* My attempt to check.

* nits

* fixes

* update

* fixup

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* updates

* fixup

* add legacy warning

* fixup

* warning_once nit

* update t5 documentation test

* update llama tok documentation

* add space to warning

* nits

* nit

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* last nits

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

b15343de

Falcon port (#24523) · b3ab3fac

Matt authored Jul 11, 2023



* Initial commit

* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Cleanup config docstring

* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Convert to relative imports

* Remove torch < 1.8 warning

* Restructure cos_sin header

* qkv -> query, key, value

* Refactor attention calculation

* Add a couple of config variables to account for the different checkpoints

* Successful merging of the code paths!

* Fix misplaced line in the non-parallel attention path

* Update config and tests

* Add a pad_token_id when testing

* Support output_attentions when alibi is None

* make fixup

* Skip KV cache shape test

* No more _keys_to_ignore_on_load_missing

* Simplify self attention a bit

* Simplify self attention a bit

* make fixup

* stash commit

* Some more attention mask updates

* Should pass all tests except assisted generation!

* Add big model generation test

* make fixup

* Add temporary workaround for test

* Test overrides for assisted generation

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Test overrides for assisted generation

* Add generation demo

* Update copyright

* Make the docstring model actually small

* Add module-level docstring

* Remove all assertions

* Add copied from bloom

* Reformat the QKV layer

* Add copied from bloom

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove unused line and reformat

* No single letter variables

* Cleanup return names

* Add copied from line

* Remove the deprecated arguments blocks

* Change the embeddings test to an alibi on/off test

* Remove position_ids from FalconForQA

* Remove old check for token type IDs

* Fix the alibi path when multi_query is False

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update config naming

* Fix typo for new_decoder_architecture

* Add some comments

* Fix docstring

* Fix docstring

* Create range in the right dtype from the start

* Review comment cleanup

* n_head_kv -> num_kv_heads

* self.alibi -> self.use_alibi

* self.num_kv -> self.num_kv_heads

* Reorder config args

* Made alibi arguments Optional

* Add all model docstrings

* Add extra checkpoints

* Add author info for Falcon

* Stop removing token_type_ids because our checkpoints shouldn't return it anymore

* Add one hopeful comment for the future

* Fix typo

* Update tests, fix cache issue for generation

* Use -1e9 instead of -inf to avoid float overflow

* Recompute the rotary embeddings much less often

* Re-enable disabled tests

* One final fix to attention mask calculation, and update tests

* Cleanup targeting falcon-40b equivalency

* Post-rebase docs update

* Update docstrings, especially in the config

* More descriptive variable names, and comments where we can't rename them

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

b3ab3fac

10 Jul, 2023 4 commits

add link to accelerate doc (#24601) · 35eac0df
Marc Sun authored Jul 10, 2023

35eac0df
Docs: change some `input_ids` doc reference from `BertTokenizer` to `AutoTokenizer` (#24730) · a074a5d3
Joao Gante authored Jul 10, 2023

a074a5d3

[`T5`] Adding model_parallel = False to `T5ForQuestionAnswering` and... · 25411085

Sebastian Husch Lee authored Jul 10, 2023

[`T5`] Adding model_parallel = False to `T5ForQuestionAnswering` and `MT5ForQuestionAnswering` (#24684)

Adding model_parallel = False

25411085

Add Multi Resolution Analysis (MRA) (New PR) (#24513) · 30ed3adf

novice authored Jul 10, 2023



* Add all files

* Update masked_language_modeling.md

* fix mlm models

* fix conflicts

* fix conflicts

* fix copies

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Reduce seq_len and hidden_size in ModelTester

* remove output_attentions

* fix conflicts

* remove copied from statements

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

30ed3adf

07 Jul, 2023 2 commits

Enable `conversational` pipeline for `GPTSw3Tokenizer` (#24648) · abaca9f9

Dan Saattrup Nielsen authored Jul 07, 2023

* feat: Add `_build_conversation_input_ids` to GPT-SW3 tokenizer, adjust line length

* feat: Merge in PR https://github.com/huggingface/transformers/pull/24504.

This allows the GPT-SW3 models (and other GPT-2 based models) to be 4-bit quantised
using `load_in_4bit` with `bitsandbytes`.

* fix: F-string

* fix: F-string

* fix: Remove EOS token from all responses

* fix: Remove redundant newlines

* feat: Add `load_in_4bit` to `Pipeline`

* fix: Separate turns with `\n<s>\n` rather than `<s>`

* fix: Add missing newline in prompt

* tests: Add unit tests for the new `_build_conversation_input_ids` method

* style: Automatic style correction

* tests: Compare encodings rather than decodings

* fix: Remove `load_in_4bit` from pipeline arguments

* docs: Add description and references of the GPT-SW3 chat format

* style: Line breaks

* Apply suggestions from code review

Fix Conversation type hints
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: Import TYPE_CHECKING

* style: Run automatic fixes

* tests: Remove `_build_conversation_input_ids` unit tests

* tests: Remove import of `Conversation` in GPT-SW3 unit test

* style: Revert formatting

* style: Move TYPE_CHECKING line after all imports

* style: Imports order

* fix: Change prompt to ensure that `sp_model.encode` and `encode` yield the same result

* docs: Add TODO comment related to the addition of whitespace during decoding

* style: Automatic style checks

* fix: Remove final whitespace in prompt, as prefix whitespace is used by sentencepiece

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

abaca9f9

Whisper: fix prompted max length (#24666) · f614b6e3
Joao Gante authored Jul 07, 2023

f614b6e3