- 25 Jul, 2023 14 commits
-
Sebastian Husch Lee authored
* Initial addition of t5forsequenceclassification * Adding imports and adding tests * Formatting * Running make fix-copies * Adding mt5forseq * Formatting * run make fix-copies * Adding to docs * Add model_parallel * Fix bug * Fix * Remove TODO * Fixing tests for T5ForSequenceClassification * Undo changes to dependency_versions_table.py * Change classification head to work with T5Config directly * Change seq length to let tests pass * PR comments for formatting * Formatting * Initial addition of UMT5ForSequenceClassification * Adding to inits and formatting * run make fix-copies * Add doc for UMT5ForSeqClass * Update UMT5 config * Fix docs * Skip torch fx test for SequenceClassification * Formatting * Add skip to UMT5 tests as well * Fix umt5 tests * Running make fix-copies * PR comments * Fix for change to sentence_representation * Rename seq_len to hidden_size since that's what it is * Use base_model to follow format of the rest of the library * Update docs * Extract the decoder_input_ids changes and make one liner * Make one-liner
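A minimal sketch of the new head in use, assuming any T5 checkpoint (model name illustrative; the classification head is freshly initialized, so `num_labels` here is an assumption for the example):

```python
import torch
from transformers import AutoTokenizer, T5ForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("t5-small")
# num_labels is illustrative; the new classification head starts randomly initialized.
model = T5ForSequenceClassification.from_pretrained("t5-small", num_labels=2)

inputs = tokenizer("This film was a pleasant surprise.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```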
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Arthur authored
* draft solution * use `setdefault` * nits * add tests and fix truncation issue * fix test * test passes locally * quality * updates * update tests
-
Arthur authored
* tf versions * apply changes to other models * 3 models slipped through the cracks
-
Arthur authored
* support left padding * nit * Update src/transformers/models/gpt_neox/modeling_gpt_neox.py * Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
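Left padding matters for batched generation with decoder-only models, so generation continues from real tokens rather than pad tokens. A hedged sketch of the usage this enables (checkpoint choice illustrative):

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # pad on the left so each prompt ends at the same position

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")
inputs = tokenizer(
    ["Hello there", "A much longer prompt to pad the short one against"],
    return_tensors="pt",
    padding=True,
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```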
-
Yih-Dar authored
* fix * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Gema Parreño authored
* add example NoBadWordsLogitsProcessor * fix L764 & L767 * make style
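The docstring example this commit adds lives in the library itself; a rough equivalent through `generate`'s `bad_words_ids` argument, which applies the same processor (word and model choices illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# add_prefix_space matters for GPT-2's BPE so the banned word matches mid-sentence tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Token ids of the words to forbid during generation.
bad_words_ids = tokenizer(["terrible"], add_special_tokens=False).input_ids

inputs = tokenizer("The movie was", return_tensors="pt")
out = model.generate(**inputs, bad_words_ids=bad_words_ids, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```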
-
Arthur authored
* draft add new model like * some cleaning of the config * nits * add nested configs * nits * update * update * added layer norms + triton kernels * consider only LPLayerNorm for now. * update * all keys match. * Update * fixing nits here and there * working forward pass. * removed einops dependency * nits * format * add alibi * byebye head mask * refactor attention * nits. * format * fix nits. * nuke and updates * nuke tokenizer test * don't reshape query with kv heads * added a bit of documentation. * remove unneeded things * nuke more stuff * nit * logits match - same generations * rm unneeded methods * 1 remaining failing CI test * nit * fix nits * fix docs * fix docs * rm tokenizer * fixup * fixup * fixup and fix tests * fixed configuration object. * use correct activation * few minor fixes * clarify docs a bit * logits match à 1e-12 * skip and unskip a test * added some slow tests. * fix readme * add more details * Update docs/source/en/model_doc/mpt.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix configuration issues * more fixes in config * added more models * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove unneeded position ids * fix some comments * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * revert suggestion * mpt alibi + added batched generation * Update src/transformers/models/mpt/__init__.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove init config * Update src/transformers/models/mpt/configuration_mpt.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix nit * add another slow test * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fits in one line * some refactor because make fixup doesn't pass * add ft notebook * update md * correct doc path --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
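With the MPT port merged, the model loads through the standard API; a hedged sketch (checkpoint name is the upstream MosaicML repo, `device_map="auto"` assumes accelerate is installed):

```python
from transformers import AutoTokenizer, MptForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
model = MptForCausalLM.from_pretrained("mosaicml/mpt-7b", device_map="auto")

inputs = tokenizer("The weather today is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```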
-
Xiaoke Huang authored
Repeat per sample for SAM image embeddings
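The point of precomputed SAM image embeddings is to run the heavy vision encoder once and prompt many times; this fix makes that path repeat embeddings per sample correctly. A hedged sketch following the documented reuse pattern (dummy image and point coordinates illustrative):

```python
import numpy as np
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))  # stand-in image
inputs = processor(image, input_points=[[[320, 240]]], return_tensors="pt")

# Encode the image once, then reuse the embedding for every prompt.
with torch.no_grad():
    image_embeddings = model.get_image_embeddings(inputs.pop("pixel_values"))
    inputs.update({"image_embeddings": image_embeddings})
    outputs = model(**inputs)
```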
-
Arthur authored
[`generate`] Only warn users if the `generation_config`'s `max_length` is set to the default value (#25030) * check max length is default * nit * update warning: no longer deprecate * comment in the configuration_utils in case max length's default gets changed in the future
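The practical upshot: pass an explicit length limit so the default `max_length` (20 tokens) never applies silently. A small sketch (model choice illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Once upon a time", return_tensors="pt")

# An explicit max_new_tokens avoids relying on generation_config's default
# max_length, which is exactly what the new warning guards against.
out = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```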
-
Xuehai Pan authored
-
Sylvain Gugger authored
* Fix last models for common tests that are too big. * Remove print statement
-
Kashif Rasul authored
fix rope_scaling doc string
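For context on the docstring being fixed: `rope_scaling` is a config dict naming a scaling strategy and a factor. A hedged example of a valid value at the time of this commit:

```python
from transformers import LlamaConfig

# `type` must be "linear" or "dynamic", and `factor` a float > 1.0.
config = LlamaConfig(rope_scaling={"type": "linear", "factor": 2.0})
```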
-
Joao Gante authored
-
- 24 Jul, 2023 10 commits
-
Sylvain Gugger authored
* Better error message when signal is not supported on OS * Address review comments
-
Younes Belkada authored
fix 8bit corner case with Blip2 8bit
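The corner case concerns Blip2 loaded in 8-bit via bitsandbytes; a hedged sketch of the setup it affects (requires bitsandbytes, accelerate, and a CUDA device):

```python
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
# 8-bit weights cut memory roughly in half again versus fp16.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", load_in_8bit=True, device_map="auto"
)
```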
-
Nate Brake authored
Fix compute_loss in Trainer failing to shift labels for PEFT models when label smoothing is enabled. (#25044) * added PeftModelForCausalLM to MODEL_FOR_CAUSAL_LM_MAPPING_NAMES dict * check for PEFT model in compute_loss section --------- Co-authored-by: Nathan Brake <nbrake3@mmm.com>
-
Rinat authored
* pull and push updates * add docs * fix modeling * Add and run test * make copies * add task * fix tests and fix small issues * Checks on a Pull Request * fix docs * add desc pvt.md
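This adds the PVT (Pyramid Vision Transformer) model; a hedged usage sketch (checkpoint name assumed from the model's docs, dummy image as input):

```python
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor, PvtForImageClassification

image_processor = AutoImageProcessor.from_pretrained("Zetatech/pvt-tiny-224")
model = PvtForImageClassification.from_pretrained("Zetatech/pvt-tiny-224")

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # stand-in image
inputs = image_processor(image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```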
-
Sören Brunk authored
-
Zach Mueller authored
* Dispatch batches * Copy items
-
Iskren Ivov Chernev authored
* Better handling of missing SYS in the llama conversation tokenizer. The existing code failed to add SYS if the conversation had history without SYS, yet it still modified the passed conversation object. Rearrange the code so modifications to the conversation object are taken into account for token id generation. * Fix formatting with black * Avoid one-liners * Also fix fast tokenizer * Drop List decl
-
Lucain authored
* Support GatedRepoError + use raise from * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Use token instead of use_auth_token in error messages --------- Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
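The clearer error originates in huggingface_hub's `GatedRepoError`; since the commit uses `raise ... from`, the library surfaces it as the cause of an `OSError`. A hedged sketch of detecting it (repo id illustrative):

```python
from huggingface_hub.utils import GatedRepoError
from transformers import AutoModelForCausalLM

try:
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
except OSError as err:
    # `raise from` preserves the original GatedRepoError as __cause__.
    if isinstance(err.__cause__, GatedRepoError):
        print("Repo is gated: accept its terms on the Hub, then retry with `token=...`.")
    else:
        raise
```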
-
Bharat Ramanathan authored
fix: store training args to wandb config without sanitization. Allows resuming runs by reusing the wandb config. Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
-
Arthur authored
set default logger
-
- 21 Jul, 2023 12 commits
-
Ivan Sorokin authored
* improve from_pretrained for zero3 multi gpus mode * Add check if torch.distributed.is_initialized * Revert torch.distributed --------- Co-authored-by: Stas Bekman <stas@stason.org>
-
Arthur authored
remove persistent tensor
-
Younes Belkada authored
add simple check for bnb
-
Yih-Dar authored
fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Avoid importing all models when instantiating a pipeline * Remove sums that don't work
-
Arthur authored
* pad token should be None by default * fix tests * nits
-
Joya Chen authored
* Update tokenization_llama.py * Update tokenization_llama_fast.py * Update src/transformers/models/llama/tokenization_llama_fast.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/llama/tokenization_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/llama/tokenization_llama.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/llama/tokenization_llama_fast.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Sourab Mangrulkar authored
* fix fsdp prepare to remove the warnings and fix excess memory usage * Update training_args.py * parity for FSDP+XLA * Update trainer.py
-
Jim Allanson authored
* fix: cast input pixels to appropriate dtype for image_to_text tasks * fix: add casting to pixel inputs of additional models after running copy checks
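The fix targets half-precision vision models in the image-to-text pipeline, where pixel inputs previously stayed in fp32 and mismatched the model dtype. A hedged sketch of the affected setup (model choice illustrative, fp16 assumes a CUDA device):

```python
import numpy as np
import torch
from PIL import Image
from transformers import pipeline

captioner = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",
    torch_dtype=torch.float16,  # pixel inputs are now cast to match this dtype
    device=0,
)
image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # stand-in image
print(captioner(image))
```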
-
Sourab Mangrulkar authored
* fix fsdp load * Update trainer.py * remove saving duplicate state_dict
-
- 20 Jul, 2023 4 commits
-
Apoorv Khandelwal authored
* [trainer] fallback for deepspeed param count * [trainer] more readable numel count
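Under DeepSpeed ZeRO-3, weights are partitioned across ranks and `p.numel()` can return 0 locally, while DeepSpeed exposes the full size as `ds_numel`. A hedged sketch of the fallback described (helper name is mine, not the Trainer's):

```python
def count_parameters(model):
    # Prefer DeepSpeed's ds_numel (full, unpartitioned element count) when
    # present; fall back to the regular numel() for ordinary tensors.
    return sum(getattr(p, "ds_numel", p.numel()) for p in model.parameters())
```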
-
Benjamin Badger authored
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Younes Belkada authored
add GC support for RWKV
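GC here is gradient checkpointing, which recomputes activations in the backward pass to save memory; with this commit it can be enabled on RWKV like other models. A hedged sketch (checkpoint name illustrative):

```python
from transformers import RwkvForCausalLM

model = RwkvForCausalLM.from_pretrained("RWKV/rwkv-4-169m-pile")
model.gradient_checkpointing_enable()  # trade extra compute for lower training memory
model.train()
```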
-
Shauray Singh authored
* testing * example script * fix typehinting * some tests * make test * optional update * Union of arguments * does this fix the issue * remove reports * set default to False * documentation change * None support * does not need None * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549) * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments * Change dict to Dict * Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" (#24574) Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)" This reverts commit c5e29d43. * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549) * Fix typing annotations for FSDP and DeepSpeed in TrainingArguments * Change dict to Dict * merge * hacky fix * fixup --------- Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-