- 21 Jun, 2023 2 commits
-
-
Younes Belkada authored
* fix gc bug * continue PoC on OPT * fixes * :exploding_head: * fix tests * remove pytest.mark * fixup * forward contrib credits from discussions * forward contrib credits from discussions * reverting changes on untouched files. --------- Co-authored-by:
zhaoqf123 <zhaoqf123@users.noreply.github.com> Co-authored-by:
7eu7d7 <7eu7d7@users.noreply.github.com>
-
Sergii Dymchenko authored
-
- 20 Jun, 2023 7 commits
-
-
Patrick von Platen authored
* Correct direct lang loading * correct more * revert black * Use tie weights instead * add tests * add tests * make style
-
Arthur authored
* add raise value error for attention size * nits to fix test_config * style
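The "raise value error for attention size" change is the standard config sanity check: the hidden size must split evenly across attention heads. A minimal sketch of that validation (the function name and message wording here are illustrative, not the actual transformers code):

```python
def check_attention_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head dimension, raising early if the split is impossible."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads
```

Failing at config time with a clear message beats a cryptic shape mismatch deep inside the attention forward pass.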
-
Arthur authored
* nits * config doc did not match * Apply suggestions from code review Co-authored-by:
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> --------- Co-authored-by:
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
-
Sylvain Gugger authored
-
Yih-Dar authored
* fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
* Fix saved_model_creation_extended * Skip the BLIP model creation test for now * Fix TF SAM test * Fix longformer tests * Fix Wav2Vec2 * Add a skip for XLNet * make fixup * make fix-copies * Add comments
-
Matt authored
-
- 19 Jun, 2023 5 commits
-
-
Quentin Gallouédec authored
* Fix arg sort in docstring * further order fix * make style
-
amyeroberts authored
Fix doctest
-
Yih-Dar authored
fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* fix * Update src/transformers/models/switch_transformers/modeling_switch_transformers.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Xiaoyang Sun authored
Update checkpoint_reshaping_and_interoperability.py
-
- 16 Jun, 2023 6 commits
-
-
Matt authored
* Add test for proper input signatures * No more signature pruning * Test the dummy inputs are valid too * fine-tine -> fine-tune * Fix indent in test_dataset_conversion
-
amyeroberts authored
* Fix ImageGPT doc example * Update src/transformers/models/imagegpt/image_processing_imagegpt.py * Fix types
-
Matt authored
* Fix one BLIP arg not being optional, remove misspelled arg * Remove the lxmert test overrides and just use the base test_saved_model_creation * saved_model_creation fixes and re-enabling tests across the board * Remove unnecessary skip * Stop caching sinusoidal embeddings in speech_to_text * Fix transfo_xl compilation * Fix transfo_xl compilation * Fix the conditionals in xglm * Set the save spec only when building * Clarify comment * Move comment correctly * Correct embeddings generation for speech2text * Mark RAG generation tests as @slow * Remove redundant else: * Add comment to clarify the save_spec line in build() * Fix size tests for XGLM at last! * make fixup * Remove one band_part operation * Mark test_keras_fit as @slow
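The "Stop caching sinusoidal embeddings in speech_to_text" item reflects a common TF-compilation fix: compute deterministic position embeddings on the fly instead of storing them as a fixed-size buffer. A minimal sketch of generating one sinusoidal row (interleaved sin/cos layout chosen for illustration; the actual model's layout may differ):

```python
import math

def sinusoidal_embedding(position: int, dim: int) -> list[float]:
    """Generate (rather than cache) one sinusoidal position-embedding row."""
    emb = []
    for i in range(dim // 2):
        # Frequencies decay geometrically across the embedding dimension.
        freq = 1.0 / (10000 ** (2 * i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb
```

Because the rows are a pure function of position, regenerating them keeps the graph shape-agnostic, which helps TF infer shapes at save time.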
-
Yih-Dar authored
byebye --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Matt authored
* Revert whisper change and modify the test_compile_tf_model test * make fixup * Tweak test slightly * Add functional model saving to test * Ensure TF can infer shapes for data2vec * Add override for efficientformer * Mark test as slow
-
Arthur authored
* clean history * remove other changes * fix * fix copies
-
- 15 Jun, 2023 4 commits
-
-
Sanchit Gandhi authored
* [EnCodec] Changes for 32kHz ckpt * Update src/transformers/models/encodec/convert_encodec_checkpoint_to_pytorch.py * Update src/transformers/models/encodec/convert_encodec_checkpoint_to_pytorch.py
-
JayL0321 authored
* issue#24161 remove unused is_decoder parameter in DetrAttention * #24161 fix check_repository_consistency fail
-
Fei Wang authored
* Fix LLaMa beam search when using parallelize, same issue as T5 #11717 * fix code format in modeling_llama.py * fix format of _reorder_cache in modeling_llama.py
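The parallelize bug is about `_reorder_cache`: under model parallelism each layer's cached key/value tensors can live on a different device, so the beam index must be moved to each tensor's device before gathering. A device-free sketch of the reorder itself, using plain lists in place of tensors (a real implementation would call `past_state.index_select(0, beam_idx.to(past_state.device))`):

```python
def reorder_cache(past_key_values, beam_idx):
    """Reorder per-layer cached states along the beam dimension.

    past_key_values: tuple (one entry per layer) of tuples of per-beam lists.
    beam_idx: which source beam each output beam should copy its cache from.
    """
    return tuple(
        tuple([past_state[i] for i in beam_idx] for past_state in layer_past)
        for layer_past in past_key_values
    )
```

After each beam-search step, surviving hypotheses may descend from a different parent beam, so the cache must be permuted to match before the next forward pass.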
-
Yih-Dar authored
* fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 14 Jun, 2023 7 commits
-
-
Patrick von Platen authored
* Add mms ctc fine tuning * make style * More fixes that are needed * make fix-copies * make draft for README * add new file * move to new file * make style * make style * add quick test * make style * make style
-
Matthijs Hollemans authored
* boilerplate stuff * messing around with the feature extractor * fix feature extractor * unit tests for feature extractor * rename speech to audio * quick-and-dirty import of Meta's code * import weights (sort of) * cleaning up * more cleaning up * move encoder/decoder args into config * cleanup model * rename EnCodec -> Encodec * RVQ parameters in config * add slow test * add lstm init and test_init * Add save & load * finish EncodecModel * remove decoder_input_values as they are not used anywhere (not removed from doc yet) * fix test feature extraction model name * Add better slow test * Fix tests * some fixup and cleaning * Improve further * cleaning up quantizer * fix up conversion script * tests don't pass, _encode_frame does not work * update tests with output per encode and decode * more cleanup * rename _codebook * remove old config cruft * ratios & hop_length * use ModuleList instead of Sequential * clean up resnet block * update types * update tests * fixup * quick cleanup * fix padding * more styling * add patrick feedback * fix copies * fixup * fix lstm * fix shape issues * fixup * rename conv layers * fixup * fix decoding * small conv refactoring * remove norm_params * simplify conv layers * rename conv layers * stuff * Clean up: add padding logic, use padding mask, small conv refactoring, remove norm_params, simplify conv layers, rename conv layers, add batched test, merge and update for padding * clean up more * clean up more * More clean ups * cleanup convolutions * typo * fix typos * fixup * build PR doc
* start refactoring docstring * don't pad when there is no stride and chunk * update docstring * update docstring * nits * update going to lunch * update config and model * fix broken tests (because of the config changes) * fix scale computation * fixup * only return dict if specified or if config returns it * remove todos * update defaults in config * update conversion script * fix doctest * more docstring + fixup * nits on batched tests * more nits * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * update based on review * fix update * update tests * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fixup * add overlap and chunk_length_s * cleanup feature extraction * test edge cases for truncation and padding * correct processor values * update config encodec, nits * fix tests * fixup * fix 24Hz test * all tests are green * fix fixup * Apply suggestions from code review * revert readme changes * fixup * add example * use facebook checkpoints * fix typo * no pipeline tests * use self.pad everywhere we can * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * update based on review * update * update mdx * fix bug and tests * fixup * fix doctest * remove comment * more nits * add more coverage for `test_truncation_and_padding` * fixup * add last test * fix text * nits * Update tests/models/encodec/test_modeling_encodec.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * take care of the last comments * typo * fix test * nits * fixup * Update src/transformers/models/encodec/feature_extraction_encodec.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: arthur.zucker@gmail.com <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
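The Encodec PR's "RVQ parameters in config" and quantizer cleanup refer to residual vector quantization: each codebook stage quantizes the residual left over by the previous stages. A toy scalar sketch of the idea, with made-up codebooks (the real model quantizes vectors with learned codebooks):

```python
def rvq_encode(x: float, codebooks: list[list[float]]) -> list[int]:
    """Residual VQ: each stage encodes what the earlier stages left over."""
    indices = []
    residual = x
    for codebook in codebooks:
        # Pick the nearest codebook entry to the current residual.
        idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - residual))
        indices.append(idx)
        residual -= codebook[idx]
    return indices

def rvq_decode(indices: list[int], codebooks: list[list[float]]) -> float:
    """Reconstruction is simply the sum of the selected entries per stage."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))
```

Using more quantizer stages reduces the residual error, which is how Encodec trades bitrate for reconstruction quality.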
-
Wissam Antoun authored
* MLM prediction head output size from embed_size Take the output size of the dense projection layer from embedding_size instead of hidden_size, since there could be a projection of the input embedding into hidden_size if they are different * project TFDebertaV2 mlm output to embedding size embedding size can be different than hidden_size, so the final layer needs to project back to embedding size, like in ELECTRA- or DeBERTaV3-style pretraining. This should solve an error that occurs when loading models like "almanach/camemberta-base-generator". * fix the same issue for reshaping after projection * fix layernorm size * add self.embedding_size to scope * fix embed_proj scope name * apply the same changes to TF Deberta * add the changes to deberta * added self.embedding_size instead of config.embedding_size * added the same change to debertav2 * added copied from deberta to deberta2 model * config.embedding_size fix * black * fix deberta config name
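The shape constraint behind this fix: when `embedding_size != hidden_size` (as in ELECTRA/DeBERTaV3 generators), the MLM head's dense transform must project back to `embedding_size`, or the decoder cannot tie weights with the `(vocab_size, embedding_size)` input embedding matrix. A shape-level sketch (the function and dict keys are illustrative, not the transformers API):

```python
def mlm_projection_dims(hidden_size: int, embedding_size: int, vocab_size: int):
    """Weight shapes for an MLM head whose decoder is tied to the embeddings."""
    return {
        # Dense transform maps hidden states back into embedding space.
        "transform": (hidden_size, embedding_size),
        # LayerNorm therefore also operates over embedding_size.
        "layer_norm": (embedding_size,),
        # Decoder shares its weight with the (vocab_size, embedding_size) embeddings.
        "decoder": (embedding_size, vocab_size),
    }
```

Sizing the transform to `hidden_size` instead was the bug: the tied decoder weight then had an incompatible shape, breaking checkpoints like camemberta-base-generator.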
-
Yih-Dar authored
* fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Patrick von Platen authored
* Add conversion for mms lid * make style
-
Joao Gante authored
-
TAE YOUNGDON authored
* Update language_modeling.py In class TextDatasetForNextSentencePrediction(Dataset), "self.tokenizer.num_special_tokens_to_add(pair=True)" was counted twice, so remove self.block_size and add a parameter to "def create_examples_from_document", as "class LineByLineWithSOPTextDataset" does * Update language_modeling.py * Fix URL in comment for contrastive loss function
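The bug was subtracting the special-token budget twice: once when storing `self.block_size` and again when building examples, leaving segments shorter than intended. A minimal sketch of doing it exactly once (helper names are hypothetical):

```python
def max_num_tokens(block_size: int, num_special_tokens_pair: int) -> int:
    # Room left for real tokens once [CLS] ... [SEP] ... [SEP] are accounted for.
    # Subtract the special tokens exactly once, not at both call sites.
    return block_size - num_special_tokens_pair

def chunk_document(tokens: list[int], block_size: int, num_special_tokens_pair: int):
    """Greedy split of a tokenized document into NSP-sized segments."""
    target = max_num_tokens(block_size, num_special_tokens_pair)
    return [tokens[i:i + target] for i in range(0, len(tokens), target)]
```

For BERT-style pairs, `num_special_tokens_pair` is typically 3 ([CLS], [SEP], [SEP]), so a block size of 128 leaves 125 content tokens.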
-
- 13 Jun, 2023 5 commits
-
-
Yih-Dar authored
* fix * fix * fix * Update src/transformers/models/tapas/modeling_tapas.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix --------- Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Joao Gante authored
-
Kashif Rasul authored
* use mean scaler when scaling is boolean True * remove debug
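"Use mean scaler when scaling is boolean True" refers to the time-series convention of normalizing each series by the mean absolute value of its observed points. A plain-list sketch of that scaler (signature and epsilon handling are illustrative, not the library's exact API):

```python
def mean_scale(values: list[float], observed: list[int], eps: float = 1e-10):
    """Scale a series by the mean |x| over observed points.

    `observed` is a 0/1 mask; padded or missing points don't affect the scale.
    """
    num = sum(abs(v) for v, o in zip(values, observed) if o)
    den = sum(observed)
    scale = max(num / den if den else 1.0, eps)
    return [v / scale for v in values], scale
```

Scaling per series keeps magnitudes comparable across series so one large-valued series doesn't dominate training; the model's outputs are multiplied back by `scale` at prediction time.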
-
Sylvain Gugger authored
* First test * Add info for all models * style * Repo consistency * Fix last model and cleanup prints * Repo consistency * Use consistent function for detecting tied weights
-
Sebastian authored
* Porting changes from https://github.com/microsoft/DeBERTa/ that hopefully allows for fp16 training of mdeberta * Updates to deberta modeling from microsoft repo * Performing some cleanup * Undoing changes that weren't necessary * Undoing float calls * Minimally change the p2c block * Fix error * Minimally changing the c2p block * Switch to torch sqrt * Remove math * Adding back the to calls to scale * Undoing attention_scores change * Removing commented out code * Updating modeling_sew_d.py to satisfy utils/check_copies.py * Missed change * Further reduce changes needed to get fp16 working * Reverting changes to modeling_sew_d.py * Make same change in TF
-
- 12 Jun, 2023 3 commits
-
-
Yih-Dar authored
fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
fxmarty authored
* fix dtype init * fix copies * fix fixcopies mess * edit forward as well * copy
-
Yih-Dar authored
fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 09 Jun, 2023 1 commit
-
-
Arthur authored
* small tokenizer uses `__start__` and `__end__` * fix PR doctest
-