Commits · 19ade2426a6ec12cfbec2779f32573e7b02d49a0 · chenpangpang / transformers

27 Mar, 2023 3 commits

[WIP]`NLLB-MoE` Adds the moe model (#22024) · 19ade242

Arthur authored Mar 27, 2023

* Initial commit

* update modeling code

* update doc

* add functions necessary

* fix imports

* revert changes

* fixup

* more styling to get going

* remove standalone encoder

* update code

* styling

* fix config and model

* update code and some refactoring

* make more tests pass

* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300

* fix more common tests

* style

* update testing file

* update

* update

* Router2 doc

* update check config with sparse layer

* add dummy router

* update current conversion script

* create on the fly conversion script

* Fixup

* style

* style 2

* fix empty return

* fix return

* Update default config sparse layers

* easier to create sparse layers

* update

* update conversion script

* update modeling

* add to toctree

* styling

* make ruff happy

* update docstring

* update conversion script

* update, will break tests but implementing top2

* update

* ❗local groups are supported here

* ⚠️ Support for local groups is now removed ⚠️

This is because it has to work with model parallelism, which we do not support

* finish simplification

* Fix forward

* style

* fixup

* Update modelling and test, refactoring

* update tests

* remove final layer_norm as it is done in the FF

* routing works! Logits test added

* nit in test

* remove top1router

* style

* make sure sparse are tested. Had to change route_tokens a little bit

* add support for unslip models when converting

* fixup

* style

* update tests

* update test

* REFACTOR

* encoder outputs match!

* style

* update testing

* 🎉encoder and decoder logits match 🎉



* styling

* update tests

* cleanup tests

* fix router test and CIs

* cleanup

* cleanup test styling

* fix tests

* Finally the generation tests match!

* cleanup

* update test

* style testing file

* remove script

* cleanup

* more cleanup

* nits

* update

* NLLB tokenizer is wrong and will be fixed soon

* use LongTensors

* update tests

* revert some small changes

* fix second expert sampling and batch prioritized routing

* update tests

* finish last tests

* make ruff happy

* update

* ruff again

* style

* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Updates based on review

* style and fix import issue

* nit

* more nits

* cleanup

* styling

* update test_seconde_expert_policy

* fix name

* last nit on the markdown examples

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

[Pix2Struct] Add support to resize embeddings (#22394) · 0e708178

NielsRogge authored Mar 27, 2023

* First draft

* Fix integration test

* Remove script

* Fix test and typos

* Fix one more test

* Skip tied embeddings test

* Remove line

* Address comments

Generate: support for left-padding on GPTNeoX and Llama (#22382) · 7dcd8703
Joao Gante authored Mar 27, 2023

24 Mar, 2023 3 commits

Resnet flax (#21472) · a0cbbba3

Shubhamai authored Mar 25, 2023



* [WIP] flax resnet

* added pretrained flax models, results reproducible

* Added pretrained flax models, results reproducible

* working on tests

* no real code change, just some comments

* [flax] adding support for batch norm layers

* fixing bugs related to pt+flax integration

* removing loss from modeling flax output class

* fixing classifier tests

* fixing comments, model output

* cleaning comments

* review changes

* review changes

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* renaming Flax to PyTorch

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b

Mitch Naylor authored Mar 24, 2023



* add mega file structure and plain pytorch version of mega source code

* added config class with old naming conventions

* filled in mega documentation

* added config class and embeddings with optional token types

* updated notes

* starting the conversion process, deleted intermediate and added use_cache back to config

* renamed config attributes in modeling_mega.py

* checkpointing before refactoring incremental decoding functions

* removed stateful incremental key/values for EMA and self-attention

* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask

* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement

* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention

* bug fix in attention mask handling in MovingAverageGatedAttention

* removed incremental state from GatedCrossAttention and removed IncrementalState class

* finished gated cross attention and got MegaLayer working

* fixed causal masking in mega decoder

* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching

* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids

* added optional dense hidden layer for masked and causal LM classes

* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention

* removed before_attn_fn in Mega class and updated docstrings and comments up to there

* bug fix in MovingAverageGatedAttention masking

* working conversion of MLM checkpoint in scratchpad script -- perfect matches

* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters

* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint

* finished checkpoint conversion script

* cleanup old class in mega config script

* removed 'copied from' statements and passing integration tests

* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing

* fixed tuple output of megamodel

* all common tests passing after fixing issues in decoder, gradient retention, and initialization

* added mega-specific tests, ready for more documentation and style checks

* updated docstrings; checkpoint before style fixes

* style and quality checks, fixed initialization problem in float_tensor, ready for PR

* added mega to toctree

* removed unnecessary arg in megaconfig

* removed unused arg and fixed code samples with leftover roberta models

* Apply suggestions from code review

Applied all suggestions except the one renaming a class, as I'll need to update that throughout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA

* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms

* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention

* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()

* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files

* variable names in NFFN

* manual Mega->MEGA changes in docs

* Mega->MEGA in config auto

* style and quality fixes

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments

* commit before dealing with merge conflicts

* made new attention activation functions available in ACT2FN and added generation test from OPT

* style and quality in activations and tests

* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings

* style and quality fixes after latest updates, before rotary position ids

* causal mask in MegaBlock docstring + added missing device passing

* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR

* style and quality fixes + readme updates pointing to main

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Generate: Add GPTNeoX integration test (#22346) · 0fa46524
Joao Gante authored Mar 24, 2023

23 Mar, 2023 3 commits

Automatically create/update tiny models (#22275) · e8cc0255
Yih-Dar authored Mar 23, 2023
```
* Automatically create or update tiny models

* Skip failed tests

* update workflow file

* use revision

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

Generate: add test for left-padding support (#22322) · 502fec77
Joao Gante authored Mar 23, 2023

Fix quality due to ruff release · ef28df05
Sylvain authored Mar 22, 2023

22 Mar, 2023 5 commits

Fix PipelineTests skip conditions (#22320) · 8b05ace0

Yih-Dar authored Mar 22, 2023



* check what tests fail

* Skip failing tests

* Skip failing tests

* Skip failing tests

* Skip failing tests

* clean up

* clean up

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Add Pix2Struct (#21400) · 0f68a7f4

Younes Belkada authored Mar 22, 2023



* v1 all keys match

* clean up

* forward pass ok

* add correct image transform

* generate works, logits matching

* clean up

* more refactor

* revert

* revert

* clean up

* clean ups

* clean up

* refactor

* refactor

* fix doc

* fix tokenizer test

* fix toctree

* revert toctree

* oops

* few fixes

* replace to `pixel_embeds`

* make fixup

* test processing & feat extractor

* fix some tests

* more fixes

* make fixup

* clean up

* more clean up

* add a single slow test

* fix test

* make fixup

* fix

* fix authors

* fix toctree

* update docs

* add docstring

* revert change

* Update src/transformers/models/pix2struct/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer

* fix processor test

* fix test

* make fixup

* refactor

* fix config

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format

* fix

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make fixup

* add docstring

* fix issues

* fix

* fix

* fix

* add slow test

* fix

* fix

* fix batched issue

* fix training issues

* fix ci test

* fix slow test

* fix conversion script

* remove unneeded classes

* fix slow test

* fix require backends

* fix masked fill

* revert

* fix softmax

* add large models support

* fix conditional generation

* few fixes

* add instructions

* rm unneeded file

* Update src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py

* fix ci test

* fix ci test really

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix nit

* fix nits

* fix image processors nits

* docstring

* clean up

* fix nit

* fix tests

* docstring nit

* fix reshape

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix nit

* fix repetition

* refactor processor

* make patch size consistent

* refactor forward

* fix docstring

* fix max_patches issue

* update docstring

* update docstring

* fix copied from

* add skip reasons

* few fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* format

* fix doctests

* refactor and fix

* fix doc build issue

* fix processor test

* small fix conversion script

* replace correct weights

* make fixup

* fix some issues

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* revert config and fixes

* Update src/transformers/models/pix2struct/image_processing_pix2struct.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more details

* fixes

* fix processor

* fix processor test

* fix

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* fix processor

* Update src/transformers/models/pix2struct/modeling_pix2struct.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add copied

* make fixup

* fix copies

* update docstring

* refactor

* fix docstring

* fix conversion script

* fix vqa issue

* replace to `flattened_patches`

* nit

* fix numpy issue

* fix image processors

* add batched vqa support

* fix vqa conversion

* make fixup

* fix conversion script

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add correct docstring

* update docstring

* fix module level + channel dim

* use `make_list_of_images`

* refactor

* correct docstring

* fix authors

* remove `data_format`

* add header text test

* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* add checkpoints

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Beef up Llama tests (#22314) · fd3eb3e3
Joao Gante authored Mar 22, 2023
```
* tmp commit

* beef up llama tests
```

Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer (#22302) · 48bef3a7

silentghoul-spec authored Mar 22, 2023



Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer. Earlier, xpath_sub_list was the same as xpath_tags_list
Co-authored-by: dusejat <dusejat@amazon.com>

Add MaskedImageModelingOutput (#22212) · 0558914d
Alara Dirik authored Mar 22, 2023
```
* Add MaskedImageModelingOutput
```

21 Mar, 2023 2 commits

Time to Say Goodbye, torch 1.7 and 1.8 (#22291) · 67c2dbdb

Yih-Dar authored Mar 21, 2023



* time to say goodbye, torch 1.7 and 1.8

* clean up torch_int_div

* clean up is_torch_less_than_1_8-9

* update

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Fix error in mixed precision training of `TFCvtModel` (#22267) · 5a2b77a6

Gerald Cuder authored Mar 21, 2023



* Make sure CVT can be trained using mixed precision

* Add test for keras-fit with mixed-precision

* Update tests/models/cvt/test_modeling_tf_cvt.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------
Co-authored-by: gcuder <Gerald.Cuder@iacapps.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

17 Mar, 2023 1 commit

Add LlamaForSequenceClassification (#22209) · f2514413

lewtun authored Mar 17, 2023



* Add LlamaForSequenceClassification

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add docstring

* Add test

* Add input embedding getter and setter

* Remove dead code

---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

16 Mar, 2023 2 commits

LLaMA Implementation (#21955) · 0041be5b

Jason Phang authored Mar 16, 2023



* LLaMA

* sharding and docs

* tweak

* black

* inits

* ruff

* LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP

* init

* no checkpoint

* docs

* ruff

* type_vocab_size

* tokenizer fixes

* tokenizer fixes

* Update tokenization_llama.py

* Update tokenization_llama.py

* Update configuration_llama.py

* Update modeling_llama.py

* tokenizer add_bos by default

* licenses

* remove decoder

* norms and mlp

* rope overhaul

* tweaks

* black

* mention OPT implementation

* off-by-one naming

* typo

* fix

* tokenization fix and slicing bug

* padding config

* cleanup

* black

* update tests

* undo typo

* fix vocab caching logic

* ruff

* docbuilder

* attn fix from BlackSamorez

* initial feedback

* typo

* docs

* llama case

* llama case

* load checkpoint docs

* comment about tokenizer

* tokenizer defaults

* clear past_key_values if use_cache=False

* last tweaks

* last tweaks

* last tweaks

* last tweaks

---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>

Update expected values in `MgpstrModelIntegrationTest` (#22195) · 52a57f7c
Yih-Dar authored Mar 16, 2023
```
Update values
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

15 Mar, 2023 2 commits

Update BridgeTowerForContrastiveLearning (#22145) · 16121bae

Anahita Bhiwandiwalla authored Mar 15, 2023



* Use return_loss for BridgeTowerForContrastiveLearning, add example

* fix tests

* Update example in BridgeTowerForContrastiveLearning

* Update test_modeling_bridgetower.py

* update model output format

* minor update

* Update src/transformers/models/bridgetower/modeling_bridgetower.py

* make style

---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Revert 22152 MaskedImageCompletionOutput changes (#22187) · 73768147
amyeroberts authored Mar 15, 2023
```
Revert changes
```

14 Mar, 2023 3 commits

Create MaskedImageCompletionOutput and fix ViT docs (#22152) · 3b22bfbc
Alara Dirik authored Mar 14, 2023
```
* create MaskedImageCompletionOutput

* fix bugs

* fix bugs
```

Add ConvNeXT V2 (#21679) · cdddfbff
Alara Dirik authored Mar 14, 2023
```
* Add ConvNeXt V2 to transformers
* TF model is separated from the PR to fix issues
```

Move `is_pipeline_test_to_skip` to specific model test classes (#21999) · 6c2ad00c
Yih-Dar authored Mar 14, 2023
```
* Move `is_pipeline_test_to_skip` to specific model test classes

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

13 Mar, 2023 3 commits

[`Whisper`] add `get_input_embeddings` to `WhisperForAudioClassification` (#22133) · d979cf6e

Younes Belkada authored Mar 13, 2023



* add `get_input_embeddings` to `WhisperForAudioClassification`

* add common tests

* fix another common test

* Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

[`Blip2`] skip accelerate test (#22124) · 6652e7da
Younes Belkada authored Mar 13, 2023
```
skip accelerate test
```

add new model of MGP-STR (#21418) · 102b5ff4

wangpeng authored Mar 13, 2023



* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* add new model of MGP-STR

* fix the check failings

* remove torch and numpy from mgp_tokenization

* remove unused import from modeling_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str.py

* add test_processing_mgp_str

* add test_processing_mgp_str

* add test_processing_mgp_str

* rm test_processing_mgp_str and add softmax outs to model

* rewrite the code of mgp-str according to PR suggestions

* rewrite the code of mgp-str according to PR suggestions

* remove representation_size from MGPSTRConfig

* reformat configuration_mgp_str.py

* format test_processor_mgp_str.py

* add test for tokenizer and complete model/processor test and model file

* rm Unnecessary tuple in modeling_mgp_str

* reduce hidden_size/layers/label_size in test_model

* add integration tests and change MGPSTR to Mgpstr

* add test for logit values

* reformat test model file

---------
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>

10 Mar, 2023 2 commits

Revert "[GPT2] Propose fix for #21080" (#22093) · 2f320661

Yih-Dar authored Mar 10, 2023

Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b.

[GPT2] Propose fix for #21080 (#21853) · a3fef89b

Arthur authored Mar 10, 2023

* Make sure position ids are masked

* test that padded inputs produce the same results

* fix failing tests

* fixup

* fix batch test

09 Mar, 2023 2 commits

Skip 3 tests for `WhisperEncoderModelTest` (#22060) · ab81d31d
Yih-Dar authored Mar 09, 2023
```
* skip 3 tests

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

Remove set_access_token usage + fail tests if FutureWarning (#22051) · 923110b7

Lucain authored Mar 09, 2023



* Remove set_access_token usage + fail tests if FutureWarning

* do not fail on FutureWarning in CI

---------
Co-authored-by: testbot <lucainp@hf.co>

08 Mar, 2023 2 commits

Mark all `BridgeTower` tests slow for now (#22039) · 1cbac686
Yih-Dar authored Mar 08, 2023
```
* slow me

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```

[WIP] Add BridgeTowerForContrastiveLearning (#21964) · de81adf9

Anahita Bhiwandiwalla authored Mar 08, 2023



* Add BridgeTower for ITC

* Fix review feedback

* Rename BridgeTowerForITC, cleanup

* Fix style and quality

* implement tests

---------
Co-authored-by: Tiep Le <97980157+tileintel@users.noreply.github.com>
Co-authored-by: Tiep Le <tiep.le@intel.com>

07 Mar, 2023 7 commits

Update tiny model creation script and some other files (#22006) · b338414e

Yih-Dar authored Mar 07, 2023



* Update 1

* Update 2

* Update 3

* Update 4

* Update 5

* Update 6

* Update 7

* Update 8

* Update 9

* Update 10

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[Time-Series] informer model (#21099) · 8abe4930

Eli Simhayev authored Mar 08, 2023

* added informer to gitignore

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* added informer to gitignore

* WIP informer2020

* added checking that instantiate works

* added config using gluonTS by kashif

* WIP config

* adding InformerConfig. need to remove FeatureEmbedder

* done InformerConfig, but need to change the names

* Done informer model init. working on enc-dec

* added things to address, after reading again enc-dec in the paper

* done modeling - checking initialization work

* moved enc-dec init to InformerEncoder/Decoder init

* added 'init_std' to config, now model init works!

* WIP conversion script, and added code sources

* WIP conversion script: loading original informer pth works

* WIP conversion script: change defaults in the config

* WIP conversion script: supporting Informer input embedding

* WIP conversion script: added parameters for the informer embed

* WIP conversion script: change dim_feedforward=2048

* WIP conversion script: remove unused args for loading checkpoint

* just cleaning up

* DataEmbedding removed, after thinking with Kashif

* working on forward pass

* WIP forward pass: trying to establish working batch for forward pass

* cleaning and finalizing

* adding HF names and docs

* init after cleaning works

* WIP in tests

* added docs for the informer specific args

* fix style

* undo change

* cleaning informer, now need to work only enc-dec

* initial enc-dec classes

* added encoder and decoder

* added todo

* add todos for conv_layers

* added decoder docs from vanilla

* added encoder docs from vanilla

* remove encoder decoder from the original informer

* removed AttentionLayer from the original paper

* removed TriangularCausalMask, same as decoder_attention_mask

* initial sparse attention

* use conv_layers

* fixed test_config test

* fix parenthesis when iterating zip(layers, conv_layers)

* error found in prob attention, added sizes as comments

* fix sizes

* added proposal for q_reduce indexing, and remove unused

* WIP ProbMask, and changed factor=2 for testing

* remove unused libs for this PR for creating the env

* fix checking the attn_weights.size() after bmm

* Q_reduce: changed from torch.gather to simple slicing

* WIP calculate final attn_output

* finish adding v_aggregated, attn_output ready

* changed tgt_len to u in attention_mask, need to fix the size error

* comment attention_mask for encoder, and fix if cond for v_agg

* added ProbMask support (wip), removed old original code

* finished ProbMask 😃



* Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e92771e51a5d2758d53a9afb59547f0.

* fixes

* make style

* fix initial tests

* fix more tests

* dry

* make style

* remove unused files

* style

* added integration tests

* fix num_static_real_features

* fix header

* remove unused function

* fix example

* fix docs

* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/modeling_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fixes for reviewer

* use prediction_length from model

* fix style

* fixed informer.mdx

* added to index

* updated readme

* undo

* make fix-copies

* typo

* fix copy

* added Informer to toctree

* in order

* fixed comments

* remove unneeded new lines in docs

* make static real and cat optional

* fix use of distil conv layers

* fixed integration test

* added checkpoint for convlayer

* make fix-copies

* updated from time series model

* make fix-copies

* copy decoder

* fix unit tests

* updated scaling config

* fix integration tests

* IGNORE_NON_TESTED

* IGNORE_NON_AUTO_CONFIGURED

* IGNORE_NON_AUTO_CONFIGURED

* updated check configs

* fix formatting

* undo change from time series

* prediction_length should not be None

* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor

* make style

* make fix-copies

* niels CR: update contributed by

* niels CR: update configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: update kashif -> huggingface
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* niels CR: `sampling_factor` only relevant when `attention_type`=prob

* make style

* fixed U_part: added multiplication by `L_Q`

* fixed bug: remove `is not None` from `if config.distil`

* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check

* fix integration tests

* updated model hub

* do not shift as in training

* undo

* fix make-copies

* make fix-copies

* added `if prediction_length is None`

* changed `ProbSparseAttention` to `InformerProbSparseAttention`

* changed `V_sum` -> `v_mean_dim_time`

* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`

* TimeSeriesTransformer->Informer in decoder's Copied from

* more descriptive in ProbSparse

* make style

* fix copied from

* Revert "added `if prediction_length is None`"

This reverts commit b4cbddfa05e3bd739b79569cd3c3b89e316f2451.

* fixed indent

* use InformerSinusoidalPositionalEmbedding

* make fix-style

* fix from #21860

* fix name

* make fix-copies

* use time series utils

* fix dec num_heads

* docstring

* added time series util doc

* _import_structure

* formatting

* changes from review

* make style

* fix docs

* fix doc

* removed NegativeLogLikelihood

---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

[DETR and friends] Remove is_timm_available (#21814) · dde718e7

NielsRogge authored Mar 07, 2023



* First draft

* Fix to_dict

* Improve conversion script

* Update config

* Remove timm dependency

* Fix dummies

* Fix typo, add integration test

* Upload 101 model as well

* Remove timm dummies

* Fix style

---------
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

[Whisper] Add model for audio classification (#21754) · 7c393181

Sanchit Gandhi authored Mar 07, 2023

* [Whisper] Add model for audio classification

* make fix-copies

* add to docs

* add docstring

* empty returns

* add code example

* switch to fleurs

* stick everything on one line

Skip `test_multi_gpu_data_parallel_forward` for some model tests (#21991) · 9402788b

Yih-Dar authored Mar 07, 2023



skip test_multi_gpu_data_parallel_forward for some model tests
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[DETR, YOLOS] Fix device bug (#21974) · 95408e99
NielsRogge authored Mar 07, 2023
```
* Fix integration test

* Add test

* Add test
```

Update `Jukebox` tests (#21984) · 5b28b783

Yih-Dar authored Mar 07, 2023



* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

* update expected values for jukebox

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
