1. 08 Feb, 2022 1 commit
• Add TFSpeech2Text (#15113) · 8406fa6d
  Joao Gante authored
      * Add wrapper classes
      
      * convert inner layers to tf
      
      * Add TF Encoder and Decoder layers
      
      * TFSpeech2Text models
      
      * Loadable model
      
      * TF model with same outputs as PT model
      
      * test skeleton
      
      * correct tests and run the fixup
      
      * correct attention expansion
      
* TFSpeech2Text past_key_values with TF format
  2. 27 Jan, 2022 1 commit
  3. 14 Jan, 2022 1 commit
  4. 10 Jan, 2022 1 commit
• Add TFVisionEncoderDecoderModel (#14148) · b67fd797
  Yih-Dar authored
      * Start the work on TFVisionEncoderDecoderModel
      
      * Expose TFVisionEncoderDecoderModel
      
      * fix import
      
      * Add modeling_tf_vision_encoder_decoder to _ignore_modules in get_model_modules()
      
      * reorder
      
      * Apply the fix for checkpoint loading as in #14016
      
      * remove attention_mask + fix VISION_DUMMY_INPUTS
      
      * A minimal change to make TF generate() work for vision models as encoder in encoder-decoder setting
      
      * fix wrong condition: shape_list(input_ids) == 2
      
      * add tests
      
      * use personal TFViTModel checkpoint (for now)
      
      * Add equivalence tests + projection layer
      
      * style
      
      * make sure projection layer can run
      
      * Add examples
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Clean comments (need to work on TODOs for PyTorch models)
      
      * Remove TF -> PT in check_pt_tf_equivalence for TFVisionEncoderDecoderModel
      
      * fixes
      
      * Revert changes in PT code.
      
      * Update tests/test_modeling_tf_vision_encoder_decoder.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Add test_inference_coco_en for TF test
      
      * fix quality
      
      * fix name
      
      * build doc
      
      * add main_input_name
      
      * Fix ckpt name in test
      
      * fix diff between master and this PR
      
      * fix doc
      
      * fix style and quality
      
      * fix missing doc
      
      * fix labels handling
      
      * Delete auto.rst
      
      * Add the changes done in #14016
      
      * fix prefix
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  5. 23 Dec, 2021 1 commit
• Add TFCLIPModel (#13967) · 8f2cc1c3
  Yih-Dar authored
      * Start the work for TFCLIPModel
      
      * Convert to TF code (TODO: loss + doc)
      
      * Clean up
      
      * Fix pooled_output for TFCLIPTextTransformer - using tf.gather_nd
      
      * assert -> raise error
      
      * Expose TFCLIPModel
      
      * Deal with dummy_inputs
      
      * Add tests
      
      * Fix all tests. TODO: manual check weight loading + add more comments
      
      * Fix pt tf equivalence test
      
      * fixes
      
      * update TFCLIPVisionEmbeddings's Conv2D
      
      * Fix loss + overwrite test_pt_tf_model_equivalence from common
      
      * Add a comment about the change about MainLayer in test_keras_save_load
      
      * Set return_loss=True in TFCLIPModelTester + make tests pass
      
      * overwrite test_pt_tf_model_equivalence from tf common
      
      * fix base_model_prefix
      
      * Fix examples
      
      * remove unused
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * apply review suggestions
      
      * change self.pre_layrnorm to self.pre_layernorm
      
      * apply more review suggestions
      
      * return attention probs before dropout (to align with PT)
      
      * fix weight init
      
      * fix
      
      * build doc
      
      * fix missing doc
      
      * fix for test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  6. 30 Nov, 2021 1 commit
• Tapas tf (#13393) · c468a87a
  Kamal Raj authored
      * TF Tapas first commit
      
      * updated docs
      
      * updated logger message
      
      * updated pytorch weight conversion
      script to support scalar array
      
      * added use_cache to tapas model config to
      work properly with tf input_processing
      
      * 1. rm embeddings_sum
      2. added # Copied
      3. + TFTapasMLMHead
      4. and lot other small fixes
      
      * updated docs
      
      * + test for tapas
      
      * updated testing_utils to check
      is_tensorflow_probability_available
      
      * converted model logits post processing using
      numpy to work with both PT and TF models
      
      * + TFAutoModelForTableQuestionAnswering
      
      * added TF support
      
      * added test for
      TFAutoModelForTableQuestionAnswering
      
      * added test for
      TFAutoModelForTableQuestionAnswering pipeline
      
      * updated auto model docs
      
      * fixed typo in import
      
      * added tensorflow_probability to run tests
      
      * updated MLM head
      
* updated tapas.rst with TF model docs
      
      * fixed optimizer import in docs
      
* updated convert to np:
data from the pt model is no longer a
`transformers.tokenization_utils_base.BatchEncoding`
after the pipeline upgrade
      
      * updated pipeline:
1. with torch.no_grad removed, the pipeline forward handles it
      2. token_type_ids converted to numpy
      
      * updated docs.
      
      * removed `use_cache` from config
      
      * removed floats_tensor
      
      * updated code comment
      
      * updated Copyright Year and
      logits_aggregation Optional
      
      * updated docs and comments
      
      * updated docstring
      
      * fixed model weight loading
      
      * make fixup
      
      * fix indentation
      
      * added tf slow pipeline test
      
      * pip upgrade
      
      * upgrade python to 3.7
      
      * removed from_pt from tests
      
      * revert commit f18cfa9
  7. 21 Nov, 2021 1 commit
  8. 16 Nov, 2021 1 commit
  9. 09 Nov, 2021 1 commit
• Add TFViTModel (#13778) · be4a6c64
  Yih-Dar authored
      * Start the work for TFViTModel
      
      * Convert to TF code - need to check in the follow up commits
      
      * Clean up model code
      
      * Expose TFViTModel
      
      * make style
      
      * make quality
      
      * Add test
      
      * make style & quality
      
      * Fix some imports
      
* fix wrong usage - *kwargs => **kwargs
      
      * Fix Conv2D weight loading (PT->TF) issue
      
      * Add tests for images with different sizes + fix model
      
      * Fix some common tests for TFViTModel
      
      * Use inputs instead of input_ids in test_compile_tf_model
      
      * Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name
      
      * Avoid transpose in TFViT call
      
      * Fix Conv2D issue in load_tf2_weights_in_pytorch_model
      
      * Use tf.keras.layers.Conv2D instead of tf.nn.conv2d
      
      * Using simpler heuristic to detect Conv2D layer
      
      * Change convert_tf_weight_name_to_pt_weight_name to return TransposeType
      
      * Check tf_weight_shape is not None before using it
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix missing comma
      
      * fix input dtype
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  10. 02 Nov, 2021 1 commit
  11. 12 Oct, 2021 1 commit
• Add TFEncoderDecoderModel + Add cross-attention to some TF models (#13222) · 8b240a06
  Yih-Dar authored
      * Add cross attentions to TFGPT2Model
      
      * Add TFEncoderDecoderModel
      
      * Add TFBaseModelOutputWithPoolingAndCrossAttentions
      
      * Add cross attentions to TFBertModel
      
      * Fix past or past_key_values argument issue
      
      * Fix generation
      
      * Fix save and load
      
      * Add some checks and comments
      
      * Clean the code that deals with past keys/values
      
      * Add kwargs to processing_inputs
      
      * Add serving_output to TFEncoderDecoderModel
      
      * Some cleaning + fix use_cache value issue
      
      * Fix tests + add bert2bert/bert2gpt2 tests
      
      * Fix more tests
      
      * Ignore crossattention.bias when loading GPT2 weights into TFGPT2
      
      * Fix return_dict_in_generate in tf generation
      
      * Fix is_token_logit_eos_token bug in tf generation
      
      * Finalize the tests after fixing some bugs
      
      * Fix another is_token_logit_eos_token bug in tf generation
      
      * Add/Update docs
      
      * Add TFBertEncoderDecoderModelTest
      
      * Clean test script
      
      * Add TFEncoderDecoderModel to the library
      
      * Add cross attentions to TFRobertaModel
      
      * Add TFRobertaEncoderDecoderModelTest
      
      * make style
      
      * Change the way of position_ids computation
      
      * bug fix
      
      * Fix copies in tf_albert
      
      * Remove some copied from and apply some fix-copies
      
      * Remove some copied
      
      * Add cross attentions to some other TF models
      
      * Remove encoder_hidden_states from TFLayoutLMModel.call for now
      
      * Make style
      
      * Fix TFRemBertForCausalLM
      
      * Revert the change to longformer + Remove copies
      
      * Revert the change to albert and convbert + Remove copies
      
      * make quality
      
      * make style
      
      * Add TFRembertEncoderDecoderModelTest
      
      * make quality and fix-copies
      
      * test TFRobertaForCausalLM
      
      * Fixes for failed tests
      
      * Fixes for failed tests
      
      * fix more tests
      
      * Fixes for failed tests
      
      * Fix Auto mapping order
      
      * Fix TFRemBertEncoder return value
      
      * fix tf_rembert
      
      * Check copies are OK
      
* Fix "TFBaseModelOutputWithPastAndCrossAttentions is not defined"
      
      * Add TFEncoderDecoderModelSaveLoadTests
      
      * fix tf weight loading
      
      * check the change of use_cache
      
      * Revert the change
      
      * Add missing test_for_causal_lm for TFRobertaModelTest
      
      * Try cleaning past
      
      * fix _reorder_cache
      
      * Revert some files to original versions
      
      * Keep as many copies as possible
      
      * Apply suggested changes - Use raise ValueError instead of assert
      
      * Move import to top
      
      * Fix wrong require_torch
      
      * Replace more assert by raise ValueError
      
      * Add test_pt_tf_model_equivalence (the test won't pass for now)
      
      * add test for loading/saving
      
      * finish
      
      * finish
      
      * Remove test_pt_tf_model_equivalence
      
      * Update tf modeling template
      
      * Remove pooling, added in the prev. commit, from MainLayer
      
      * Update tf modeling test template
      
      * Move inputs["use_cache"] = False to modeling_tf_utils.py
      
      * Fix torch.Tensor in the comment
      
      * fix use_cache
      
      * Fix missing use_cache in ElectraConfig
      
      * Add a note to from_pretrained
      
      * Fix style
      
      * Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
      
      * Fix TFMLP (in TFGPT2) activation issue
      
      * Fix None past_key_values value in serving_output
      
      * Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
      
      * Apply review suggestions - style for cross_attns in serving_output
      
      * Apply review suggestions - change assert + docstrings
      
      * break the error message to respect the char limit
      
      * deprecate the argument past
      
      * fix docstring style
      
      * Update the encoder-decoder rst file
      
      * fix Unknown interpreted text role "method"
      
      * fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  12. 31 Aug, 2021 1 commit
• Deberta_v2 tf (#13120) · 3efcfeab
  Kamal Raj authored
      * Deberta_v2 tf
      
      * added new line at the end of file, make style
      
      * +V2, typo
      
      * remove never executed branch of code
      
      * rm cmnt and fixed typo in url filter
      
      * cleanup according to review comments
      
      * added #Copied from
  13. 12 Aug, 2021 1 commit
  14. 24 Jul, 2021 1 commit
• Add RemBERT model code to huggingface (#10692) · 434022ad
  Thibault FEVRY authored
      * Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
      
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for a large number of megabatches (in the test case it takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).

Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
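The quadratic-concatenation problem described above and a linear-time alternative can be sketched outside the library (the variable names here are illustrative, not the trainer code):

```python
import itertools

# Flattening a list of megabatches. sum(megabatches, []) builds a brand-new
# list on every addition, so it is quadratic in the total number of items;
# itertools.chain.from_iterable flattens in linear time.
megabatches = [[1, 2], [3, 4], [5]]

slow = sum(megabatches, [])                               # O(n^2) concatenation
fast = list(itertools.chain.from_iterable(megabatches))   # O(n)

assert slow == fast == [1, 2, 3, 4, 5]
```

A list comprehension (`[i for mb in megabatches for i in mb]`) achieves the same linear behavior.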
      
      * Replace double occurrences as the last step (#11367)
      
      * [Flax] Fix PyTorch import error (#11839)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * change pytorch import to flax import
      
      * Fix reference to XLNet (#11846)
      
      * Switch mem metrics flag (#11851)
      
      * Switch mem metrics flag
      
      * Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      
      * Fix flos single node (#11844)
      
      * fixing flos bug/typo in non-distributed setting
      
      * storing flos every logging_interval
      
      * Fix two typos in docs (#11852)
      
      * typo2
      
      * fix typo
      
      * [Trainer] Report both steps and num samples per second (#11818)
      
      * [Trainer] Report both steps and num samples per second
      
      * Fix batch number
      
      * Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      
      * Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      
      * Add some tests to the slow suite #11860
      
      * Enable memory metrics in tests that need it (#11859)
      
      * fixed a small typo in the doc (#11856)
      
      * typo (#11858)
      
      * Add option to log only once in multinode training (#11819)
      
* Add option to log only once in multinode training
      
      * Use an alternate property
      
      * [Wav2Vec2] SpecAugment Fast (#11764)
      
      * first try
      
      * finish
      
      * [lm examples] fix overflow in perplexity calc (#11855)
      
      * fix overflow in perplexity calc
      
      * use inf
      
      * fix
      
      * [Examples] create model with custom config on the fly (#11798)
      
* create custom model on the fly
      
      * better wording
      
      * add update_from_string
      
      * cleanup
      
      * cleanup
      
      * Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * more bool options
      
      * style
      
      * fix logger
      
      * add test
      
      * add the doc
      
      * assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [Wav2Vec2ForCTC] example typo fixed (#11878)
      
      * Ensure input tensor are on device. (#11874)
      
      The feature extractor does not create tensors on the appropriate device,
      so we call `ensure_tensor_on_device` before feeding the processed inputs
      to the model.
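The pattern behind moving processed inputs to the model's device can be sketched framework-agnostically (this is an illustrative helper under the assumption that tensors expose a `.to()` method, not the actual pipelines implementation):

```python
def ensure_on_device(inputs, device):
    """Recursively move tensor-like values (anything with a .to method) to device."""
    if hasattr(inputs, "to"):
        return inputs.to(device)
    if isinstance(inputs, dict):
        return {k: ensure_on_device(v, device) for k, v in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(ensure_on_device(v, device) for v in inputs)
    return inputs  # plain Python values are left untouched

class FakeTensor:
    """Stand-in for a framework tensor, used only to demonstrate the helper."""
    def __init__(self, device="cpu"):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

moved = ensure_on_device({"input_values": FakeTensor()}, "cuda:0")
assert moved["input_values"].device == "cuda:0"
```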
      
      * Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)
      
      * Fix Bart
      
      * Fix Blenderbot{,_small}
      
      * Fix LED
      
      * Fix Marian
      
      * Fix MBart
      
      * Fix Pegasus
      
      * Fix T5
      
      * Add test for generation with head_mask
      
      * Add a common TF test
      
      * Override a test for the LED model as head masking is not yet properly implemented
      
      * Remove all head_masks from input preparation for LED
      
      * Drop masking for T5 as it needs a bit of refactor
      
      * Correcting comments in T5Stack to reflect correct tuple order  (#11330)
      
      * Correcting comments to reflect correct tuple order
      
      In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
      
      * Update modeling_t5.py
      
      Updating another comment as well
      
      * Removing extra space
      
      * Fixing style and quality
      
      * style & quality
      
      * Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * [Flax] Allow dataclasses to be jitted (#11886)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * change dataclasses to flax ones
      
      * fix typo
      
      * fix jitted tests
      
      * fix bert & electra
      
      * changing find_batch_size to work with tokenizer outputs (#11890)
      
      * changing find_batch_size to work with tokenizer outputs
      
      trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
      
      * Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
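The fix can be sketched with a simplified batch-size finder (illustrative, not the `trainer_pt_utils` code): handling dict-like containers, not just `dict`, covers tokenizer outputs such as `BatchEncoding`, which subclasses `UserDict`.

```python
from collections import UserDict

def find_batch_size(inputs):
    """Return the batch size of the first sized value found (simplified sketch)."""
    # Accepting UserDict as well as dict covers dict-like containers such as
    # the tokenizer's BatchEncoding, which a plain isinstance(x, dict) misses.
    if isinstance(inputs, (dict, UserDict)):
        for value in inputs.values():
            found = find_batch_size(value)
            if found is not None:
                return found
    elif isinstance(inputs, (list, tuple)) and inputs:
        return find_batch_size(inputs[0])
    elif hasattr(inputs, "shape"):
        return inputs.shape[0]
    return None

class Batch(UserDict):
    """Minimal stand-in for tokenization_utils_base.BatchEncoding."""

class T:
    shape = (8, 128)  # stand-in tensor with batch dimension 8

assert find_batch_size(Batch({"input_ids": T()})) == 8
```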
      
      * Link official Cloud TPU JAX docs (#11892)
      
      * Flax Generate (#11777)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add
      
      * indexing
      
      * correct a couple of tests
      
      * fix tests
      
      * add logits processor
      
      * finish top_k, top_p, temp
      
      * add docs
      
      * correct flax prng key default
      
      * improve generate
      
      * add generation docs
      
      * add docs
      
      * make style
      
      * revert model outputs change
      
      * make style
      
      * correct typo
      
      * fix tests
      
      * fix slow test
      
      * add raise
      
      * finish generation
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      
      * Add Emotion Speech Noteboook (#11900)
      
      * Update deepspeed config to reflect hyperparameter search parameters (#11896)
      
      * rebuild deepspeed config for hyperparameter search
      
      * reformat code to fix style issues
      
      * Adding new argument `max_new_tokens` for generate. (#11476)
      
      * Adding new argument `max_new_tokens` for generate.
      
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This includes a `MaxNewTokensCriteria` that enables callers that don't
know the token length ahead of time (like pipeline callers) to manage
the length of their generated output more easily.
      
* Adding a test for the user warning when both `max_length` and
`max_new_tokens` are used together.
      
      * Removed redundant `no_grad`.
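The idea behind `max_new_tokens` can be sketched as a stopping criterion that counts from the prompt length rather than from zero (a simplified stand-in, not the library's `MaxNewTokensCriteria` class):

```python
class MaxNewTokensCriteria:
    """Stop once max_new_tokens tokens have been generated past the prompt (sketch)."""
    def __init__(self, start_length: int, max_new_tokens: int):
        # The caller only states how many NEW tokens to allow; the absolute
        # length budget is derived from the prompt length at call time.
        self.max_length = start_length + max_new_tokens

    def __call__(self, current_length: int) -> bool:
        return current_length >= self.max_length

prompt_len = 10
stop = MaxNewTokensCriteria(start_length=prompt_len, max_new_tokens=20)
assert not stop(prompt_len + 19)  # 19 new tokens: keep generating
assert stop(prompt_len + 20)      # 20 new tokens: stop
```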
      
      * Added Sequence Classification class in GPTNeo (#11906)
      
      * seq classification changes
      
      * fix tests
      
      * [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
      
      * Added logic to return attention from flax-bert model and added test cases to check that
      
      * Added new line at the end of file to test_modeling_flax_common.py
      
      * fixing code style
      
* Fixing Roberta and Electra models too by copying bert
      
      * Added temporary hack to not run test_attention_outputs for FlaxGPT2
      
      * Returning attention weights from GPT2 and changed the tests accordingly.
      
      * last fixes
      
      * bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Test optuna and ray (#11924)
      
      * Remove `datasets` submodule
      
      * fix assert (#11935)
      
      * Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
      
      * Remove redundant `nn.log_softmax` in `run_flax_glue.py`
      
      `optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
      
      * Remove unused 'flax.linen' import
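The idempotence claim is easy to check numerically with a pure-Python sketch of `log_softmax` (not the `flax.linen` implementation):

```python
import math

def log_softmax(xs):
    # log_softmax(x) = x - logsumexp(x). Its outputs satisfy
    # logsumexp(log_softmax(x)) == 0, so applying it twice changes nothing.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

logits = [2.0, 1.0, 0.1]
once = log_softmax(logits)
twice = log_softmax(once)
assert all(abs(a - b) < 1e-12 for a, b in zip(once, twice))
```

This is why the redundant call was mathematically harmless, even though removing it avoids wasted work.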
      
      * Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
      
      * Add MT5ForConditionalGeneration as supported arch.
      
      * Update README.md
      
      * Add FlaxCLIP (#11883)
      
      * add flax CLIP
      
      * default input_shape
      
      * add tests
      
      * fix test
      
      * fix name
      
      * fix docs
      
      * fix shapes
      
      * attend at least 1 token
      
      * flax conv to torch conv
      
      * return floats
      
      * fix equivalence tests
      
      * fix import
      
      * return attention_weights and update tests
      
* fix docstrings
      
* address Patrick's comments
      
      * input_shape arg
      
      * add tests for get_image_features and get_text_features methods
      
      * fix tests
      
      * RAG-2nd2end-revamp (#11893)
      
      * initial
      
      * code quality test
      
      * code quality
      
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retriever
      
      * minor change in test_modeling_rag
      
      * fixed tests
      
      * Update examples/research_projects/rag-end2end-retriever/README.md
      
      typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
      
      * Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
      
      type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
      
      * Update src/transformers/models/rag/retrieval_rag.py
      
      Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
      
      * completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
      
      * modify qa-trainer (#11872)
      
      * modify qa-trainer
      
      * fix flax model
      
      * bugfixes training_args.py (#11922)
      
      modified according to:
      https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
      
      
      
      * reinitialize wandb config for each hyperparameter search run (#11945)
      
      * Add regression tests for slow sentencepiece tokenizers.  (#11737)
      
      * add test_vocab_size for sentencepiece tok.
      
      * add test_get_vocab for sentencepiece tok.
      
      * add test_convert_token_and_id for sentencepiece tok.
      
      * add test_tokenize_and_convert_tokens_to_string for all tok.
      
      * improve test_tokenize_and_convert_tokens_to_string for sp. tok.
      
      * add common tokenizer integration tests
      - for albert
      - for barthez
      
      * add tokenizer integration tests to bert gen.
      
      * add most tokenizer integration tests
      
      * fix camembert tokenizer integration test
      
      * add tokenizer integration test to marian
      
      * add tokenizer integration test to reformer
      
      * add typing and doc to tokenizer_integration_test_util
      
      * fix tokenizer integration test of reformer
      
      * improve test_sentencepiece_tokenize_and_convert_tokens_to_string
      
      * empty commit to trigger CI
      
      * fix tokenizer integration test of reformer
      
      * remove code not needed anymore
      
      * empty commit to trigger CI
      
      * empty commit to trigger CI
      
      * Authorize args when instantiating an AutoModel (#11956)
      
      * Neptune.ai integration (#11937)
      
      An option that turns on neptune.ai logging
      --report_to 'neptune'
      
      Additional ENV variables:
      	NEPTUNE_PROJECT
      	NEPTUNE_API_TOKEN
      	NEPTUNE_RUN_NAME (optional)
      	NEPTUNE_STOP_TIMEOUT (optional)
      
      * Run the integration tests on schedule tests instead of master tests
      
      * [deepspeed] docs (#11940)
      
      * deepspeed docs
      
      * cleanup
      
      * cleanup
      
      * typo correction (#11973)
      
      * typo correction
      
      * type corrections
      
      * ByT5 model (#11971)
      
      * allow tf to use uneven num of layers
      
      * add tokenizer
      
      * finish docs
      
      * finish docs
      
      * Apply suggestions from code review
      
      * include in index
      
      * finish
      
      * Update docs/source/model_doc/byt5.rst
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * apply sylvais suggestions
      
      * make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Typo in usage example, changed to device instead of torch_device (#11979)
      
      * [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
      
      * decouple DeepSpeedConfigHF from Trainer
      
      * add LoggingLevel ctx manager; add new test
      
      * cleanup
      
      * add docs
      
      * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * implemented suggested renames
      
      * formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [Trainer] add train loss and flops metrics reports (#11980)
      
      * add train loss and flops metrics reports
      
      * consistency
      
      * add train_loss to skip keys
      
      * restore on_train_end call timing
      
      * Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
      
      Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
      - [Release notes](https://github.com/urllib3/urllib3/releases)
      - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)
      
      ---
      updated-dependencies:
      - dependency-name: urllib3
        dependency-type: direct:production
      ...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      
      * [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * fix rag from pretrained loading
      
      * add test
      
* upload
      
      * finish
      
      * VisualBERT (#10534)
      
      * Init VisualBERT
      
      * Add cookie-cutter, Config, and Embeddings
      
      * Add preliminary Model
      
      * Add Bert analogous classes
      
      * Add basic code for NLVR, VQA, Flickr
      
      * Update Init
      
      * Fix VisualBert Downstream Models
      
      * Rename classifier to cls
      
      * Comment position_ids buffer
      
      * Remove sentence image predictor output
      
      * Update output dicts
      
      * Remove unnecessary files
      
      * Fix Auto Modeling
      
      * Fix transformers init
      
      * Add conversion script
      
      * Add conversion script
      
      * Fix docs
      
      * Update visualbert modelling
      
      * Update configuration
      
      * Style fixes
      
      * Add model and integration tests
      
      * Add all tests
      
      * Update model mapping
      
      * Add simple detector from original repository
      
      * Update docs and configs
      
      * Fix style
      
      * Fix style
      
      * Update docs
      
      * Fix style
      
      * Fix import issues in style
      
      * Fix style
      
      * Add changes from review
      
      * Fix style
      
      * Fix style
      
      * Update docs
      
      * Fix style
      
      * Fix style
      
      * Update docs/source/model_doc/visual_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update tests/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Add changes from review
      
      * Remove convert run script
      
      * Add changes from review
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/visual_bert/modeling_visual_bert.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Add changes from review
      
      * Add changes from review
      
      * Add visual embedding example in docs
      
      * Fix "copied from" comments
      
      * Add changes from review
      
      * Fix error, style, checkpoints
      
      * Update docs
      
      * Fix integration tests
      
      * Fix style
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fix examples (#11990)
      
      * [docs] fix xref to `PreTrainedModel.generate` (#11049)
      
      * fix xref to generate
      
      * do the same for search methods
      
      * style
      
      * style
      
      * Update return introduction (#11976)
      
      Make it clear that the `forward` method now returns a dict instead of a tuple.
      
      Fix style
      
      * [deepspeed] Move code and doc into standalone files (#11984)
      
      * move code and docs
      
      * style
      
      * moved
      
      * restore
      
      * [deepspeed] add nvme test skip rule (#11997)
      
      * add nvme skip rule
      
      * fix
      
      * Fix weight decay masking in `run_flax_glue.py` (#11964)
      
      * Fix weight decay masking in `run_flax_glue.py`
      
      Issues with the previous implementation:
      - The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
      - `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
      - Flax's LayerNorm calls the scale parameter `scale` not `weight`
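      The three issues above can be illustrated with a small, dependency-free sketch (plain Python mimicking `flax.traverse_util.flatten_dict`; names are illustrative):

```python
def flatten_dict(d, prefix=()):
    # mimics flax.traverse_util.flatten_dict: keys become TUPLES of path
    # components, not dot-joined strings
    out = {}
    for k, v in d.items():
        path = prefix + (k,)
        if isinstance(v, dict):
            out.update(flatten_dict(v, path))
        else:
            out[path] = v
    return out

params = {"layer_norm": {"scale": 1.0, "bias": 0.0},
          "dense": {"kernel": 1.0, "bias": 0.0}}

flat = flatten_dict(params)
# keys look like ("layer_norm", "scale"), not "layer_norm.scale"

# optax.masked applies the wrapped transform where the mask is True, so the
# mask must be True exactly where decay SHOULD apply; and Flax LayerNorm's
# parameter is named "scale", not "weight"
decay_mask = {path: path[-1] not in ("bias", "scale") for path in flat}
```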
      
      * Fix formatting with black
      
      * adapt results
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      
      * [Flax] Refactor MLM  (#12013)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * finish refactor
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      
      * [Deepspeed] Assert on mismatches between ds and hf args (#12021)
      
      * wip
      
      * add mismatch validation + test
      
      * renames
      
      * Update docs/source/main_classes/deepspeed.rst
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * renames
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [TrainerArguments] format and sort __repr__, add __str__ (#12018)
      
      * format and sort __repr__, add __str__
      
      * typo
      
      * use __str__ directly
      
      * alias __repr__ = __str__
      
      * Fixed Typo in modeling_bart.py (#12035)
      
      * Fixed Typo in modeling_bart.py - Issue #11895
      
      * Fixed Typo in modeling_bart.py
      
      * fix deberta 2 tokenizer integration test (#12017)
      
      * fix docs of past_key_values (#12049)
      
      * [JAX] Bump jax lib (#12053)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * bump up jax lib
      
      * Fixes bug that appears when using QA BERT and distillation. (#12026)
      
      * Fixing bug that appears when using distillation (and potentially other uses).
      During the backward pass PyTorch complains with:
      RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
      This happens because the QA model code modifies the start_positions and end_positions input tensors in place, using the clamp_ function: as a consequence the teacher and the student both modify the inputs, and the backward pass fails.
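      A minimal sketch of the fix pattern (plain Python lists standing in for tensors; `safe_clamp` is a hypothetical helper): the in-place `clamp_` is replaced by an out-of-place clamp, so the inputs shared between teacher and student stay untouched.

```python
def safe_clamp(positions, max_index):
    # out-of-place, like tensor.clamp(0, max_index): returns a new sequence
    # and leaves `positions` unmodified, unlike the in-place clamp_()
    return [min(max(p, 0), max_index) for p in positions]

start_positions = [5, -1, 120]
clamped = safe_clamp(start_positions, 100)
# the shared input survives intact for the teacher's backward pass
```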
      
      * Fixing the clamp_ bug in all QA models.
      
      * Extend pipelines for AutoModel tuples (#12025)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * finish
      
      * refactor
      
      * add test
      
      * fix test
      
      * Attempt at simplification.
      
      * Small fix.
      
      * Fixing non-existent AutoModel for TF.
      
      * Naming.
      
      * Remove extra condition.
      Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
      
      * Add optional grouped parsers description to HfArgumentParser (#12042)
      
      * Adding optional argument group to HfArgumentParser
      
      * Minor
      
      * remove whitespace
      
      * Minor styling
      
      * adds metric prefix. (#12057)
      
      * adds metric prefix.
      
      * update tests to include prefix
      
      * skip failing test (#12059)
      
      * Fix integration tests (#12066)
      
      * Fix tapas issue (#12063)
      
      * Fix scatter function to be compatible with torch-scatter 2.7.0
      
      * Allow test again
      
      * updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
      
      * updated the original RAG implementation to be compatible with the latest PL version
      
      * updated the requirements.txt file
      
      * execute make style
      
      * code quality test
      
      * code quality
      
      * conflict resolved in requirements.txt
      
      * code quality
      
      * changed the MyDDP class name to CustomDDP
      
      * Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
      
      * Replace legacy torch.Tensor constructor with torch.{tensor, empty}
      
      * Remove torch.Tensor in examples
      
      * Add torch to requirements.txt in language-modeling (#12040)
      
      * Add torch to requirements.txt in language-modeling
      
      * Update examples/pytorch/language-modeling/requirements.txt
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Properly indent block_size (#12070)
      
      * [Deepspeed] various fixes (#12058)
      
      * replace deprecated config
      
      * sub_group_size was too big
      
      * complete deprecation removal
      
      * [Deepspeed Wav2vec2] integration (#11638)
      
      * wip
      
      * wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
      
      * cleanup
      
      * workaround
      
      * working 5/8 modes
      
      * solve fp32 distributed zero3
      
      * style
      
      * sync
      
      * sync
      
      * rework
      
      * deprecation
      
      * cleanup
      
      * https://github.com/microsoft/DeepSpeed/pull/1044 PR was merged
      
      * clean up
      
      * add a guide
      
      * more prose
      
      * more prose
      
      * fix
      
      * more prose
      
      * sub_group_size was too big
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * refactor
      
      * bug fix
      
      * make the true check explicit
      
      * new deepspeed release
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * typo
      
      * Update run_ner.py with id2label config (#12001)
      
      * sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
      
      * Add DETR (#11653)
      
      * Squash all commits of modeling_detr_v7 branch into one
      
      * Improve docs
      
      * Fix tests
      
      * Style
      
      * Improve docs some more and fix most tests
      
      * Fix slow tests of ViT, DeiT and DETR
      
      * Improve replacement of batch norm
      
      * Restructure timm backbone forward
      
      * Make DetrForSegmentation support any timm backbone
      
      * Fix name of output
      
      * Address most comments by @LysandreJik
      
      * Give better names for variables
      
      * Conditional imports + timm in setup.py
      
      * Address additional comments by @sgugger
      
      * Make style, add require_timm and require_vision to tests
      
      * Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
      
      * Add png files to fixtures
      
      * Fix type hint
      
      * Add timm to workflows
      
      * Add `BatchNorm2d` to the weight initialization
      
      * Fix retain_grad test
      
      * Replace model checkpoints by Facebook namespace
      
      * Fix name of checkpoint in test
      
      * Add user-friendly message when scipy is not available
      
      * Address most comments by @patrickvonplaten
      
      * Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
      
      * Better initialization
      
      * Scipy is necessary to get sklearn metrics
      
      * Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
      
      * Make style
      
      * Improve docs and add 2 community notebooks
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
      
      * [test] support more than 2 gpus (#12074)
      
      * support more than 2 gpus
      
      * style
      
      * Wav2Vec2 Pretraining (#11306)
      
      * Working quantizer forward
      
      * Working quantizer forward
      
      * Clean up unused model parts, test reproducibility
      
      * Working quantizer forward
      
      * Clean up unused model parts, test reproducibility
      
      * Remove custom outputs from the shared ones
      
      * correct conversion
      
      * correct bug
      
      * add first pretrain script
      
      * save intermediate
      
      * static shapes
      
      * save intermediate
      
      * finish first pretrain script version
      
      * more refactor
      
      * remove wandb
      
      * refactor more
      
      * improve test
      
      * correct perplexity compute bug
      
      * finish model implementation
      
      * add to docs
      
      * finish docs
      
      * finish pretraining script
      
      * finish pretraining script
      
      * remove wandb
      
      * finish PR for merge
      
      * finish config
      
      * finish
      
      * make deepspeed work
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * apply suggestions
      
      * fix flaky test
      Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * pass decay_mask fn to optimizer (#12087)
      
      * rm require_version_examples (#12088)
      
      * [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * fix tests
      
      * Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
      
      * Add text_column_name and label_column_name to run_ner args
      
      * Minor fix: grouping for text and label column name
      
      * CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
      
      * Resize with kept aspect ratio
      
      * Fixed failed test
      
      * Overload center_crop and resize methods instead
      
      * resize should handle non-PIL images
      
      * update slow test
      
      * Tensor => tensor
      Co-authored-by: patil-suraj <surajp815@gmail.com>
      
      * New TF GLUE example (#12028)
      
      * Pushing partially-complete new GLUE example
      
      * First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
      
      * Fix to the fit() call
      
      * Bugfixes, making sure TPU and multi-GPU support is ready
      
      * Remove logger line that depends on PyTorch
      
      * Style pass
      
      * Deleting old TF GLUE example
      
      * Include label2id and id2label in the saved model config
      
      * Don't clobber the existing model.config.label2id
      
      * Style fixes
      
      * Update examples/tensorflow/text-classification/run_glue.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fix quality
      
      * Update README.md to cover the TF GLUE example.
      
      * Minor style edits
      
      * Appending label2id and id2label to models to ensure inference works properly (#12102)
      
      * Fix a condition in test_generate_with_head_masking (#11911)
      
      * Fix a condition in test_generate_with_head_masking
      
      * Fix usage of head_mask in bigbird_pegasus
      
      * Fix head masking for speech2text
      
      * Resolve copy mismatch + drop unwanted print statement
      
      * Fix the condition
      
      * Flax VisionTransformer (#11951)
      
      * adding vit for flax
      
      * added test for Flax-vit and some bug-fixes
      
      * overrode methods where variable changes were necessary for the flax_vit test
      
      * added FlaxViTForImageClassification for test
      
      * Update src/transformers/models/vit/modeling_flax_vit.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * made changes suggested in PR
      
      * Adding jax-vit models for autoimport
      
      * swapping num_channels and height,width dimension
      
      * fixing the docstring for torch-like inputs for VIT
      
      * add model to main init
      
      * add docs
      
      * doc, fix-copies
      
      * docstrings
      
      * small test fixes
      
      * fix docs
      
      * fix docstr
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * style
      Co-authored-by: jayendra <jayendra@infocusp.in>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * add relevant description to tqdm in examples (#11927)
      
      * add relevant `desc` in examples
      
      * require_version datasets>=1.8.0
      
      * Fix head masking generate tests (#12110)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * fix tests
      
      * Flax CLM script (#12023)
      
      * first draft
      
      * max_seq_length => block_size
      
      * fix arg names
      
      * fix typos
      
      * fix loss calculation
      
      * add max examples, fix  train eval steps, metrics
      
      * optimizer mask
      
      * fix perplexity, metric logging
      
      * fix logging
      
      * data_collator => data_loader
      
      * refactor loss_fn
      
      * support single GPU
      
      * pass distributed to write_metric
      
      * fix jitting
      
      * fix single device training
      
      * fix single device metrics
      
      * close inner progress bars once finished
      
      * add overwrite_cache arg
      
      * fix dataset caching issue
      
      * add more logs
      
      * few small fixes,
      
      * address nicholas suggestions
      
      * fix docstr
      
      * address patricks suggestions
      
      * make flake happy
      
      * pass new new_dropout_rng to apply_gradients
      
      * reset train metrics after every epoch
      
      * remove distributed logic, small fixes
      
      * Add from_pretrained to dummy timm objects (#12097)
      
      * Add from_pretrained to dummy timm
      
      * Fix at the source
      
      * Update utils/check_dummies.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Missing pretrained dummies
      
      * Style
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fix t5 error message (#12136)
      
      * Fix t5 error message
      
      * Fix again
      
      * Fix megatron_gpt2 attention block's causal mask (#12007)
      
      * Fix megatron_gpt2 attention block's causal mask.
      
      * compatibility with checkpoints created with recent versions of Megatron-LM
      
      * added integration test for the released Megatron-GPT2 model
      
      * code style changes
      
      * added option to megatron conversion script to read from config file
      Co-authored-by: Guido Novati <gnovati@nvidia.com>
      
      * Add mlm pretraining xla torch readme (#12011)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * upload
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Update examples/flax/language-modeling/README.md
      
      * add more info
      
      * finish
      
      * fix
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      
      * add readme for flax clm (#12111)
      
      * add readme for flax clm
      
      * use section link for tokenizer
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * update metrics
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * FlaxBart (#11537)
      
      * Start working on FlaxBart
      
      * Create modeling_flax_bart.py
      
      * Write FlaxBartAttention
      
      * Add FlaxBartEncoderLayer
      
      * Add FlaxBartDecoderLayer and some typing
      
      * Add helper function for FlaxBart
      
      * shift_tokens_right
      
      * _make_causal_mask
      
      * _expand_mask
      
      * Add PositionalEmbedding and fix init_std naming
      
      * Add FlaxBartPretrainedModel
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder
      
      * Add FlaxBartEncoder among modules to be imported
      
      * YET WE CANNOT INITIALIZE THAT!! :(
      
      * Make BartEncoder working
      
      Change BartEncoder to instance of nn.Module so far
      
      * Add FlaxBartDecoder
      
      * Add FlaxBartModel
      
      * TODO to make model run -> Prepare model inputs
      
      * Resolve padding
      
      * Add FlaxBartModel
      
      * Add FlaxBartModel into importable modules
      
      * Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
      
      * make style; not properly working
      
      * make style; make quality not pass due to some import I left
      
      * Remove TODO for padding_idx in nn.Embed so far
      
      * Add FlaxBartForConditionalGeneration
      
      * Incorporate Flax model output classes, i.e. return_dict
      
      * Add another models and incorporate use_cache arg
      
      * Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
      
      * Incorporate use_cache arg from PyTorch implementation
      
      * Add all necessary Flax output utils
      
      * Add FlaxBartForCausalLM; not working yet
      
      * Add minor improvements; still lacks some functionality
      
      * Update docs, src and tests
      
      * Add support of FlaxBart to docs/source
      
      * Fix some bugs in FlaxBart source code
      
      * Add some necessary tests for FlaxBart models - jit_compilation not passing
      
      * Fix tests and add test_head_masking
      
      * Fix tests for @jax.jit computation
      
      * Add test_head_masking
      
      * Migrate FlaxBart tests from jax.numpy to numpy
      
      * Remove FlaxBartForCausalLM
      
      * Clean repo
      
      * fix bart model weight structure
      
      * Fix FlaxBartForSequenceClassification
      
      Slicing cannot be used under jit, so the selection of the sentence
      representation from hidden_states had to be changed.
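      A hedged NumPy sketch (standing in for `jax.numpy`; the helper name and the use of the last EOS position are illustrative) of a shape-stable selection that works under jit, since boolean-mask slicing would produce data-dependent shapes:

```python
import numpy as np

def pick_eos_hidden(hidden_states, input_ids, eos_token_id):
    # index of the last EOS per row, computed with fixed-shape ops
    # (where/max) instead of boolean slicing
    is_eos = input_ids == eos_token_id
    positions = np.arange(input_ids.shape[1])
    last_eos = np.where(is_eos, positions, -1).max(axis=1)
    # gather one hidden vector per batch row
    return hidden_states[np.arange(hidden_states.shape[0]), last_eos]
```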
      
      * Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
      
      * Allow testing for FlaxBartForQA for pt_flax equivalence
      
      * Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
      
      * remove past_key_values
      
      * remove inputs_embeds and make input_ids required
      
      * add position ids
      
      * re-write attention layer
      
      * fix dataclass
      
      * fix pos embeds and attention output
      
      * fix pos embeds
      
      * expose encode method
      
      * expose decode method
      
      * move docstring to top
      
      * add cache for causal attn layer
      
      * remove head masking for now
      
      * s2s greedy search first pass
      
      * boom boom
      
      * fix typos
      
      * fix greedy generate for bart
      
      * use encoder, decoder layers instead of num_hidden_layers
      
      * handle encoder_outputs
      
      * cleanup
      
      * simplify decoding
      
      * more clean-up
      
      * typos
      
      * Change header + add {decoder_,}position_ids into 2 models
      
      * add BartConfig
      
      * fix existing tests
      
      * add encode, decode methods
      
      * Fix shift_tokens_right for JIT compilation + clarify one condition
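      A hedged NumPy sketch of a JIT-friendly `shift_tokens_right` (NumPy standing in for `jax.numpy`), built from `concatenate` and `where` rather than in-place slice assignment so the same shape-stable form works under `jax.jit`:

```python
import numpy as np

def shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id):
    # prepend the decoder start token and drop the last position
    start = np.full((input_ids.shape[0], 1), decoder_start_token_id)
    shifted = np.concatenate([start, input_ids[:, :-1]], axis=1)
    # labels use -100 as the ignore index; replace it with the pad token
    return np.where(shifted == -100, pad_token_id, shifted)
```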
      
      * fix decode
      
      * encoder => encode
      
      * simplify generate
      
      * add tests for encode and decode
      
      * style
      
      * add tests for cache
      
      * fix equivalence tests
      
      * sample generate now works with seq2seq
      
      * generation tests
      
      * initialize dense layers
      
      * docstring and cleanup
      
      * quality
      
      * remove get/set input_embeddings
      
      * address Patricks suggestions
      
      * decode for every model, remove encoder_outputs from call
      
      * update tests accordingly
      
      * decode returns only decoder outputs and logits
      
      * fix arguments
      
      * doc encode, decode methods
      
      * correct base_model_prefix
      
      * fix test for seq classif model
      
      * fix docs
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
      
      * feature for tokenizer without slow/legacy version
      
      * format
      
      * modify common test
      
      * add tests
      
      * add PreTrainedTokenizerFast to AutoTokenizer
      
      * format
      
      * change tokenizer common test in order to be able to run test without a slow version
      
      * update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
      
      * add AutoTokenizer test
      
      * replace `if self.tokenizer_class is not None` with `if self.tokenizer_class is None`
      
      * remove obsolete change in comment
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * change `get_main_tokenizer` into `get_tokenizers`
      
      * clarify `get_tokenizers` method
      
      * homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
      
      * add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
      
      * `test_rust_tokenizer = False` for BertJapaneseTokenizer
      
      * `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [Flax] Add links to google colabs (#12146)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add colab links
      
      * Don't log anything before logging is setup in examples (#12121)
      
      * Don't log anything before logging is setup in examples
      
      * Last example
      
      * Use text_column_name variable instead of "text" (#12132)
      
      * Use text_column_name variable instead of "text"
      
      `text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
      
      This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90%+ of datasets use "text" anyway.
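      The described fallback can be sketched as (function name is illustrative):

```python
def pick_text_column(column_names):
    # use the "text" column when present, otherwise fall back to the
    # dataset's first column
    return "text" if "text" in column_names else column_names[0]
```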
      
      * black formatting
      
      * make style
      Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
      
      * [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
      
      * [lm examples] Replicate --config_overrides addition to other LM examples
      
      * Removing no trainer files changes
      
      * Update README
      Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
      
      * fix error message (#12148)
      
      * [optim] implement AdafactorSchedule (#12123)
      
      * implement AdafactorSchedule
      
      * typo
      
      * fix
      
      * Update src/transformers/optimization.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [style] consistent nn. and nn.functional (#12124)
      
      * consistent nn. and nn.functional
      
      * fix glitch
      
      * fix glitch #2
      
      * Adding TFWav2Vec2Model (#11617)
      
      * [WIP] Add TFWav2Vec2Model
      
      Work in progress for adding a TensorFlow version of Wav2Vec2
      
      * feedback changes
      
      * small fix
      
      * Test Feedback Round 1
      
      * Add SpecAugment and CTC Loss
      
      * correct spec augment mask creation
      
      * docstring and correct copyright
      
      * correct bugs
      
      * remove bogus file
      
      * finish tests correction
      
      * del unnecessary layers
      
      * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * make style
      
      * correct final bug
      
      * Feedback Changes
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * [Flax] Fix flax pt equivalence tests (#12154)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * upload
      
      * consistent nn. and nn.functional: p2 templates (#12153)
      
      * Flax Big Bird (#11967)
      
      * add flax bert
      
      * bert -> bigbird
      
      * original_full ported
      
      * add debugger
      
      * init block sparse
      
      * fix copies ; gelu_fast -> gelu_new
      
      * block sparse port
      
      * fix block sparse
      
      * block sparse working
      
      * all ckpts working
      
      * fix-copies
      
      * make quality
      
      * init tests
      
      * temporary fix for FlaxBigBirdForMultipleChoice
      
      * skip test_attention_outputs
      
      * fix
      
      * gelu_fast -> gelu_new ; fix multiple choice model
      
      * remove nsp
      
      * fix sequence classifier
      
      * fix
      
      * make quality
      
      * make fix-copies
      
      * finish
      
      * Delete debugger.ipynb
      
      * Update src/transformers/models/big_bird/modeling_flax_big_bird.py
      
      * make style
      
      * finish
      
      * bye bye jit flax tests
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * [style] consistent nn. and nn.functional: part 3 `tests` (#12155)
      
      * consistent nn. and nn.functional: p3 templates
      
      * restore
      
      * [style] consistent nn. and nn.functional: part 4 `examples` (#12156)
      
      * consistent nn. and nn.functional: p4 examples
      
      * restore
      
      * consistent nn. and nn.functional: part 5 docs (#12161)
      
      * Add video links to the documentation (#12162)
      
      * [Flax generate] Add params to generate (#12171)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add params as input
      
      * finish
      
      * Use a released version of optax rather than installing from Git. (#12173)
      
      Use a released version of optax rather than installing from Git
      
      * Have dummy processors have a `from_pretrained` method (#12145)
      
      * Add course banner (#12157)
      
      * Add course banner
      
      * Update course banner
      
      * Adjust banner width
      
      * Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
      
      * Update AutoModel classes in summarization example (#12178)
      
      - Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
      - Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
      
      This silences all warnings.
      
      * Ray Tune Integration Updates (#12134)
      
      * fix
      
      * fixes
      
      * add back to scheduled tests
      
      * formatting
      
      * Update integrations.py
      
      * [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
      
      * ensure concurrent pytest workers use a unique port for torch.distributed.launch
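      A hedged sketch of the idea (helper name and base port are assumptions): derive the rendezvous port from pytest-xdist's `PYTEST_XDIST_WORKER` variable ("gw0", "gw1", ...) so concurrent workers never collide on the same `torch.distributed` port.

```python
import os

def get_unique_port(base_port=29500):
    # pytest-xdist names its workers gw0, gw1, ...; offset the base port
    # by the worker index so each worker gets its own port
    worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
    return base_port + int(worker.replace("gw", "") or "0")
```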
      
      * reword
      
      * Model card defaults (#12122)
      
      * [WIP] Model card defaults
      
      * finetuned_from default value
      
      * Add all mappings to the mapping file
      
      * Be more defensive on finetuned_from arg
      
      * Add default task tag
      
      * Separate tags from tasks
      
      * Edge case for dataset
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Temporarily deactivate torch-scatter while we wait for new release (#12181)
      
      * Temporarily deactivate torch-scatter while we wait for new release
      
      * torch-1.8.1 binary for scatter
      
      * Revert to 1.8.0
      
      * Pin torch dependency
      
      * torchaudio and torchvision
      
      * Temporarily deactivate torchhub test (#12184)
      
      * [Flax] Add Beam Search (#12131)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * push new logit processors
      
      * add processors
      
      * save first working version
      
      * save intermediate
      
      * finish
      
      * make style
      
      * make fix-copies
      
      * finish
      
      * Update tests/test_modeling_flax_bart.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      Co-authored-by: default avatarPatrick von Platen <patrick@huggingface.co>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * Hubert (#11889)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add hubert
      
      * add first test file
      
      * more docs
      
      * fix bugs
      
      * fix bug
      
      * finish
      
      * finish
      
      * finish docstring
      
      * fix
      
      * fix
      
      * finalize
      
      * add to ignored
      
      * finish
      
      * Apply suggestions from code review
      
      * correct naming
      
      * finish
      
      * fix auto config
      
      * finish
      
      * correct convert script
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * apply suggestions lysandre & suraj
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * updated DLC images and sample notebooks (#12191)
      
      * Enabling AutoTokenizer for HubertConfig. (#12198)
      
      * Use yaml to create metadata (#12185)
      
      * Use yaml to create metadata
      
      * Fix typo
      
      * Remove pin
      
      * [Docs] fixed broken link (#12205)
      
      * fixed broken link
      
      * Update docs/source/benchmarks.rst
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update docs/source/benchmarks.rst
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Pipeline update & tests (#12207)
      
      * Improve detr (#12147)
      
      * Remove unused variables
      
      * Improve docs
      
      * Fix docs of segmentation masks
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Add link to the course (#12229)
      
      * Support for torch 1.9.0 (#12224)
      
      * Support for torch 1.9.0
      
      * Torch scatter for 1.9.0
      
      * Github Actions run on 1.9.0
      
      * fix pt-1.9.0 `add_` deprecation (#12217)
      
      * fix pt-1.9.0 add_ deprecation
      
      * add () for clarity
      
      * Trigger CI
      
      * require_version(torch
      
      * Release: v4.7.0
      
      * Docs for v4.8.0
      
      * AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
      
      * AutoTokenizer: infer the class from the tokenizer config if possible
      
      * Add tests
      
      * Update src/transformers/models/auto/tokenization_auto.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
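
      The inference step can be sketched as follows; `infer_tokenizer_class` is a hypothetical helper, not the actual code in `tokenization_auto.py`, and shows only the idea of reading the class name from a saved tokenizer config:

```python
import json

def infer_tokenizer_class(tokenizer_config_text):
    """Illustrative only: read the `tokenizer_class` entry from a
    tokenizer_config.json payload, returning None when it is absent."""
    config = json.loads(tokenizer_config_text)
    return config.get("tokenizer_class")

# A minimal tokenizer_config.json as save_pretrained() might write it
config_text = '{"tokenizer_class": "BertTokenizer", "do_lower_case": true}'
print(infer_tokenizer_class(config_text))  # BertTokenizer
```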
      
      * update desc for map in all examples (#12226)
      
      * update desc for map in all examples
      
      * added plm
      
      * suggestions
      
      * [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
      
      * add FlaxAutoModelForSeq2SeqLM
      
      * [FlaxBart] few small fixes (#12247)
      
      * boom boom
      
      * remove flax clip example
      
      * few small fixes
      
      * Deprecate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
      
      * Moved Mish to Torch 1.9 version
      
      * Run black formatting
      
      * [t5 doc] make the example work out of the box (#12239)
      
      * [run_clm.py] restore caching
      
      * style
      
      * [t5 doc] make the example work out of the box
      
      This PR expands the training example to include the correct model type, since the example breaks with e.g. `T5Model`.
      
      * Update docs/source/model_doc/t5.rst
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * expand the other example
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * Fix the scheduled CI
      
      * Better CI feedback (#12279)
      
      * Better run ID
      
      * Only part of CI
      
      * Revert "Only part of CI"
      
      This reverts commit 29f7f248d21e0f5792e0670ba8705b31ad8967b7.
      
      * Fix for making student ProphetNet for Seq2Seq Distillation (#12130)
      
      * make_student.py: fix to make student ProphetNet
      
      * reformat
      
      * [FlaxClip] fix test from/save pretrained test (#12284)
      
      * boom boom
      
      * remove flax clip example
      
      * fix from_save_pretrained
      
      * [Flax] [WIP] allow loading head model with base model weights (#12255)
      
      * boom boom
      
      * remove flax clip example
      
      * allow loading head model with base model weights
      
      * add test
      
      * fix imports
      
      * disable save, load test for clip
      
      * add test_save_load_to_base
      
      * [DeepSpeed] don't ignore --adafactor (#12257)
      
      * [Flax] Fix flax test save pretrained (#12256)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * fix flax save pretrained test
      
      * Tensorflow QA example (#12252)
      
      * New Tensorflow QA example!
      
      * Style pass
      
      * Updating README.md for the new example
      
      * flake8 fixes
      
      * Update examples/tensorflow/question-answering/README.md
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * [Flax] Add jax flax to env command (#12251)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add commands for flax/jax
      
      * reset report_to to none, avoid deprecation warning (#12293)
      
      * [trainer + examples] set log level from CLI (#12276)
      
      * set log level from CLI
      
      * add log_level_replica + test + extended docs
      
      * cleanup
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * rename datasets objects to allow datasets module
      
      * improve the doc
      
      * style
      
      * doc improve
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
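
      A minimal sketch of the idea, assuming a hypothetical helper name and plain stdlib `logging` in place of the transformers logging utilities: the main process gets `--log_level`, replicas get the (typically quieter) `--log_level_replica`:

```python
import logging

def set_log_level_from_cli(log_level, log_level_replica, is_main_process):
    """Illustrative sketch: pick the effective verbosity for this process,
    letting replicas use a separate (usually quieter) setting."""
    chosen = log_level if is_main_process else log_level_replica
    level = logging.getLevelName(chosen.upper())  # e.g. "info" -> 20
    logging.getLogger("transformers-example").setLevel(level)
    return level

print(set_log_level_from_cli("info", "warning", is_main_process=False))  # 30
```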
      
      * [tests] multiple improvements (#12294)
      
      * [tests] multiple improvements
      
      * cleanup
      
      * style
      
      * todo to investigate
      
      * fix
      
      * Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252)
      
      * registering a buffer for token_type_ids, to fix the error of the device id getting hardcoded when tracing
      
      * style format
      
      * adding a persistent flag to the registered buffers that prevents them from being added to the state_dict and addresses the backward compatibility issue
      
      * adding a try/except to the fix, as the persistent flag is only available from PT > 1.6
      
      * adding version check
      
      * added the condition to only use the token_type_ids buffer when it is autogenerated, not passed by the user
      
      * adding comments and making the condition where token_type_ids is None use the registered buffer
      
      * taking out position-embedding from the if block
      
      * adding comments
      
      * handling the case if buffer for position_ids was not registered
      
      * reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings
      
      * reverting the token_type_ids in case of None to the previous version
      
      * reverting changes on position_ids adding back the if block
      
      * changes added by running make fix-copies
      
      * changes added by running make fix-copies and added the import version as it was getting used
      
      * changes added by running make fix-copies
      
      * changes added by running make fix-copies
      
      * fixing the import format
      
      * fixing the import format
      
      * modified to use a temp tensor for the trimmed and expanded token_type_ids buffer
      
      * changes made by fix-copies after temp tensor modifications
      
      * changes made by fix-copies after temp tensor modifications
      
      * changes made by fix-copies after temp tensor modifications
      
      * clean up
      
      * clean up
      
      * clean up
      
      * clean up
      
      * Nit
      
      * Nit
      
      * Nit
      
      * modified accordingly to support device conversion on traced models
      
      * modified accordingly to support device conversion on traced models
      
      * modified accordingly to support device conversion on traced models
      
      * modified accordingly to support device conversion on traced models
      
      * changes based on latest in master
      
      * Adapt templates
      
      * Add version import
      Co-authored-by: default avatarUbuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
      Co-authored-by: default avatarLysandre <lysandre.debut@reseau.eseo.fr>
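
      A pure-Python stand-in for the pattern (no torch; `TinyEmbeddings` and its method names are hypothetical): a pre-registered all-zeros token_type_ids "buffer" is sliced to the actual sequence length when the user passes None, instead of creating a new tensor whose device gets hardcoded at trace time:

```python
class TinyEmbeddings:
    """Illustrative stand-in for the fix: the buffer lives on the module,
    sized to max_position_embeddings, and is trimmed per forward pass."""

    def __init__(self, max_position_embeddings=512):
        # stand-in for self.register_buffer("token_type_ids", ...)
        self.token_type_ids = [0] * max_position_embeddings

    def resolve_token_type_ids(self, seq_length, token_type_ids=None):
        if token_type_ids is not None:  # user-provided ids always win
            return token_type_ids
        # autogenerated case: slice the registered buffer to seq_length
        return self.token_type_ids[:seq_length]

emb = TinyEmbeddings()
print(emb.resolve_token_type_ids(4))          # [0, 0, 0, 0]
print(emb.resolve_token_type_ids(2, [1, 1]))  # [1, 1]
```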
      
      * trainer_tf: adjust wandb installation command (#12291)
      
      * add FlaxAutoModelForImageClassification in main init (#12298)
      
      * Fix and improve documentation for LEDForConditionalGeneration (#12303)
      
      * Replace conditional generation example (fixes #12268)
      
      * Replace model in summarization example with finetuned checkpoint, adapt example text
      
      * Fix typo in new summarization example
      
      * Fix docstring formatting, add missing import statement to example
      
      * [Flax] Main doc for event orga (#12305)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * push
      
      * finish
      
      * some typos
      
      * add more info on communication
      
      * add suggestions
      
      * [trainer] 2 bug fixes and a rename (#12309)
      
      * bug fixes and a rename
      
      * add extended DDP test
      
      * FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313)
      
      * [docs]  performance  (#12258)
      
      * initial performance document
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * rewrites based on suggestions
      
      * 8x multiple is for AMP only
      
      * add contribute section
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Add CodeCarbon Integration (#12304)
      
      * Add optional dependency
      
      * Add CodeCarbon integration
      
      * Add CodeCarbon integration
      
      * Add CodeCarbon integration
      
      * typo
      
      * Optimizing away the `fill-mask` pipeline. (#12113)
      
      * Optimizing away the `fill-mask` pipeline.
      
      - Don't send anything to the tokenizer unless needed. The vocab check is
      much faster
      - Keep BC by sending data to the tokenizer when needed. Users handling the warning messages will see performance benefits again
      - Make `targets` and `top_k` work together better: `top_k` cannot be
      higher than `len(targets)` but can still be smaller.
      - Actually simplify the `target_ids` in case of duplicates (it can happen
      because we're parsing raw strings)
      - Removed useless code to fail on empty strings: it only worked if the empty
      string was in first position; moved to ignoring them instead.
      - Changed the related tests, as only those tests would fail correctly
      (having an incorrect value in first position)
      
      * Make tests compatible for 2 different vocabs... (at the price of a
      warning).
      
      Co-authored-by: @EtaoinWu
      
      * ValueError working globally
      
      * Update src/transformers/pipelines/fill_mask.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatibility +
      fallback.
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
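
      The described behaviour can be sketched with a hypothetical helper (this is not the actual `fill_mask.py` code): map target strings to vocab ids, drop empty strings and duplicates, and clamp `top_k` to the number of surviving targets:

```python
def prepare_target_ids(targets, vocab, top_k=None):
    """Illustrative sketch: deduplicate targets, skip empty strings and
    out-of-vocabulary entries, and cap top_k at len(target_ids)."""
    target_ids = []
    seen = set()
    for target in targets:
        if target == "":          # ignore empty strings instead of failing
            continue
        token_id = vocab.get(target)
        if token_id is None or token_id in seen:
            continue              # unknown target or duplicate
        seen.add(token_id)
        target_ids.append(token_id)
    effective_top_k = min(top_k, len(target_ids)) if top_k else len(target_ids)
    return target_ids, effective_top_k

vocab = {"cat": 1, "dog": 2, "bird": 3}
print(prepare_target_ids(["cat", "", "dog", "cat"], vocab, top_k=5))  # ([1, 2], 2)
```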
      
      * Add output in a dictionary for TF `generate` method (#12139)
      
      * Add output args to greedy search
      
      * Fix critical typo + make style quality
      
      * Handle generate_beam_search
      
      * Add dict_specific tests and fix the placement of encoder outputs
      
      * Add  specific outputs
      
      * Update doc
      
      * Fix typo
      
      * Adjust handling encoder_outputs + Fix generating for T5
      
      * Fix generate for RAG
      
      * Fix handling output_attentions when target_mapping is not None
      
      Take care of situations when target_mapping is provided,
      as there are 2-tuples of attentions
      
      Change from:
      if inputs["output_attentions"]:
          attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)
      
      to:
      if inputs["output_attentions"]:
          if inputs["target_mapping"] is not None:
              # when target_mapping is provided, there are 2-tuples of attentions
              attentions = tuple(
                  tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
              )
          else:
              attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)
      
      * Rename kwargs to model_kwargs
      
      * make style quality
      
      * Move imports in test_modeling_tf_common.py
      
      Move ModelOutput-related imports in test_modeling_tf_common.py
      into the `is_tf_available():` statement.
      
      * Rewrite nested if-statements
      
      * Fix added tests
      
      * Flax summarization script  (#12230)
      
      * add summarization script
      
      * fix arguments, preprocessing, metrics
      
      * add generation and metrics
      
      * auto model, prediction loop
      
      * prettify
      
      * label smoothing
      
      * address Sylvain's and Patrick's suggestions
      
      * dynamically import shift_tokens_right
      
      * fix shift_tokens_right_fn call
      
      * Rewrite ProphetNet to adapt converting ONNX friendly (#11981)
      
      * Rewrite
      
      * [ONNX] rewrite
      
      * Flax T5 (#12150)
      
      * copy pytorch-t5
      
      * init
      
      * boom boom
      
      * forward pass same
      
      * make generation work
      
      * add more tests
      
      * make test work
      
      * finish normal tests
      
      * make fix-copies
      
      * finish quality
      
      * correct slow example
      
      * correct slow test
      
      * version table
      
      * upload models
      
      * Update tests/test_modeling_flax_t5.py
      
      * correct incorrectly deleted line
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarPatrick von Platen <patrick@huggingface.co>
      
      * Add mention of the huggingface_hub methods for offline mode (#12320)
      
      * [Flax/JAX] Add how to propose projects markdown (#12311)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * finish
      
      * make style
      
      * [TFWav2Vec2] Fix docs (#12283)
      
      * fix error
      
      * make style check happy
      Co-authored-by: default avatarchenhaitao <chenhaitao@qiyi.com>
      
      * Clean push to hub API (#12187)
      
      * Clean push to hub API
      
      * Create working dir if it does not exist
      
      * Different tweak
      
      * New API + all models + test Flax
      
      * Adds the Trainer clean up
      
      * Update src/transformers/file_utils.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Address review comments
      
      * (nit) output types
      
      * No need to set clone_from when folder exists
      
      * Update src/transformers/trainer.py
      Co-authored-by: default avatarJulien Chaumond <julien@huggingface.co>
      
      * Add generated_from_trainer tag
      
      * Update to new version
      
      * Fixes
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      Co-authored-by: default avatarJulien Chaumond <julien@huggingface.co>
      Co-authored-by: default avatarLysandre <lysandre.debut@reseau.eseo.fr>
      
      * Add all XxxPreTrainedModel to the main init (#12314)
      
      * Add all XxxPreTrainedModel to the main init
      
      * Add to template
      
      * Add to template bis
      
      * Add FlaxT5
      
      * Conda build (#12323)
      
      * Temporarily revert the `fill-mask` improvements.
      
      * changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
      Co-authored-by: default avatarMichael Benayoun <michael@huggingface.co>
      
      * Pin good version of huggingface_hub
      
      * [Flax T5] Fix weight initialization and fix docs (#12327)
      
      * finish t5 flax fixes
      
      * improve naming
      
      * Release: v4.8.0
      
      * v4.9.0.dev0
      
      * Update training_args.py (#12328)
      
      Mention in the `save_strategy` param description that `load_best_model_at_end` can override it
      
      * [Deepspeed] new docs (#12077)
      
      * document sub_group_size
      
      * style
      
      * install + issues reporting
      
      * style
      
      * style
      
      * Update docs/source/main_classes/deepspeed.rst
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * indent 4
      
      * restore
      
      * style
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Fix default to logging_dir lost in merge conflict
      
      * try-this (#12338)
      Signed-off-by: default avatarRichard Liaw <rliaw@berkeley.edu>
      
      * [examples/Flax] move the examples table up (#12341)
      
      * Fix torchscript tests (#12336)
      
      * Fix torchscript tests
      
      * Better test
      
      * Remove bogus print
      
      * Document patch release v4.8.1
      
      * Add flax/jax quickstart (#12342)
      
      * Update README.md
      
      * fixed typo (#12356)
      
      * Fix exception in prediction loop occurring for certain batch sizes (#12350)
      
      * fix distributed_concat for scalar outputs
      
      * Update README.md
      
      * fixed typo (#12356)
      
      * simplify fix with terser syntax
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Trigger CI
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarmichal pitr <21157924+MichalPitr@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Add FlaxBigBird QuestionAnswering script (#12233)
      
      * port bigbird script
      
      * adapt script a bit
      
      * change location
      
      * adapt more
      
      * save progress
      
      * init commit
      
      * style
      
      * dataset script tested
      
      * readme add
      
      * Replace NotebookProgressReporter by ProgressReporter in Ray Tune run (#12357)
      
      * Replace NotebookProgressReporter by ProgressReporter in Ray Tune run
      
      * Move to local import
      
      * Style
      
      * remove extra white space from log format (#12360)
      
      * fixed multiplechoice tokenization (#12362)
      
      * fixed multiplechoice tokenization
      
      The model would have seen two sequences:
      1. [CLS]prompt[SEP]prompt[SEP]
      2. [CLS]choice0[SEP]choice1[SEP]
      which is not correct, as we want a contextualized embedding of prompt and choice
      
      * removed outer brackets for proper sequence generation
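
      The fix can be illustrated with plain strings (the helper is hypothetical; the bracket-free pairing is the point): each choice is paired with the prompt, so the model sees one [CLS]prompt[SEP]choice[SEP] sequence per choice, rather than the prompts and choices batched against each other:

```python
def build_choice_sequences(prompt, choices):
    """Illustrative sketch of correct multiple-choice pairing:
    repeat the prompt once per choice, then zip the two flat lists
    (no extra outer brackets around them)."""
    first = [prompt for _ in choices]
    second = list(choices)
    return [f"[CLS]{a}[SEP]{b}[SEP]" for a, b in zip(first, second)]

print(build_choice_sequences("It rained, so", ["we stayed in", "the sun shone"]))
```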
      
      * [trainer] add main_process_first context manager (#12351)
      
      * main_process_first context manager
      
      * handle multi-node, add context description
      
      * sync desc
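
      A minimal sketch of the context manager, with a plain callable standing in for `torch.distributed.barrier()` (the real Trainer method additionally handles the multi-node case): replicas wait until the main process has finished the block, e.g. so only rank 0 pre-processes a dataset:

```python
from contextlib import contextmanager

@contextmanager
def main_process_first(is_main_process, barrier):
    """Illustrative sketch: non-main processes block on the barrier until
    the main process exits the with-block and releases them."""
    if not is_main_process:
        barrier()          # replicas wait here for main to finish
    try:
        yield
    finally:
        if is_main_process:
            barrier()      # main releases the waiting replicas

events = []
with main_process_first(True, lambda: events.append("barrier")):
    events.append("main work")
print(events)  # ['main work', 'barrier']
```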
      
      * [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359)
      
      * added log_level
      
      * fix comment
      
      * fixed log_level
      
      * Trigger CI
      
      * Unified logging
      
      * simplified args for log_level
      
      * updated example template (#12365)
      
      * replace print with logger (#12368)
      
      * [Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers (#12371)
      
      * Notify users that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers
      
      * Fix code formatting
      
      * Update run_mlm.py (#12344)
      
      Before, the code could not be used for validation only, because of this line:
      extension = data_args.train_file.split(".")[-1]
      which assumed the extension must be extracted from the training dataset. This line ran regardless of the user's training or validation options, so it raised an error when the user only wanted to run evaluation without training (because the training file does not exist). Modified it to extract the extension from the training file when the user trains, and from the validation file when the user only evaluates. This way the code can be used for both training and validation separately.
      
      * Add possibility to maintain full copies of files (#12312)
      
      * [CI] add dependency table sync verification (#12364)
      
      * add dependency table sync verification
      
      * improve the message
      
      * improve the message
      
      * revert
      
      * ready to merge
      
      * [Examples] Added context manager to datasets map (#12367)
      
      * added context manager to datasets map
      
      * fixed style and spaces
      
      * fixed warning of deprecation
      
      * changed desc
      
      * [Flax community event] Add more description to readme (#12398)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * boom boom
      
      * correct typos
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSuzana Ilić <io.suzanai@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      Co-authored-by: default avatarSuzana Ilić <io.suzanai@gmail.com>
      
      * Update README.md
      
      * Fix copies
      
      * Remove the need for `einsum` in Albert's attention computation (#12394)
      
      * debug albert einsum
      
      * Fix matmul computation
      
      * Let's use torch linear layer.
      
      * Style.
      
      * [Flax] Adapt flax examples to include `push_to_hub` (#12391)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * finish
      
      * correct summary writer
      
      * correct push to hub
      
      * fix indent
      
      * finish
      
      * finish
      
      * finish
      
      * finish
      
      * finish
      Co-authored-by: default avatarPatrick von Platen <patrick@huggingface.co>
      
      * Tensorflow LM examples (#12358)
      
      * Tensorflow MLM example
      
      * Add CLM example
      
      * Style fixes, adding missing checkpoint code from the CLM example
      
      * Fix TPU training, avoid massive dataset warnings
      
      * Fix incorrect training length calculation for multi-GPU training
      
      * Fix incorrect training length calculation for multi-GPU training
      
      * Refactors and nitpicks from the review
      
      * Style pass
      
      * Adding README
      
      * pass the matching trainer log level to deepspeed (#12401)
      
      * [Flax] Add T5 pretraining script (#12355)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * add length computation
      
      * finish masking
      
      * finish
      
      * upload
      
      * fix some bugs
      
      * finish
      
      * fix dependency table
      
      * correct tensorboard
      
      * Apply suggestions from code review
      
      * correct processing
      
      * slight change init
      
      * correct some more mistakes
      
      * apply suggestions
      
      * improve readme
      
      * fix indent
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSaulLu <55560583+SaulLu@users.noreply.github.com>
      
      * correct tokenizer
      
      * finish
      
      * finish
      
      * finish
      
      * finish
      Co-authored-by: default avatarPatrick von Platen <patrick@huggingface.co>
      Co-authored-by: default avatarSaulLu <55560583+SaulLu@users.noreply.github.com>
      
      * [models] respect dtype of the model when instantiating it (#12316)
      
      * [models] respect dtype of the model when instantiating it
      
      * cleanup
      
      * cleanup
      
      * rework to handle non-float dtype
      
      * fix
      
      * switch to fp32 tiny model
      
      * improve
      
      * use dtype.is_floating_point
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix the doc
      
      * recode to use explicit torch_dtype_auto_detect, torch_dtype args
      
      * docs and tweaks
      
      * docs and tweaks
      
      * docs and tweaks
      
      * merge 2 args, add docs
      
      * fix
      
      * fix
      
      * better doc
      
      * better doc
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
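
      The resolution order can be sketched in pure Python (dtype names as strings stand in for real `torch.dtype` objects, and the helper is hypothetical): an explicit `torch_dtype` wins, `"auto"` takes the first floating dtype found in the checkpoint, and None keeps the framework default:

```python
def resolve_torch_dtype(torch_dtype, checkpoint_dtypes):
    """Illustrative sketch: decide which dtype to instantiate the model in.
    checkpoint_dtypes maps parameter names to dtype-name strings."""
    if torch_dtype is None:
        return "float32"            # framework default
    if torch_dtype != "auto":
        return torch_dtype          # explicit user choice wins
    for dtype in checkpoint_dtypes.values():
        if dtype.startswith("float"):  # stand-in for dtype.is_floating_point
            return dtype
    return "float32"                # no floating params found

print(resolve_torch_dtype("auto", {"weight": "float16", "step": "int64"}))  # float16
```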
      
      * Rename detr targets to labels (#12280)
      
      * Rename target to labels in DetrFeatureExtractor
      
      * Update DetrFeatureExtractor tests accordingly
      
      * Improve docs of DetrFeatureExtractor
      
      * Improve docs
      
      * Make style
      
      * Add out of vocabulary error to ASR models (#12288)
      
      * Add OOV error to ASR models
      
      * Feedback changes
      
      * Fix TFWav2Vec2 SpecAugment (#12289)
      
      * Fix TFWav2Vec2 SpecAugment
      
      * Invert masks
      
      * Feedback changes
      
      * [example/flax] add summarization readme (#12393)
      
      * add readme
      
      * update readme and add requirements
      
      * Update examples/flax/summarization/README.md
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * [Flax] Example scripts - correct weight decay  (#12409)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * finish
      
      * finish
      
      * correct style
      
      * fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)
      Co-authored-by: default avatarJipeng Huang <jihuan@microsoft.com>
      
      * minor fixes in original RAG training (#12395)
      
      * Added talks (#12415)
      
      * Easily train a new fast tokenizer from a given one (#12361)
      
      * [WIP] Easily train a new fast tokenizer from a given one
      
      * Fix test
      
      * Roll out to other tokenizers and add tests
      
      * Fix bug with unk id and add emoji to test
      
      * Really use something different in test
      
      * Implement special tokens map
      
      * Map special tokens in the Transformers tokenizers
      
      * Fix test
      
      * Make test more robust
      
      * Fix test for BPE
      
      * More robust map and test
      
      Co-authored-by: SaulLu
      
      * Test file
      
      * Stronger tests
      Co-authored-by: default avatarSaulLu <lucilesaul.com@gmail.com>
      
      * Map unk token for Wordpiece and address review comment
      
      * Fix lowercase test and address review comment
      
      * Fix all tests
      
      * Simplify test
      
      * Fix tests for realsies
      
      * Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420)
      
      * Propose change in tests regarding lower case
      
      * add new test for special tokens types
      
      * put back the test part about decoding
      
      * add feature: the AddedToken is re-build with the different mapped content
      
      * Address review comment: simplify AddedToken building
      Co-authored-by: default avatarsgugger <sylvain.gugger@gmail.com>
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: default avatarsgugger <sylvain.gugger@gmail.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarSaulLu <lucilesaul.com@gmail.com>
      Co-authored-by: default avatarSaulLu <55560583+SaulLu@users.noreply.github.com>
      
      * [modelcard] fix (#12422)
      
      This PR fixes an incorrect attribute. Some tests are probably needed.
      
      * Add option to save on each training node (#12421)
      
      * Add option to save on each training node
      
      * Apply suggestions from code review
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * Address review comments
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * Added to talks section (#12433)
      
      Added one more confirmed speaker, zoom links and gcal event links
      
      * Fix default bool in argparser (#12424)
      
      * Fix default bool in argparser
      
      * Add more to test
      
      * Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429)
      
      * fix ids_to_tokens naming error in tokenizer of deberta v2
      
      * Update tokenization_deberta_v2.py
      
      Add bos_token and eos_token.
      
      * format code
      Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
      
      * Add CANINE (#12024)
      
      * First pass
      
      * More progress
      
      * Add support for local attention
      
      * More improvements
      
      * More improvements
      
      * Conversion script working
      
      * Add CanineTokenizer
      
      * Make style & quality
      
      * First draft of integration test
      
      * Remove decoder test
      
      * Improve tests
      
      * Add documentation
      
      * Mostly docs improvements
      
      * Add CanineTokenizer tests
      
      * Fix most tests on GPU, improve upsampling projection
      
      * Address most comments by @dhgarrette
      
      * Remove decoder logic
      
      * Improve Canine tests, improve docs of CanineConfig
      
      * All tokenizer tests passing
      
      * Make fix-copies and fix tokenizer tests
      
      * Fix test_model_outputs_equivalence test
      
      * Apply suggestions from @sgugger's review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Address some more comments
      
      * Add support for hidden_states and attentions of shallow encoders
      
      * Define custom CanineModelOutputWithPooling, tests pass
      
      * Make conversion script work for Canine-c too
      
      * Fix tokenizer tests
      
      * Remove file
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Document patch release v4.8.2
      
      * fix typo in mt5 configuration docstring (#12432)
      
      * Add to talks section (#12442)
      
      * [JAX/Flax readme] add philosophy doc (#12419)
      
      * add philosophy doc
      
      * fix typos
      
      * update doc
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * address Patricks suggestions
      
      * add a training example and fix typos
      
      * jit the training step
      
      * jit train step
      
      * fix example code
      
      * typo
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * [Flax] Add wav2vec2 (#12271)
      
      * fix_torch_device_generate_test
      
      * remove @
      
      * start flax wav2vec2
      
      * save intermediate
      
      * forward pass has correct shape
      
      * add weight norm
      
      * add files
      
      * finish ctc
      
      * make style
      
      * finish gumbel quantizer
      
      * correct docstrings
      
      * correct some more files
      
      * fix vit
      
      * finish quality
      
      * correct tests
      
      * correct docstring
      
      * correct tests
      
      * start wav2vec2 pretraining script
      
      * save intermediate
      
      * start pretraining script
      
      * finalize pretraining script
      
      * finish
      
      * finish
      
      * small typo
      
      * finish
      
      * correct
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * make style
      
      * push
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Add missing Copied from statements
      
      * Reference model uploaded under Google org
      
      * Fix various duplicates from merging
      
      * Rembert-large -> rembert, fix overeager Copied from, return type
      
      * Incorporate PR comments from Patrick and Sylvain
      Co-authored-by: ctheodoris <seanymphoceana@yahoo.com>
      Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      Co-authored-by: Teven <teven.lescao@gmail.com>
      Co-authored-by: Nick Lane-Smith <nlanesmith@gmail.com>
      Co-authored-by: Shiro T <stsuchi@users.noreply.github.com>
      Co-authored-by: Wang Ran (汪然) <wrran@outlook.com>
      Co-authored-by: Ahmet Akkoç <themadprogramer@gmail.com>
      Co-authored-by: francescorubbo <francescorubbo@users.noreply.github.com>
      Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
      Co-authored-by: talkhaldi <tareq.alkhaldi@gmail.com>
      Co-authored-by: joerenner <joepeterrenner@gmail.com>
      Co-authored-by: jrenner <joseph.renner@inria.fr>
      Co-authored-by: Avital Oliver <avitalo@google.com>
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      Co-authored-by: Josh Tanner <mindful.jt@gmail.com>
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: Bhadresh Savani <bhadreshpsavani@gmail.com>
      Co-authored-by: Jayendra <jayendra0parmar@gmail.com>
      Co-authored-by: jayendra <jayendra@infocusp.in>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
      Co-authored-by: Philip May <philip@may.la>
      Co-authored-by: Nicholas Vadivelu <nicholas.vadivelu@gmail.com>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      Co-authored-by: Shamane Siri <shamane@ahlab.org>
      Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
      Co-authored-by: Fan Zhang <zhangfan.tju@gmail.com>
      Co-authored-by: Riccardo Bassani <48254418+BassaniRiccardo@users.noreply.github.com>
      Co-authored-by: Volodymyr Byno <volodymyr.byno@gmail.com>
      Co-authored-by: Jeoung-Minju <51041861+JminJ@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      Co-authored-by: Alberto Villa <a.villa.diez@gmail.com>
      Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
      Co-authored-by: Kou Yong Kang <kou.yongkang@dhs.sg>
      Co-authored-by: Shiva Pundir <36535845+ceevaaa@users.noreply.github.com>
      Co-authored-by: François Lagunas <francois.lagunas@gmail.com>
      Co-authored-by: Peter Izsak <232524+peteriz@users.noreply.github.com>
      Co-authored-by: Russell Klopfer <russell@klopfer.us>
      Co-authored-by: Mario Šaško <mariosasko777@gmail.com>
      Co-authored-by: cdleong <4109253+cdleong@users.noreply.github.com>
      Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      Co-authored-by: kumapo <kumapo@users.noreply.github.com>
      Co-authored-by: Tobias Norlund <tobias@norlund.se>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
      Co-authored-by: Bhavitvya Malik <bhavitvya.malik@gmail.com>
      Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com>
      Co-authored-by: Guido Novati <16716298+novatig@users.noreply.github.com>
      Co-authored-by: Guido Novati <gnovati@nvidia.com>
      Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
      Co-authored-by: Nicholas Broad <nbroad94@gmail.com>
      Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
      Co-authored-by: Kumar Abhishek <kr.abhish@gmail.com>
      Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
      Co-authored-by: Will Rice <will@spokestack.io>
      Co-authored-by: Vasudev Gupta <7vasudevgupta@gmail.com>
      Co-authored-by: Kilian Kluge <32523967+ionicsolutions@users.noreply.github.com>
      Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
      Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
      Co-authored-by: Xa9aX ツ <mishradiganta91@gmail.com>
      Co-authored-by: Vishal Burman <vishal.a.burman23@gmail.com>
      Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com>
      Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
      Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
      Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
      Co-authored-by: chenht2010 <chenht2010@yahoo.com>
      Co-authored-by: chenhaitao <chenhaitao@qiyi.com>
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
      Co-authored-by: Michael Benayoun <michael@huggingface.co>
      Co-authored-by: Sam Havens <47401552+sam-qordoba@users.noreply.github.com>
      Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
      Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
      Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
      Co-authored-by: jglaser <glaserj@ornl.gov>
      Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
      Co-authored-by: cronoik <johannes.schaffrath@mail.de>
      Co-authored-by: Taha ValizadehAslani <47432410+TahaAslani@users.noreply.github.com>
      Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
      Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
      Co-authored-by: Will Rice <wrice20@gmail.com>
      Co-authored-by: Jabin Huang <huangjipengnju@gmail.com>
      Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
      Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
      Co-authored-by: fcakyon <34196005+fcakyon@users.noreply.github.com>
      434022ad
  15. 09 Jul, 2021 1 commit
    • Will Rice's avatar
      Add TFHubertModel (#12206) · fb65f65e
      Will Rice authored
      * TFHubert
      
      * Update with TFWav2Vec Bug Fixes
      
      * Add OOV Error
      
      * Feedback changes
      
      * Fix kwargs call
      fb65f65e
  16. 23 Jun, 2021 1 commit
  17. 14 Jun, 2021 1 commit
    • Will Rice's avatar
      Adding TFWav2Vec2Model (#11617) · d438eee0
      Will Rice authored
      
      
      * [WIP] Add TFWav2Vec2Model
      
      Work in progress for adding a tensorflow version of Wav2Vec2
      
      * feedback changes
      
      * small fix
      
      * Test Feedback Round 1
      
      * Add SpecAugment and CTC Loss
      
      * correct spec augment mask creation
      
      * docstring and correct copyright
      
      * correct bugs
      
      * remove bogus file
      
      * finish tests correction
      
      * del unnecessary layers
      
      * Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * make style
      
      * correct final bug
      
      * Feedback Changes
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      d438eee0
  18. 11 Jun, 2021 1 commit
  19. 20 May, 2021 1 commit
  20. 07 Apr, 2021 1 commit
  21. 25 Mar, 2021 1 commit
    • Amir Tahmasbi's avatar
      Layout lm tf 2 (#10636) · 4684bfc7
      Amir Tahmasbi authored
      
      
      * Added embeddings layer
      
      * Added layoutlm layers, main model, maskedlm and token classification classes
      
      * Added model classes to tf auto models
      
      * Added model to PT to TF conversion script
      
      * Added model to doc README
      
      * Added tests
      
      * Removed unused imports
      
      * Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
      
      * Made tests pass!
      
      * Fixed typos in imports and docs
      
      * Fixed a typo in embeddings layer
      
      * Removed imports
      
      * Fixed formatting issues, imports, tests
      
      * Fixed small formatting issues
      
      * Removed duplicates import from main __init__.py
      
      * Changed default arg to true for adding pooling layer to tf layoutlm
      
      * Fixed formatting issues
      
      * Style
      
      * Added copied from to classes copied from bert
      
      * Fixed doc strings examples to work with layoutlm inputs
      
      * Removed PyTorch reference in doc strings example
      
      * Added integration tests
      
      * Cleaned up initialization file
      
      * Updated model checkpoint identifiers
      
      * Fixed imports
      Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
      4684bfc7
  22. 19 Mar, 2021 1 commit
  23. 08 Mar, 2021 1 commit
    • Ratthachat (Jung)'s avatar
      Add TFRag (#9002) · 696e8a43
      Ratthachat (Jung) authored
      * Create modeling_tf_dpr.py
      
      * Add TFDPR
      
      * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
      
      last commit accidentally deleted these 4 lines, so I recovered them
      
      * Add TFDPR
      
      * Add TFDPR
      
      * clean up some comments, add TF input-style doc string
      
      * Add TFDPR
      
      * Make return_dict=False as default
      
      * Fix return_dict bug (in .from_pretrained)
      
      * Add get_input_embeddings()
      
      * Create test_modeling_tf_dpr.py
      
      The current version already passes all 27 tests!
      Please see the test run at:
      https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
      
      
      
      * fix quality
      
      * delete init weights
      
      * run fix copies
      
      * fix repo consis
      
      * del config_class, load_tf_weights
      
      They should be 'pytorch only'
      
      * add config_class back
      
      after removing it, tests failed ... so in the end only removing "use_tf_weights = None", per Lysandre's suggestion
      
      * newline after .. note::
      
      * import tf, np (Necessary for ModelIntegrationTest)
      
      * slow_test from_pretrained with from_pt=True
      
      At the moment we don't have TF weights (since we don't have an official TF model)
      Previously, I did not run the slow tests, so I missed this bug
      
      * Add simple TFDPRModelIntegrationTest
      
      Note that this is just a test that TF and PyTorch give approximately the same output.
      However, I could not test against the official DPR repo's output yet
      
      * upload correct tf model
      
      * remove position_ids as missing keys
      
      * create modeling_tf_rag
      
      * add tests for tf
      
      * add tf tests
      
      * revert wrong pt commit
      
      * further refactor
      
      * further refactor
      
      * refactor
      
      * Update modeling_tf_rag.py
      
      - input_processing
      - fix prepare_input_for_generation (mostly fix generate bug)
      - bring back from_pretrained hack in order to test generate
      
      * delete colab pieces of code
      
      * Showcase greedy "generate"
      
      Temporarily change the beam_search test to a greedy_search test to showcase that TF and PT do get equivalent output.
      
      * cosmetic update
      
      * correct typos
      
      * update
      
      * push some progress
      
      * make easy check
      
      * fix rag save from pretrained
      
      * Update src/transformers/modeling_tf_utils.py
      
      * remove commented out lines
      
      * delete unnecessary lines
      
      * add simple test case for nq_checkpoint
      
      Add nq_checkpoint test to show that current version without hack still fails
      
      * temporarily put ugly hack back again
      
      * Add TFRagSequenceForGeneration!!
      
      * __init__.py , import TFRagSequenceForGeneration
      
      * Add TFRagSequence tests!
      
      * rag init.py - add TFRagSequenceForGeneration
      
      * fix from_pretrained
      
      * fix prepare_inputs_for_generation
      
      * Beam search for RagToken!
      
      * minor clean up
      
      * add tf.cast in TFRagModel
      
      * More tf.cast
      
      * Add all remaining tests (still have issues)
      
      * delete all T5 related
      
      * make style
      
      * fix load weight prefix
      
      * fix bart
      
      * fix return_dict for tf_rag
      
      make all tests pass .. Hooray
      
      * fix some tests
      
      * fix code quality
      
      * fix quality check
      
      * finish tests tf rag
      
      * add tf rag to docs
      
      * remove TFT5 from docstring
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * remove TFT5 from docstring
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Delete outdated comments
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * improve doc strings
      
      * add generative model classes
      
      * fix adjust token logic
      
      * refactor generate for TFRag
      
      * using shape_list, not _get_shape
      Co-authored-by: Julien Plu <plu.julien@gmail.com>
      
      * axis=[1]->axis=1
      
      * delete NEED_HELP comment
      
      * improve readability
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * improve readability
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * improve readability
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Indicating model is in a developing state in docstrings
      
      As suggested by Julien
      
      * small last changes
      
      * apply sylvains suggestions
      
      * finish tf rag
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: patrickvonplaten <patrick@huggingface.co>
      Co-authored-by: Julien Plu <plu.julien@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      696e8a43
  24. 27 Jan, 2021 1 commit
  25. 12 Jan, 2021 1 commit
    • Patrick von Platen's avatar
      [TFBart] Split TF-Bart (#9497) · 7f286132
      Patrick von Platen authored
      * make templates ready
      
      * make add_new_model_command_ready
      
      * finish tf bart
      
      * prepare tf mbart
      
      * finish tf bart
      
      * add tf mbart
      
      * add marian
      
      * prep pegasus
      
      * add tf pegasus
      
      * push blenderbot tf
      
      * add blenderbot
      
      * add blenderbot small
      
      * clean-up
      
      * make fix copy
      
      * define blend bot tok
      
      * fix
      
      * up
      
      * make style
      
      * add to docs
      
      * add copy statements
      
      * overwrite changes
      
      * improve
      
      * fix docs
      
      * finish
      
      * fix last slow test
      
      * fix missing git conflict line
      
      * fix blenderbot
      
      * up
      
      * fix blenderbot small
      
      * load changes
      
      * finish copied from
      
      * upload fix
      7f286132
  26. 05 Jan, 2021 1 commit
    • Patrick von Platen's avatar
      LED (#9278) · 189387e9
      Patrick von Platen authored
      * create model
      
      * add integration
      
      * save current state
      
      * make integration tests pass
      
      * add one more test
      
      * add explanation to tests
      
      * remove from bart
      
      * add padding
      
      * remove unnecessary test
      
      * make all tests pass
      
      * re-add cookie cutter tests
      
      * finish PyTorch
      
      * fix attention test
      
      * Update tests/test_modeling_common.py
      
      * revert change
      
      * remove unused file
      
      * add string to doc
      
      * save intermediate
      
      * make tf integration tests pass
      
      * finish tf
      
      * fix doc
      
      * fix docs again
      
      * add led to doctree
      
      * add to auto tokenizer
      
      * added tips for led
      
      * make style
      
      * apply jplus statements
      
      * correct tf longformer
      
      * apply lysandres suggestions
      
      * apply sylvains suggestions
      
      * Apply suggestions from code review
      189387e9
  27. 19 Dec, 2020 1 commit
    • sandip's avatar
      Added TF TransfoXL Sequence Classification (#9169) · e0e255be
      sandip authored
      * TF Transfoxl seq classification
      
      * Update test_modeling_tf_transfo_xl.py
      
      Added num_labels to config level
      
      * code refactor
      
      * code refactor
      
      * code refactor
      e0e255be
  28. 17 Dec, 2020 1 commit
  29. 15 Dec, 2020 2 commits
  30. 09 Dec, 2020 1 commit
  31. 07 Dec, 2020 1 commit
    • sandip's avatar
      Add TFGPT2ForSequenceClassification based on DialogRPT (#8714) · 483e1327
      sandip authored
      * Add TFGPT2ForSequenceClassification based on DialogRPT
      
      * TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
      
      * code refactor for latest other TF PR
      
      * code refactor
      
      * code refactor
      
      * Update modeling_tf_gpt2.py
      483e1327
  32. 30 Nov, 2020 1 commit
    • Ahmed Elnaggar's avatar
      Add T5 Encoder for Feature Extraction (#8717) · 40ecaf0c
      Ahmed Elnaggar authored
      
      
      * Add T5 Encoder class for feature extraction
      
      * fix T5 encoder add_start_docstrings indent
      
      * update init with T5 encoder
      
      * update init with TFT5ModelEncoder
      
      * remove TFT5ModelEncoder
      
      * change T5ModelEncoder order in init
      
      * add T5ModelEncoder to transformers init
      
      * clean T5ModelEncoder
      
      * update init with TFT5ModelEncoder
      
      * add TFModelEncoder for Tensorflow
      
      * update init with TFT5ModelEncoder
      
      * Update src/transformers/models/t5/modeling_t5.py
      
      change output from Seq2SeqModelOutput to BaseModelOutput
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * remove encoder_outputs
      
      1. remove encoder_outputs from the function call.
      2. remove the encoder_outputs If statement.
      3. remove isinstance from return_dict.
      
      * Authorize missing decoder keys
      
      * remove unnecessary input parameters
      
      remove past_key_values and use_cache
      
      * remove use_cache
      
      remove use_cache from the forward method
      
      * add docstring for T5 encoder
      
      add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING
      
      * change return_dict to dot access
      
      * add T5_ENCODER_INPUTS_DOCSTRING for TF T5
      
      * change TFT5Encoder output type to BaseModelOutput
      
      * remove unnecessary parameters for TFT5Encoder
      
      * remove unnecessary if statement
      
      * add import BaseModelOutput
      
      * fix BaseModelOutput typo to TFBaseModelOutput
      
      * update T5 doc with T5ModelEncoder
      
      * add T5ModelEncoder to tests
      
      * finish pytorch
      
      * finish docs and mt5
      
      * add mt5 to init
      
      * fix init
      
      * remove n_positions
      
      * finish PR
      
      * Update src/transformers/models/mt5/modeling_mt5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/t5/modeling_t5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/t5/modeling_tf_t5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/models/mt5/modeling_tf_mt5.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * make style
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      40ecaf0c
  33. 19 Nov, 2020 1 commit
    • elk-cloner's avatar
      Tf longformer for sequence classification (#8231) · 5362bb8a
      elk-cloner authored
      
      
      * working on LongformerForSequenceClassification
      
      * add TFLongformerForMultipleChoice
      
      * add TFLongformerForTokenClassification
      
      * use add_start_docstrings_to_model_forward
      
      * test TFLongformerForSequenceClassification
      
      * test TFLongformerForMultipleChoice
      
      * test TFLongformerForTokenClassification
      
      * remove test from repo
      
      * add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice
      
      * add requested classes to modeling_tf_auto.py
      update dummy_tf_objects
      fix tests
      fix bugs in requested classes
      
      * pass all tests except test_inputs_embeds
      
      * sync with master
      
      * pass all tests except test_inputs_embeds
      
      * pass all tests
      
      * pass all tests
      
      * work on test_inputs_embeds
      
      * fix style and quality
      
      * make multi choice work
      
      * fix TFLongformerForTokenClassification signature
      
      * fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature
      
      * fix mult choice
      
      * fix mc hint
      
      * fix input embeds
      
      * fix input embeds
      
      * refactor input embeds
      
      * fix copy issue
      
      * apply sylvains changes and clean more
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      5362bb8a
  34. 17 Nov, 2020 2 commits
    • Patrick von Platen's avatar
      T5 & mT5 (#8552) · 86822a35
      Patrick von Platen authored
      * add mt5 and t5v1_1 model
      
      * fix tests
      
      * correct some imports
      
      * add tf model
      
      * finish tf t5
      
      * improve examples
      
      * fix copies
      
      * clean doc
      86822a35
    • Sylvain Gugger's avatar
      Reorganize repo (#8580) · c89bdfbe
      Sylvain Gugger authored
      * Put models in subfolders
      
      * Styling
      
      * Fix imports in tests
      
      * More fixes in test imports
      
      * Sneaky hidden imports
      
      * Fix imports in doc files
      
      * More sneaky imports
      
      * Finish fixing tests
      
      * Fix examples
      
      * Fix path for copies
      
      * More fixes for examples
      
      * Fix dummy files
      
      * More fixes for example
      
      * More model import fixes
      
      * Is this why you're unhappy GitHub?
      
      * Fix imports in convert command
      c89bdfbe
  35. 11 Nov, 2020 1 commit
    • Ratthachat (Jung)'s avatar
      Add TFDPR (#8203) · 026a2ff2
      Ratthachat (Jung) authored
      * Create modeling_tf_dpr.py
      
      * Add TFDPR
      
      * Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
      
      last commit accidentally deleted these 4 lines, so I recovered them
      
      * Add TFDPR
      
      * Add TFDPR
      
      * clean up some comments, add TF input-style doc string
      
      * Add TFDPR
      
      * Make return_dict=False as default
      
      * Fix return_dict bug (in .from_pretrained)
      
      * Add get_input_embeddings()
      
      * Create test_modeling_tf_dpr.py
      
      The current version already passes all 27 tests!
      Please see the test run at:
      https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
      
      
      
      * fix quality
      
      * delete init weights
      
      * run fix copies
      
      * fix repo consis
      
      * del config_class, load_tf_weights
      
They should be PyTorch-only
      
      * add config_class back
      
After removing it, the tests failed, so per Lysandre's suggestion only "use_tf_weights = None" is removed
      
      * newline after .. note::
      
      * import tf, np (Necessary for ModelIntegrationTest)
      
      * slow_test from_pretrained with from_pt=True
      
At the moment we don't have TF weights (since there is no official TF model yet)
Previously, I did not run the slow tests, so I missed this bug
      
      * Add simple TFDPRModelIntegrationTest
      
Note that this just tests that TF and PyTorch give approximately the same output.
However, I could not test against the official DPR repo's output yet
      
      * upload correct tf model
      
      * remove position_ids as missing keys
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarpatrickvonplaten <patrick@huggingface.co>
      026a2ff2
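The TFDPR commit above adds a slow test that loads PyTorch weights into the TF model via `from_pretrained(..., from_pt=True)`. As a rough illustration of what such cross-framework loading involves, the sketch below (a simplification, not the actual Transformers conversion code) shows the kind of parameter-name translation that has to happen before PyTorch weight values can be copied into TF variables:

```python
def pt_name_to_tf(pt_name: str) -> str:
    """Translate a PyTorch-style parameter name into a TF-style variable name.

    Simplified illustration only: the real converter also transposes dense
    kernels and handles many framework-specific special cases.
    """
    parts = pt_name.split(".")
    # PyTorch calls a dense layer's matrix "weight"; TF/Keras calls it "kernel".
    if parts[-1] == "weight":
        parts[-1] = "kernel"
    # PyTorch scopes with dots; TF variable names use slashes.
    return "/".join(parts)


if __name__ == "__main__":
    print(pt_name_to_tf("question_encoder.bert_model.pooler.dense.weight"))
    # question_encoder/bert_model/pooler/dense/kernel
```

Once every name is mapped (and mismatches like the `position_ids` key mentioned above are filtered out), the values can be assigned variable by variable.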
  36. 10 Nov, 2020 1 commit
  37. 30 Oct, 2020 1 commit
    • Sam Shleifer's avatar
      TFMarian, TFMbart, TFPegasus, TFBlenderbot (#7987) · 566b083e
      Sam Shleifer authored
      
      
      * Start plumbing
      
      * Marian close
      
      * Small stubs for all children
      
      * Fixed bart
      
      * marian working
      
      * pegasus test is good, but failing
      
      * Checkin tests
      
      * More model files
      
      * Subtle marian, pegasus integration test failures
      
      * Works well
      
      * rm print
      
      * boom boom
      
      * Still failing model2doc
      
      * merge master
      
      * Equivalence test failing, all others fixed
      
      * cleanup
      
      * Fix embed_scale
      
      * Cleanup marian pipeline test
      
      * Undo extra changes
      
      * Smaller delta
      
      * Cleanup model testers
      
      * undo delta
      
      * fix tests import structure
      
      * cross test decorator
      
      * Cleaner set_weights
      
      * Respect authorized_unexpected_keys
      
      * No warnings
      
      * No warnings
      
      * style
      
      * Nest tf import
      
      * black
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * functional dropout
      
      * fixup
      
      * Fixup
      
      * style_doc
      
      * embs
      
      * shape list
      
      * delete slow force_token_id_to_be_generated func
      
      * fixup
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      566b083e
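One of the bullets above is "Fix embed_scale". In BART-family seq2seq models such as Marian and Pegasus, token embeddings are multiplied by sqrt(d_model) when embedding scaling is enabled, and getting this factor wrong breaks TF/PT output equivalence. A minimal, framework-free sketch of the idea (illustrative, not the actual TFMarian code):

```python
import math


def scale_embeddings(embeddings, d_model, scale_embedding=True):
    """Scale looked-up token embeddings by sqrt(d_model), as BART-family
    models do when `scale_embedding` is enabled (simplified sketch)."""
    embed_scale = math.sqrt(d_model) if scale_embedding else 1.0
    return [[x * embed_scale for x in row] for row in embeddings]


if __name__ == "__main__":
    # With d_model = 4, every embedding value is doubled (sqrt(4) == 2).
    print(scale_embeddings([[1.0, 2.0]], d_model=4))
```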
  38. 21 Oct, 2020 1 commit
    • Sam Shleifer's avatar
      Add TFBartForConditionalGeneration (#5411) · 82984215
      Sam Shleifer authored
      
      
      * half done
      
      * doc improvement
      
      * Cp test file
      
* broken
      
      * broken test
      
      * undo some mess
      
      * ckpt
      
      * borked
      
      * Halfway
      
      * 6 passing
      
      * boom boom
      
      * Much progress but still 6
      
      * boom boom
      
      * merged master
      
      * 10 passing
      
      * boom boom
      
      * Style
      
      * no t5 changes
      
      * 13 passing
      
      * Integration test failing, but not gibberish
      
      * Frustrated
      
      * Merged master
      
      * 4 fail
      
      * 4 fail
      
      * fix return_dict
      
      * boom boom
      
      * Still only 4
      
      * prepare method
      
      * prepare method
      
      * before delete classif
      
      * Skip tests to avoid adding boilerplate
      
      * boom boom
      
      * fast tests passing
      
      * style
      
      * boom boom
      
      * Switch to supporting many input types
      
      * remove FIXMENORM
      
      * working
      
      * Fixed past_key_values/decoder_cached_states confusion
      
      * new broken test
      
      * Fix attention mask kwarg name
      
      * undo accidental
      
      * Style and reviewers
      
      * style
      
      * Docs and common tests
      
      * Cleaner assert messages
      
      * copy docs
      
      * style issues
      
      * Sphinx fix
      
      * Simplify caching logic
      
      * test does not require torch
      
      * copy _NoLayerEmbedTokens
      
      * Update src/transformers/modeling_tf_bart.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Update tests/test_modeling_tf_bart.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_tf_bart.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_tf_bart.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/modeling_tf_bart.py
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      
      * Line length and dont document None
      
      * Add pipeline test coverage
      
      * assert msg
      
      * At parity
      
      * Assert messages
      
      * mark slow
      
      * Update compile test
      
      * back in init
      
      * Merge master
      
      * Fix tests
      Co-authored-by: default avatarLysandre Debut <lysandre@huggingface.co>
      82984215
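The TFBart commit above includes "Fixed past_key_values/decoder_cached_states confusion" and "Simplify caching logic". The sketch below is a framework-free, hypothetical illustration of the `past_key_values` idea (not the TFBart code): during autoregressive generation, each decoder step computes key/value projections only for the newest token and appends them to a cache, instead of re-projecting the whole prefix at every step.

```python
def decode_step(token, past_key_values):
    """One decoder step: project only the new token's key/value and
    extend the cache (stand-in projections for illustration)."""
    key, value = ("k", token), ("v", token)
    past_key_values = past_key_values + [(key, value)]
    # Attention would now read over all cached (key, value) pairs.
    return past_key_values


if __name__ == "__main__":
    cache = []
    for tok in [101, 7, 42]:
        cache = decode_step(tok, cache)
    # After three steps the cache holds one (key, value) pair per token.
    print(len(cache))
```

Mixing up this cache with a differently shaped `decoder_cached_states` structure is exactly the kind of confusion the commit resolves by settling on a single format.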