1. 05 Feb, 2024 3 commits
• Image Feature Extraction pipeline (#28216) · ba3264b4
  amyeroberts authored
      
      
      * Draft pipeline
      
      * Fixup
      
      * Fix docstrings
      
      * Update doctest
      
      * Update pipeline_model_mapping
      
      * Update docstring
      
      * Update tests
      
      * Update src/transformers/pipelines/image_feature_extraction.py
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
      
      * Fix docstrings - review comments
      
      * Remove pipeline mapping for composite vision models
      
      * Add to pipeline tests
      
      * Remove for flava (multimodal)
      
* safe PIL import
      
      * Add requirements for pipeline run
      
      * Account for super slow efficientnet
      
      * Review comments
      
      * Fix tests
      
      * Swap order of kwargs
      
      * Use build_pipeline_init_args
      
      * Add back FE pipeline for Vilt
      
      * Include image_processor_kwargs in docstring
      
      * Mark test as flaky
      
      * Update TODO
      
      * Update tests/pipelines/test_pipelines_image_feature_extraction.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add license header
      
      ---------
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
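A minimal usage sketch of the new pipeline (the task name comes from the PR; the checkpoint and image URL are placeholders):

```python
from transformers import pipeline

# "image-feature-extraction" returns the model's last hidden state for an image.
extractor = pipeline(
    task="image-feature-extraction",
    model="google/vit-base-patch16-224-in21k",  # placeholder vision checkpoint
)

features = extractor(
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    return_tensors=True,  # framework tensor instead of nested Python lists
)
print(features.shape)  # for ViT: (1, num_patches + 1, hidden_size)
```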
• Correct wav2vec2-bert inputs_to_logits_ratio (#28821) · 7addc934
  Yoach Lacombe authored
      * Correct wav2vec2-bert inputs_to_logits_ratio
      
      * correct ratio
      
      * correct ratio, clean asr pipeline
      
      * refactor on one line
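For context, `inputs_to_logits_ratio` is what the ASR pipeline uses to convert strides measured in audio samples into strides measured in logit frames; a sketch of the arithmetic with purely illustrative numbers:

```python
# Illustrative numbers only; the real ratio comes from the model config.
inputs_to_logits_ratio = 320        # input samples per logit frame
stride_left_samples = 6_400         # stride expressed in audio samples
stride_left_frames = stride_left_samples // inputs_to_logits_ratio  # -> 20 frames
```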
• [WIP] Hard error when ignoring tensors. (#27484) · 2da28c4b
  Nicolas Patry authored
      
      
      * [WIP] Hard error when ignoring tensors.
      
      * Better selection/error when saving a checkpoint.
      
      - Find all names we should normally drop (those are in the transformers
        config)
      - Find all disjoint tensors (for those we can safely trigger a copy to
        get rid of the sharing before saving)
      - Clone those disjoint tensors getting rid of the issue
      - Find all identical names (those should be declared in the config
        but we try to find them all anyway.)
      - For all identical names:
  - If they are in the config, just ignore them; everything is fine
        - If they are not, warn about them.
- For all remaining tensors which are shared yet neither identical NOR
  disjoint, raise a hard error.
      
      * Adding a failing test on `main` that passes here.
      
      * We don't need to keep the subfolder logic in this test.
      
      * Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      ---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
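A minimal sketch of the disjoint/identical classification described above, assuming plain PyTorch tensors; the helper names here are hypothetical, and the real implementation also has to cope with meta tensors and strided views:

```python
from collections import defaultdict

import torch


def find_shared_names(state_dict):
    # Group tensor names by the storage they alias; groups of size > 1 are shared.
    by_storage = defaultdict(list)
    for name, tensor in state_dict.items():
        by_storage[tensor.untyped_storage().data_ptr()].append(name)
    return [names for names in by_storage.values() if len(names) > 1]


def is_identical(a: torch.Tensor, b: torch.Tensor) -> bool:
    # "Identical" views cover the same bytes with the same layout.
    return a.data_ptr() == b.data_ptr() and a.shape == b.shape and a.stride() == b.stride()


def is_disjoint(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Disjoint views share a storage but never overlap, so cloning one is safe.
    # (Byte-range check; exotic strided views would need a finer analysis.)
    a_lo, a_hi = a.data_ptr(), a.data_ptr() + a.nelement() * a.element_size()
    b_lo, b_hi = b.data_ptr(), b.data_ptr() + b.nelement() * b.element_size()
    return a_hi <= b_lo or b_hi <= a_lo
```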
  2. 02 Feb, 2024 4 commits
  3. 01 Feb, 2024 2 commits
• Fix symbolic_trace with kv cache (#28724) · 709dc432
  fxmarty authored
      * fix symbolic_trace with kv cache
      
      * comment & better test
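A hedged usage sketch of `transformers.utils.fx.symbolic_trace` with `past_key_values` among the traced inputs, which is the path this fix exercises (the checkpoint is a placeholder):

```python
from transformers import AutoModelForCausalLM
from transformers.utils.fx import symbolic_trace

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
traced = symbolic_trace(
    model,
    input_names=["input_ids", "attention_mask", "past_key_values"],
)
```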
• Adding [T5/MT5/UMT5]ForTokenClassification (#28443) · 0d26abdd
  JB (Don) authored
      * Adding [T5/MT5/UMT5]ForTokenClassification
      
      * Add auto mappings for T5ForTokenClassification and variants
      
      * Adding ForTokenClassification to the list of models
      
      * Adding attention_mask param to the T5ForTokenClassification test
      
      * Remove outdated comment in test
      
      * Adding EncoderOnly and Token Classification tests for MT5 and UMT5
      
      * Fix typo in umt5 string
      
      * Add tests for all the existing MT5 models
      
      * Fix wrong comment in dependency_versions_table
      
      * Reverting change to common test for _keys_to_ignore_on_load_missing
      
      The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
      
      * Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
      
      * Add fix-copies to MT5ModelTest
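A usage sketch of the new head; `t5-small` and the label count are placeholders, and the classification head itself comes newly initialized:

```python
from transformers import AutoTokenizer, T5ForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForTokenClassification.from_pretrained("t5-small", num_labels=9)

inputs = tokenizer("HuggingFace is based in New York City", return_tensors="pt")
logits = model(**inputs).logits  # (batch, seq_len, num_labels)
```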
  4. 31 Jan, 2024 4 commits
• DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) · beb2a096
  Joao Gante authored
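The gist of the fix, sketched: DeepSpeed ZeRO-3 can patch tensor-creation ops to the training dtype (e.g. bf16) when no dtype is given, and bf16 cannot represent all integers exactly, so `torch.arange` calls that feed float math get an explicit integer dtype plus an explicit cast:

```python
import torch

dim, base = 64, 10000.0

# Fragile under ZeRO-3: a bare arange may be created directly in bf16,
# where integers above 256 are no longer exactly representable.
fragile = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

# Robust: pin the arange to int64, then cast explicitly.
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))
```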
• Flax mistral (#26943) · f7076cd3
  Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
* make sure conversion considers bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
* applied fix-copies, ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed rf after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
* added embedding weight initialization
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
* skipped docstring test to align # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
* split causal_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat
      
      * applied make style
      
      * empty for retry on Wav2Vec2
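A hedged usage sketch of the new Flax port; depending on which weight formats the hub checkpoint ships, `from_pt=True` may or may not be needed:

```python
from transformers import AutoTokenizer, FlaxMistralForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = FlaxMistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", from_pt=True)

inputs = tokenizer("My favourite condiment is", return_tensors="np")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(out.sequences, skip_special_tokens=True))
```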
• [Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8
  Patrick von Platen authored
      
      
      * up
      
      * Fix more
      
      * Correct more
      
      * Fix more tests
      
      * fix fast tests
      
      * Fix more
      
      * fix more
      
      * push all files
      
      * finish all
      
      * make style
      
      * Fix timestamp wrap
      
      * make style
      
      * make style
      
      * up
      
      * up
      
      * up
      
      * Fix lang detection behavior
      
      * Fix lang detection behavior
      
      * Add lang detection test
      
      * Fix lang detection behavior
      
      * make style
      
      * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * better error message
      
      * make style tests
      
      * add warning
      
      ---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
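A usage sketch of the refactored interface: `language` and `task` are passed to `generate` directly instead of hand-building `forced_decoder_ids` (the checkpoint and the silent audio are stand-ins):

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

audio = np.zeros(16_000, dtype=np.float32)  # one second of silence as a stand-in
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
ids = model.generate(inputs.input_features, language="en", task="transcribe")
print(processor.batch_decode(ids, skip_special_tokens=True))
```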
• don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad
  tom-p-reichel authored
      * test that tied output embeddings aren't initialized on load
      
      * don't initialize the output embeddings if we're going to tie them to the input embeddings
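The invariant behind the change, sketched: with `tie_word_embeddings` the output projection is the very same tensor as the input embedding, so initializing it separately on load is wasted work (`gpt2` is just a convenient tied-embedding example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # ties embeddings by default
assert model.get_output_embeddings().weight is model.get_input_embeddings().weight
```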
  5. 30 Jan, 2024 4 commits
  6. 29 Jan, 2024 3 commits
  7. 26 Jan, 2024 1 commit
  8. 25 Jan, 2024 1 commit
• Add Depth Anything (#28654) · 963db81a
  NielsRogge authored
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Add docs
      
      * Remove file
      
      * Add copied from
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Fix style
      
      * Update docs
      
      * Convert all checkpoints, add integration test
      
      * Rename checkpoints
      
      * Add pretrained backbone attributes
      
      * Fix default config
      
      * Address comment
      
      * Add figure to docs
      
      * Fix bug thanks to @xenova
      
      * Update conversion script
      
      * Fix integration test
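A usage sketch via the existing depth-estimation pipeline; the checkpoint name follows the renamed hub repos mentioned above, and the image URL is a placeholder:

```python
from transformers import pipeline

depth = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
result = depth("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"].save("depth.png")  # PIL image; "predicted_depth" holds the raw tensor
```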
  9. 24 Jan, 2024 1 commit
• Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) · c5c69096
  Khai Mai authored
      * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
      
      * format code using black and ruff
      
      * skip computing mask if attention_mask=None
      
      * add tests for load balancing loss Mixtral-Moe
      
      * fix assert loss is different in mixtral_test
      
      * fix pad_leng
      
      * use assertNotAlmostEqual and print to debug
      
      * remove print for debug
      
      * minor updates
      
      * reduce rtol and atol
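A simplified sketch of the idea: weight each token's contribution to the auxiliary loss by the attention mask so padding cannot skew the per-expert statistics; names and shapes are illustrative, not the exact `load_balancing_loss_func` signature:

```python
import torch
import torch.nn.functional as F


def aux_loss(router_logits, expert_index, num_experts, attention_mask=None):
    # router_logits: (tokens, num_experts); expert_index: (tokens,) top-1 expert ids
    probs = torch.softmax(router_logits, dim=-1)
    one_hot = F.one_hot(expert_index, num_experts).float()
    if attention_mask is None:
        tokens_per_expert = one_hot.mean(dim=0)
        prob_per_expert = probs.mean(dim=0)
    else:
        mask = attention_mask.reshape(-1, 1).float()  # 0 for padding tokens
        tokens_per_expert = (one_hot * mask).sum(dim=0) / mask.sum()
        prob_per_expert = (probs * mask).sum(dim=0) / mask.sum()
    return num_experts * torch.sum(tokens_per_expert * prob_per_expert)
```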
  10. 23 Jan, 2024 2 commits
  11. 22 Jan, 2024 1 commit
  12. 21 Jan, 2024 1 commit
  13. 19 Jan, 2024 7 commits
  14. 18 Jan, 2024 5 commits
• [Whisper] Fix audio classification with weighted layer sum (#28563) · 186aa6be
  Sanchit Gandhi authored
      * fix
      
      * tests
      
      * fix test
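For reference, "weighted layer sum" pools every encoder layer's hidden states with learned softmax-normalized weights; a self-contained sketch of the operation (dimensions illustrative):

```python
import torch

num_layers, batch, seq, hidden = 6, 2, 50, 384
hidden_states = [torch.randn(batch, seq, hidden) for _ in range(num_layers)]
layer_weights = torch.nn.Parameter(torch.ones(num_layers) / num_layers)

stacked = torch.stack(hidden_states, dim=1)                  # (batch, layers, seq, hidden)
norm_weights = torch.softmax(layer_weights, dim=-1)
pooled = (stacked * norm_weights.view(-1, 1, 1)).sum(dim=1)  # (batch, seq, hidden)
```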
• Use `LoggingLevel` context manager in 3 tests (#28575) · 0754217c
  Yih-Dar authored
      
      
      * inside with LoggingLevel
      
      * remove is_flaky
      
      ---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
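Usage sketch of the `transformers.testing_utils.LoggingLevel` context manager, which pins the library's log level for the duration of a block instead of relying on mutable global state:

```python
from transformers import logging
from transformers.testing_utils import LoggingLevel

with LoggingLevel(logging.INFO):
    # code under test that must emit (or capture) INFO-level transformers logs
    logging.get_logger("transformers").info("visible inside the block")
```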
• Add new meta w2v2-conformer BERT-like model (#28165) · d2cdefb9
  Yoach Lacombe authored
      
      
      * first commit
      
* correct default value for non-causal
      
      * update config and modeling code
      
      * update converting checkpoint
      
      * clean modeling and fix tests
      
      * make style
      
      * add new config parameters to docstring
      
      * fix copied from statements
      
      * Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * make position_embeddings_type docstrings clearer
      
      * clean converting script
      
      * remove function not used
      
      * clean modeling file
      
      * apply suggestion for test file + add convert script to not_doctested
      
      * modify tests according to review - cleaner logic and more tests
      
      * Apply nit suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add checker of valid position embeddings type
      
      * instantiate new layer norm layer with the right eps
      
      * fix freeze_feature_encoder since it can be None in some cases
      
      * add test same output in convert script
      
      * restore wav2vec2conformer and add new model
      
      * create processor and FE + clean
      
      * add new model code
      
      * fix convert script and set default config parameters
      
      * correct model id paths
      
      * make style
      
      * make fix-copies and cleaning files
      
      * fix copied from statements
      
* complete .md and fix copies
      
      * clean convert script argument defaults
      
      * fix config parameters docstrings
      
      * fix config docstring
      
      * add copied from and enrich FE tests
      
      * fix copied from and repo-consistency
      
      * add autotokenizer
      
      * make test input length shorter and change docstring code
      
      * fix docstrings and copied from
      
      * add add_adapter to ASR training example
      
      * make testing of adapters more robust
      
      * adapt to multi adapter layers
      
      * refactor input_values->input_features and remove w2v2-bert feature extractor
      
      * remove pretraining model
      
* remove deprecated features and useless lines
      
      * add copied from and ignore statements to modeling tests
      
      * remove pretraining model #2
      
      * change import in convert script
      
      * change default in convert script
      
      * update readme and remove useless line
      
      * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * refactor BERT to Bert for consistency
      
      * remove useless ignore copy statement
      
      * add persistent to buffer in rotary
      
      * add eps in LayerNorm init and remove copied from
      
      * add adapter activation parameters and add copied from statements
      
* Fix copied statements and add unittest.skip reasons
      
      * add copied statement in test_processor
      
      * refactor processor
      
      * make style
      
      * replace numpy random by torch rand
      
      * remove expected output CTC
      
      * improve converting script with processor class
      
      * Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove gumbel class
      
      * remove tests related to previously deleted class
      
      * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * correct typos
      
* remove unused parameters
      
* update processor to take both text and audio
      
      * update checkpoints
      
      * update expected output and add ctc expected output
      
      * add label_attention_mask
      
      * replace pt with np in processor tests
      
      * fix typo
      
      * revert to behaviour with labels_attention_mask
      
      ---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
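A usage sketch with the released checkpoint; as the refactor above notes, the model consumes `input_features` from its processor rather than raw `input_values`:

```python
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2BertModel

processor = AutoProcessor.from_pretrained("facebook/w2v-bert-2.0")
model = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")

audio = np.zeros(16_000, dtype=np.float32)  # one second of silence as a stand-in
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
```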
• [`Core Tokenization`] Support a fix for spm fast models (#26678) · 81899778
  Arthur authored
      * fix
      
      * last attempt
      
      * current work
      
      * fix forward compatibility
      
      * save all special tokens
      
      * current state
      
      * revert additional changes
      
      * updates
      
      * remove tokenizer.model
      
      * add a test and the fix
      
      * nit
      
      * revert one more break
      
      * fix typefield issue
      
      * quality
      
      * more tests
      
      * fix fields for FC
      
      * more nits?
      
      * new additional changes
      
      * how
      
      * some updates
      
      * the fix
      
      * where do we stand
      
      * nits
      
      * nits
      
      * revert unrelated changes
      
      * nits nits nits
      
      * styling
      
      * don't break llama just yet
      
      * revert llama changes
      
      * safe arg check
      
      * fixup
      
      * Add a test for T5
      
      * Necessary changes
      
* Tests passing; added tokens need to not be normalized. If the added tokens are normalized, stripping is applied, which seems unwanted for normal functioning
      
* Add even more tests, when normalization is set to True (which does not work)

* Add even more tests, when normalization is set to True (which does not work)
      
      * Update to main
      
      * nits
      
      * fmt
      
      * more and more test
      
      * comments
      
      * revert change as tests are failing
      
* make the test more readable
      
      * nits
      
      * refactor the test
      
      * nit
      
      * updates
      
      * simplify
      
      * style
      
      * style
      
      * style convert slow
      
      * Update src/transformers/convert_slow_tokenizer.py
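The behavior the tests pin down, sketched: added tokens on spm-based fast tokenizers are registered with `normalized=False` so the normalizer cannot strip or rewrite them (the token string is illustrative):

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # spm-based fast tokenizer
tokenizer.add_tokens(AddedToken("<custom>", normalized=False))

tokens = tokenizer.tokenize("Hello <custom> world")
assert "<custom>" in tokens  # survives as a single, unnormalized token
```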
• Save `Processor` (#27761) · 3005f965
  Yih-Dar authored
      
      
      * save processor
      
      * Update tests/models/auto/test_processor_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update tests/test_processing_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix
      
      ---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
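Usage sketch of the round trip this enables: the processor now writes its own `processor_config.json` next to the tokenizer and feature-extractor files (the checkpoint is a placeholder):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
processor.save_pretrained("./my_processor")   # also writes processor_config.json
reloaded = AutoProcessor.from_pretrained("./my_processor")
```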
  15. 17 Jan, 2024 1 commit
• Fix SDPA tests (#28552) · 2c1eebc1
  fxmarty authored
      
      
      * skip bf16 test if not supported by device
      
      * fix
      
      * fix bis
      
      * use is_torch_bf16_available_on_device
      
      * use is_torch_fp16_available_on_device
      
      * fix & use public llama
      
      * use 1b model
      
* fix flaky test
      
      ---------
Co-authored-by: Your Name <you@example.com>
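The gating pattern adopted here, sketched with the utilities the commit names; `torch_device` stands in for the device string the test suite resolves at runtime:

```python
import unittest

from transformers.utils import (
    is_torch_bf16_available_on_device,
    is_torch_fp16_available_on_device,
)

torch_device = "cuda"  # placeholder; resolved dynamically in the real test suite


class SdpaEquivalenceTest(unittest.TestCase):
    def test_bf16(self):
        if not is_torch_bf16_available_on_device(torch_device):
            self.skipTest("bf16 is not supported on this device")
        # ... run the bf16 SDPA-vs-eager equivalence checks here
```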