1. 08 Feb, 2024 1 commit
  2. 07 Feb, 2024 1 commit
  3. 06 Feb, 2024 4 commits
  4. 05 Feb, 2024 3 commits
    • Image Feature Extraction pipeline (#28216) · ba3264b4
      amyeroberts authored
      
      
      * Draft pipeline
      
      * Fixup
      
      * Fix docstrings
      
      * Update doctest
      
      * Update pipeline_model_mapping
      
      * Update docstring
      
      * Update tests
      
      * Update src/transformers/pipelines/image_feature_extraction.py
      Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
      
      * Fix docstrings - review comments
      
      * Remove pipeline mapping for composite vision models
      
      * Add to pipeline tests
      
      * Remove for flava (multimodal)
      
      * safe pil import
      
      * Add requirements for pipeline run
      
      * Account for super slow efficientnet
      
      * Review comments
      
      * Fix tests
      
      * Swap order of kwargs
      
      * Use build_pipeline_init_args
      
      * Add back FE pipeline for Vilt
      
      * Include image_processor_kwargs in docstring
      
      * Mark test as flaky
      
      * Update TODO
      
      * Update tests/pipelines/test_pipelines_image_feature_extraction.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add license header
      
      ---------
      Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
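
      As a quick illustration of the pipeline this commit introduces, a minimal usage sketch; the checkpoint and image path below are placeholders, not values taken from the PR:

      from transformers import pipeline

      # Task name introduced by this PR; checkpoint and image path are illustrative placeholders.
      extractor = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224")

      # By default the pipeline returns nested lists with the model's last hidden states,
      # keeping a leading batch dimension: roughly (1, sequence_length, hidden_size).
      features = extractor("path/to/image.jpg")
      print(len(features[0]))  # number of patch embeddings for the image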
      ba3264b4
    • Correct wav2vec2-bert inputs_to_logits_ratio (#28821) · 7addc934
      Yoach Lacombe authored
      * Correct wav2vec2-bert inputs_to_logits_ratio
      
      * correct ratio
      
      * correct ratio, clean asr pipeline
      
      * refactor on one line
      7addc934
    • [WIP] Hard error when ignoring tensors. (#27484) · 2da28c4b
      Nicolas Patry authored
      
      
      * [WIP] Hard error when ignoring tensors.
      
      * Better selection/error when saving a checkpoint.
      
      - Find all names we should normally drop (those are in the transformers
        config)
      - Find all disjoint tensors (for those we can safely trigger a copy to
        get rid of the sharing before saving)
      - Clone those disjoint tensors getting rid of the issue
      - Find all identical names (those should be declared in the config
        but we try to find them all anyway.)
      - For all identical names:
        - If they are in the config, just ignore them; everything is fine.
        - If they are not, warn about them.
      - For all remaining tensors which are shared yet neither identical nor
        disjoint, raise a hard error (see the sketch after this list).
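
      A rough sketch of the shared-vs-identical distinction described above; the helper and its grouping are illustrative, not the PR's actual code, and the "disjoint" slice-range check is omitted for brevity:

      import torch
      from collections import defaultdict

      def classify_shared_tensors(state_dict):
          """Group tensors that share underlying storage, then split each group into
          'identical' (same data_ptr, size and stride, i.e. true aliases) and the rest.
          Sketch only; the real transformers logic also handles disjoint views."""
          by_storage = defaultdict(list)
          for name, tensor in state_dict.items():
              by_storage[tensor.untyped_storage().data_ptr()].append(name)

          identical, problematic = [], []
          for names in by_storage.values():
              if len(names) < 2:
                  continue
              first = state_dict[names[0]]
              if all(
                  state_dict[n].data_ptr() == first.data_ptr()
                  and state_dict[n].size() == first.size()
                  and state_dict[n].stride() == first.stride()
                  for n in names
              ):
                  identical.append(names)    # declared aliases: safe to drop/ignore when saving
              else:
                  problematic.append(names)  # shared but not identical: clone or raise a hard error
          return identical, problematic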
      
      * Adding a failing test on `main` that passes here.
      
      * We don't need to keep the subfolder logic in this test.
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      2da28c4b
  5. 02 Feb, 2024 4 commits
  6. 01 Feb, 2024 2 commits
    • Fix symbolic_trace with kv cache (#28724) · 709dc432
      fxmarty authored
      * fix symbolic_trace with kv cache
      
      * comment & better test
      709dc432
    • Adding [T5/MT5/UMT5]ForTokenClassification (#28443) · 0d26abdd
      JB (Don) authored
      * Adding [T5/MT5/UMT5]ForTokenClassification
      
      * Add auto mappings for T5ForTokenClassification and variants
      
      * Adding ForTokenClassification to the list of models
      
      * Adding attention_mask param to the T5ForTokenClassification test
      
      * Remove outdated comment in test
      
      * Adding EncoderOnly and Token Classification tests for MT5 and UMT5
      
      * Fix typo in umt5 string
      
      * Add tests for all the existing MT5 models
      
      * Fix wrong comment in dependency_versions_table
      
      * Reverting change to common test for _keys_to_ignore_on_load_missing
      
      The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.
      
      * Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model
      
      * Add fix-copies to MT5ModelTest
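
      A minimal usage sketch for the new head class; the checkpoint and label count are illustrative placeholders:

      from transformers import AutoTokenizer, T5ForTokenClassification

      # "t5-small" and num_labels=5 are illustrative; any T5-style checkpoint works in principle,
      # and the classification head is newly initialized.
      tokenizer = AutoTokenizer.from_pretrained("t5-small")
      model = T5ForTokenClassification.from_pretrained("t5-small", num_labels=5)

      inputs = tokenizer("HuggingFace is based in New York City", return_tensors="pt")
      logits = model(**inputs).logits  # shape: (batch, sequence_length, num_labels)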
      0d26abdd
  7. 31 Jan, 2024 4 commits
    • DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) · beb2a096
      Joao Gante authored
      DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
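
      A hedged sketch of the failure mode behind this change, not the PR's code: with a half-precision ambient default dtype (as DeepSpeed ZeRO init can set), `torch.arange` on float arguments silently inherits that default and long position ranges lose precision.

      import torch

      # Illustrative only: under a half-precision default dtype, torch.arange with float
      # arguments follows that default. float16 cannot represent every integer above 2048,
      # so long position ranges get corrupted.
      positions = torch.arange(0.0, 4096.0)                       # dtype depends on the ambient default
      positions = torch.arange(0.0, 4096.0, dtype=torch.float32)  # the fix: pin the dtype explicitly

      # float16 spacing above 2048 is 2, so e.g. 2049 is not representable:
      print(torch.tensor(2049.0, dtype=torch.float16))  # tensor(2048., dtype=torch.float16)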
      
      beb2a096
    • Flax mistral (#26943) · f7076cd3
      Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
      * make sure conversion considers bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
      * applied fix-copies ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
      * skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed rf after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
      * added embedding weight initialization
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
      * skipped docstring test to align # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
      * split causal_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat (see the repeat_kv sketch after this list)
      
      * applied make style
      
      * empty for retry on Wav2Vec2
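
      The repeat_kv/tile bullets above refer to expanding key/value heads for grouped-query attention; a minimal JAX sketch, with names that are illustrative rather than the PR's:

      import jax.numpy as jnp

      def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
          """Expand (batch, seq_len, num_kv_heads, head_dim) to
          (batch, seq_len, num_kv_heads * n_rep, head_dim) so that grouped
          key/value heads line up with the query heads. Sketch only."""
          batch, seq_len, num_kv_heads, head_dim = hidden_states.shape
          if n_rep == 1:
              return hidden_states
          hidden_states = hidden_states[:, :, :, None, :]           # insert a repeat axis
          hidden_states = jnp.repeat(hidden_states, n_rep, axis=3)  # repeat each kv head n_rep times
          return hidden_states.reshape(batch, seq_len, num_kv_heads * n_rep, head_dim)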
      f7076cd3
    • [Whisper] Refactor forced_decoder_ids & prompt ids (#28687) · 65a926e8
      Patrick von Platen authored
      
      
      * up
      
      * Fix more
      
      * Correct more
      
      * Fix more tests
      
      * fix fast tests
      
      * Fix more
      
      * fix more
      
      * push all files
      
      * finish all
      
      * make style
      
      * Fix timestamp wrap
      
      * make style
      
      * make style
      
      * up
      
      * up
      
      * up
      
      * Fix lang detection behavior
      
      * Fix lang detection behavior
      
      * Add lang detection test
      
      * Fix lang detection behavior
      
      * make style
      
      * Update src/transformers/models/whisper/generation_whisper.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * better error message
      
      * make style tests
      
      * add warning
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      65a926e8
    • don't initialize the output embeddings if we're going to tie them to input embeddings (#28192) · ae0c27ad
      tom-p-reichel authored
      * test that tied output embeddings aren't initialized on load
      
      * don't initialize the output embeddings if we're going to tie them to the input embeddings
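
      A self-contained sketch of the idea with illustrative names, not the PR's code; in transformers the effect comes from skipping the random init of the output embeddings during loading, which this toy module only mirrors:

      import torch.nn as nn

      class TinyLM(nn.Module):
          """Toy module mirroring the idea: when weights are tied, the output
          embeddings never get (or need) their own random initialization."""
          def __init__(self, vocab_size: int, hidden_size: int, tie_word_embeddings: bool = True):
              super().__init__()
              self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
              self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
              if tie_word_embeddings:
                  # Reuse the input embedding matrix as the output projection,
                  # so any separate init of lm_head.weight would be wasted work.
                  self.lm_head.weight = self.embed_tokens.weight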
      ae0c27ad
  8. 30 Jan, 2024 4 commits
  9. 29 Jan, 2024 3 commits
  10. 26 Jan, 2024 1 commit
  11. 25 Jan, 2024 1 commit
    • Add Depth Anything (#28654) · 963db81a
      NielsRogge authored
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Add docs
      
      * Remove file
      
      * Add copied from
      
      * Address comments
      
      * Address comments
      
      * Address comments
      
      * Fix style
      
      * Update docs
      
      * Convert all checkpoints, add integration test
      
      * Rename checkpoints
      
      * Add pretrained backbone attributes
      
      * Fix default config
      
      * Address comment
      
      * Add figure to docs
      
      * Fix bug thanks to @xenova
      
      * Update conversion script
      
      * Fix integration test
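
      A minimal usage sketch for the new model via the depth-estimation pipeline; the checkpoint name and image path are assumptions (the converted checkpoints mentioned above live on the Hub):

      from transformers import pipeline

      # Checkpoint name and image path are illustrative placeholders.
      depth_estimator = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")

      outputs = depth_estimator("path/to/image.jpg")
      outputs["depth"].save("depth.png")       # PIL image with the predicted depth map
      print(outputs["predicted_depth"].shape)  # raw depth tensor from the model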
      963db81a
  12. 24 Jan, 2024 1 commit
    • Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517) · c5c69096
      Khai Mai authored
      * fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask
      
      * format code using black and ruff
      
      * skip computing mask if attention_mask=None
      
      * add tests for load balancing loss Mixtral-Moe
      
      * fix assert loss is different in mixtral_test
      
      * fix pad_leng
      
      * use assertNotAlmostEqual and print to debug
      
      * remove print for debug
      
      * minor updates
      
      * reduce rtol and atol
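
      A simplified sketch of the masking idea, not the actual load_balancing_loss_func: padding positions are dropped from the expert statistics before the auxiliary loss is computed, and the mask is skipped when attention_mask is None.

      import torch

      def masked_load_balancing_loss(router_logits, attention_mask, num_experts, top_k=2):
          """Simplified sketch of a Mixtral-style auxiliary loss that ignores padding.
          Shapes: router_logits (batch*seq, num_experts); attention_mask (batch, seq)
          with 1 for real tokens and 0 for padding."""
          routing_weights = torch.softmax(router_logits, dim=-1)
          _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
          expert_mask = torch.nn.functional.one_hot(selected_experts, num_experts).float()

          if attention_mask is None:
              # No padding information: plain means over all tokens.
              tokens_per_expert = expert_mask.mean(dim=0)
              router_prob_per_expert = routing_weights.mean(dim=0)
          else:
              keep = attention_mask.reshape(-1).float()  # (batch*seq,)
              denom = keep.sum()
              # Weight every token-level statistic by the mask so padding contributes nothing.
              tokens_per_expert = (expert_mask * keep[:, None, None]).sum(dim=0) / denom
              router_prob_per_expert = (routing_weights * keep[:, None]).sum(dim=0) / denom

          return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))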
      c5c69096
  13. 23 Jan, 2024 2 commits
  14. 22 Jan, 2024 1 commit
  15. 21 Jan, 2024 1 commit
  16. 19 Jan, 2024 7 commits