1. 16 May, 2024 · 6 commits
    • Make `Gemma` work with `torch.compile` (#30775) · 1b3dba94
      Yih-Dar authored
      
      
      * fix
      
      * [run-slow] gemma
      
      * add test
      
      * add `test_compile_static_cache`
      
      * fix
      
      * style
      
      * remove subprocess
      
      * use attribute
      
      * fix
      
      * style
      
      * update
      
      * [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Disable the FA backend for SDPA on AMD GPUs (#30850) · 0753134f
      Mohit Sharma authored
      * disable fa
      
      * disable fa
      
      * update warning
      
      * update warning
    • Cache: add new flag to distinguish models that support `Cache` but not static cache (#30800) · 9d889f87
      Joao Gante authored
      * jamba cache
      
      * new flag
      
      * generate exception
    • [Idefics2] Improve docs, add resources (#30717) · 17cc71e1
      NielsRogge authored
      
      
      * Add resources
      
      * Address comment
      
      * Address comments
      
      * Update docs/source/en/model_doc/idefics2.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update figure
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • add sdpa to ViT [follow up of #29325] (#30555) · 1c21f48a
      hyenal authored
      
      
      remove blank line (+1 squashed commit)
      Squashed commits:
      [24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
      Squashed commits:
      [08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
      [ec96a8db3] [run-slow]vit_msn
      [ead817eca] fix vit msn multi gpu
      [d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [3fdbfa88f] doc
      [a3ff33e4a] finish implementation
      [e20b7b7fb] Update test_modeling_common.py
      [e290c5810] Update test_modeling_flax_common.py
      [d3af86f46] comment
      [ff7dd32d8] more comments
      [59b137889] suggestion
      [7e2ba6d67] attn_implementation as attribute of the class
      [fe66ab71f] minor
      [38642b568] Apply suggestions from code review
      
      Accept comments
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [22cde7d52] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [48e137cc6] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [99f4c679f] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [61f00ebb0] all tests are passing locally
      [e9e0b82b7] vision encoder/decoder
      [4d5076b56] test-vision (+20 squashed commits)
      Squashed commits:
      [d1add8db9] yolo
      [9fde65716] fix flax
      [986566c28] minor
      [ca2f21d1f] vit
      [3333efd7a] easy models change
      [ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
      [48ecc7e26] all tests are passing locally
      [bff7fc366] minor
      [62f88306f] fix yolo and text_encoder tests
      [121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
      [b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [cffaa10dd] fix-copies
      [ef6c511c4] test vit hybrid
      [7d4ba8644] vit hybrid
      [66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [1fcc0a031] fixes
      [cfde6eb21] fixup
      [e77df1ed3] all except yolo end encoder decoder (+17 squashed commits)
      Squashed commits:
      [602913e22] vit + vit_mae are working
      [547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/  passes
      [61a97dfa9] it's the complete opposite...
      [aefab37d4] fix more tests
      [71802a1b9] fix all torch tests
      [40b12eb58] encoder - decoder tests
      [941552b69] slow decorator where appropriate
      [14d055d80] has_attentions to yolo and msn
      [3381fa19f] add correct name
      [e261316a7] repo consistency
      [31c6d0c08] fixup
      [9d214276c] minor fix
      [11ed2e1b7] chore
      [eca6644c4] add sdpa to vit-based models
      [cffbf390b] make fix-copies result
      [6468319b0] fix style
      [d324cd02a] add sdpa for vit
      Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>
    • [LLaVa-NeXT] Small fixes (#30841) · 9fd606db
      NielsRogge authored
      * First draft
      
      * Update docstring
  2. 15 May, 2024 · 14 commits
  3. 14 May, 2024 · 10 commits
  4. 13 May, 2024 · 10 commits
    • CI: update to ROCm 6.0.2 and test MI300 (#30266) · 37bba2a3
      fxmarty authored
      
      
      * update to ROCm 6.0.2 and test MI300
      
      * add callers for mi300
      
      * update dockerfile
      
      * fix trainer tests
      
      * remove apex
      
      * style
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * update to torch 2.3
      
      * add workflow dispatch target
      
      * we may need branches: mi300-ci after all
      
      * nit
      
      * fix docker build
      
      * nit
      
      * add check runner
      
      * remove docker-gpu
      
      * fix issues
      
      * fix
      
      ---------
      Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • skip low_cpu_mem_usage tests (#30782) · 539ed75d
      Marc Sun authored
    • Deprecate models script (#30184) · 0f8fefd4
      amyeroberts authored
      * Add utility for finding candidate models for deprecation
      
      * Update model init
      
      * Make into configurable script
      
      * Fix path
      
      * Add sorting of base object alphabetically
      
      * Tidy
      
      * Refactor __init__ alpha ordering
      
      * Update script with logging
      
      * fix import
      
      * Fix logger
      
      * Fix logger
      
      * Get config file before moving files
      
      * Take models from CLI
      
      * Split models into lines to make it easier to feed them to the deprecate_models script
      
      * Update
      
      * Use posix path
      
      * Print instead
      
      * Add example in module docstring
      
      * Fix up
      
      * Add clarifying comments; add models to DEPRECATE_MODELS
      
      * Address PR comments
      
      * Don't update relative paths on the same level
    • Save other CI jobs' results (torch/tf pipeline, example, deepspeed, etc.) (#30699) · 82c1625e
      Yih-Dar authored
      
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * Update utils/notification_service.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Generate: assistant should be greedy in assisted decoding (#30778) · 2e27291c
      Joao Gante authored
      
      
      * assistant should be greedy
      
      * better comment
      
      * Update src/transformers/generation/candidate_generator.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Port IDEFICS to tensorflow (#26870) · 94306352
      Alazar authored
      
      
      * Initial commit
      
      * Just a copy of modeling_idefics.py that will be ported to TF
      
      * - Prepend TF to the name of all classes
      - Convert pytorch ops to TF (not all operations are converted yet)
      
      * Add TF imports
      
      * Add autotranslated files
      
      * Add TF classes to model_tf_auto.py
      
      * Add the TF classes in model_doc
      
      * include auto-translated code
      
      * Adopted from auto-translated version
      
      * Add a forgotten super().build
      
      * Add test code for TF version.
      
      * Fix indentation and load pytorch weights for now
      
      * Some fixes. Many tests are still failing but some are passing now.
      
      - I have added TODOs for some of the hacks I made to unblock me
        and I will address them soon
      - I have the processing_idefics.py hacked in my view to support TF temporarily
      
      * Add ALL_LAYERNORM_LAYERS to match pytorch
      
      * Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"
      
      This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
      is not needed in the tf implementation.
      
      * Fix freeze_relevant_params()
      
      * Some more fixes
      
      * Fix test_attention_outputs
      
      * Add tf stuff to processing_idefics.py
      
      processing_idefics.py supports both pytorch and tf now.
      
      test_processor_idefics.py for pytorch is passing, so I didn't break anything,
      but still some issues with tf. I also need to add tf tests in
      test_processor_idefics.py.
      
      * Pass return_tensors to image processing code and fix test
      
      * Pass return_tensors to the image processor __init__
      
      * Fix several test cases
      
      - Make input to some of the forward pass of type `TFModelInputType`
      - Decorate main layer forward pass with `@unpack_inputs`
      - Decorate main layer with `@keras_serializable`
      - Pass `inputs` to TFIdeficsModel
      
      * Some more fixes forgotten in last commit
      
      * Fix processing code and vision_tf.py
      
      * Fix perceiver bug
      
      * Import from
      
      * Auto-add build() methods + style pass
      
      * Fix build() errors due to `None` being passed as shape to some layers
      
      * Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text
      
      * Fix pytorch weights load for tf2
      
      There were a lot of `name=` missing in weight initialization code.
      
      * Attempt to fix CI
      
      * Add back accidentally removed line
      
      * Remove torch-specific stuff from the TF test file
      
      * make fix-copies, make style, remove autotranslated files
      
      * Fixes to imports/docstrings
      
      * Let's try the from future import in desperation
      
      * Fix the core random_attention_mask fn to match the torch/flax behaviour
      
      * Clean random_attention_mask up correctly
      
      * Remove torch-only test
      
      * Fix loss shape, couple of nits
      
      * make style
      
      * Don't test for OOB embeddings because IDEFICS uses those deliberately
      
      * Fix loss computation to handle masking
      
      * Fix test failures when flattening
      
      * Fix some test failures
      
      - Add cross attention gate which was missing and wasn't being passed around
      - Fix overwriting of image_attention_mask due to hack I had for dummy inputs
      
      * Add a proper stateless scaled_dot_product_attention
      
      * make style
      
      * Adding missing attribute from the PyTorch version
      
      * Small cleanups to decoupledlinearlayer in case that helps
      
      * Pass epsilon to LayerNormalization
      
      * Attempt to fix pytorch weight cross-loading for TFIdeficsEmbedding
      
      * Fix a bug in TFIdeficsGatedCrossAttentionLayer
      
      * Patching up build() methods
      
      * Constant self.inv_freq
      
      * Constant self.inv_freq
      
      * First working version
      
      The TF implementation works now; there was a bug in TFIdeficsDecoupledLinear
      where the weights were misinitialized as (in_features, out_features)
      when they should be (out_features, in_features).
      
      I have tested this so far with tiny-random and idefics-9b-instruct,
      and it gives correct output.
      
      I also dumped the final outputs for both pytorch and TF
      and they are identical.
      
      * Fix some test failures
      
      * remove print statement
      
      * Fix return_tensors
      
      * Fix CI test failure check_code_quality
      
      * Attempt to fix CI failures by running `make fixup`
      
      The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
      test; they make that file unreadable and should probably be moved to a separate file.
      
      * Attempt to fix tests_pr_documentation_tests
      
      * Fix a test failure in test_image_processing_idefics.py
      
      * Fix test test_pt_tf_model_equivalence
      
      * Fix a few failures
      
      * Tiny fix
      
      * Some minor fixes
      
      * Remove a duplicate test
      
      * Override a few test failures for IDEFICS
      
      - `test_keras_save_load` is passing now
      - `test_compile_tf_model` is still failing
      
      * Fix processing_idefics.py after rebase
      
      * Guard import keras with is_tf_available
      
      * fix check code quality
      
      * fix check code quality
      
      * Minor fixes
      
      * Skip test_save_load temporarily
      
      This test passed on my local box but fails on the CI, skipping
      for now to see if there are other remaining failures on the CI.
      
      * Run `ruff format tests src utils`
      
      * Fix last failing test, `test_compile_tf_model`
      
      * Add fixes for vision_tf.py
      
      I forgot to add this file in last commit.
      
      * Minor fixes
      
      * Replace "<<<" with "<<" for doc tests
      
      IDEFICS-9B is too big for doctest runner, so don't run it there
      
      * Make code more readable
      
      * Fix bug after code review
      
      I added a layer_norm_eps to IdeficsConfig but I don't even need it
      since the vision config has a layer_norm_eps.
      
      * Fix after code review
      
      Use original code tokenizer.convert_tokens_to_ids
      
      * Keep PyTorch as the default return_tensors
      
      * Fixes to modeling_tf after code review
      
      * Fixes from code review
      
      - Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
      - Pass 1e-5 to LayerNormalization in perceiver
      
      * Run ruff
      
      * Undo a change
      
      * Refactor processing code after Matt's suggestion
      
      * Remove TODOs that aren't needed anymore
      
      * For pytorch, use original pytorch processing code from main
      
      Since this PR is a TF port, it shouldn't make any modifications
      to the PyTorch IDEFICS code. This change undoes the PyTorch processing
      modifications I made and uses the original code from main.
      
      * Update tests/models/idefics/test_modeling_idefics.py
      
      * Update tests/models/idefics/test_modeling_tf_idefics.py
      
      * Add missing imports for is_pt_tf_cross_test
      
      * [DO NOT MERGE]: This is a commit for debugging and will be reverted
      
      The cross test `test_pt_tf_model_equivalence` passes locally but
      fails when running on the CI. This commit is to help debug that
      and will be reverted.
      
      * Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"
      
      This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.
      
      * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
      
      * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted
      
      * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
      
      This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.
      
      * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"
      
      This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.
      
      * Don't skip test_save_load
      
      IIRC test_save_load was also failing on the CI but not on my local
      box, it might be easier to debug that on the CI first than the cross tests
      
      * Debugging commit, will be reverted
      
      * Revert "Debugging commit, will be reverted"
      
      This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.
      
      * Override `test_save_load` and push model to save
      
      Maybe this will help me repro this weird bug
      
      * pass my repo_id
      
      * add endpoint
      
      * Pass a temp (write) token just for this CI
      
      * Undo last few commits, still pushing to hub for model debugging
      
      The issue seems to be with save_pretrained(): when I looked at the model saved
      from the CI test failure it is basically empty and has no weights.
      `self.save_weights(..)` seems to be failing in save_pretrained but needs
      more debugging
      
      * Add logging to modeling tf utils, will be reverted just for debugging
      
      * Debugging, will revert
      
      * Revert "Debugging, will revert"
      
      This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.
      
      * Revert "Add logging to modeling tf utils, will be reverted just for debugging"
      
      This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.
      
      * Remove `test_save_load`
      
      The CI failures are gone after my latest rebase, no idea why
      but I was still saving the model to my hub on HF and the tf_model.h5
      file now has everything.
      
      * Run make fix-copies
      
      * Run ruff format tests src utils
      
      * Debugging commit, will be reverted
      
      * Run ruff, also trigger CI run
      
      * Run ruff again
      
      * Undo debugging commit
      
      ---------
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    • de2f7221
      Joao Gante authored
    • [Object detection pipeline] Lower threshold (#30710) · ce87dca1
      NielsRogge authored
      * Lower threshold
      
      * Address comment
    • enable Pipeline to get device from model (#30534) · 69d9bca5
      Fanli Lin authored
      
      
      * check model.device
      
      * fix
      
      * style fix
      
      * move model device
      
      * remove print
      
      * add comment
      
      * fix
      
      * add unit test
      
      * optimize
      
      * change test names and add more cases
      
      * Update tests/pipelines/test_pipelines_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    • Qwen: incorrect setup flag (#30776) · f4dc26d4
      Joao Gante authored
      qwen does not support the new cache classes