1. 08 Aug, 2024 1 commit
  2. 06 Aug, 2024 1 commit
    • Add Nemotron HF Support (#31699) · 6a03942d
      Ao Tang authored
      * Add nemotron support
      
      * fix inference
      
      * add unit test
      
      * add layernorm1p as a class to avoid meta device mismatch
      
      * test fixed
      
      * Add copied_from statements
      
      * remove pretraining_tp args
      
      * remove nemotronlayernorm
      
      * force LN computation done in FP32 (see the sketch after this entry)
      
      * remove nemotrontokenizer and use llamatokenizer
      
      * license update
      
      * add option for kv_channels for minitron8b
      
      * remove assert
      
      * o_proj fixed
      
      * o_proj reshape
      
      * add gated_proj option
      
      * typo
      
      * remove todos
      
      * fix broken test after merging latest main
      
      * remove nezha/nat after merging main
      
      * change default config to 15b model
      
      * add nemo conversion script
      
      * rename conversion script
      
      * remove gate_proj option
      
      * pr comment resolved
      
      * fix unit test
      
      * rename kv_channels to head_dim
      
      * resolve PR issue
      
      * add nemotron md
      
      * fix broken tests
      
      * refactor rope for nemotron
      
      * test fix
      
      * remove linearscaling
      
      * whitespace and import
      
      * fix some copied-from
      
      * code style fix
      
      * reformatted
      
      * add position_embedding to nemotronattention
      
      * rope refactor to only use config, copied-from fix
      
      * format
      
      * Run make fix-copies
      
      * nemotron md with autodoc
      
      * doc fix
      
      * fix order
      
      * pass check_config_docstrings.py
      
      * fix config_attributes
      
      * remove all llama BC related code
      
      * Use PreTrainedTokenizerFast
      
      * ruff check examples
      
      * conversion script update
      
      * add nemotron to toctree
      6a03942d
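      The two LayerNorm items above (a dedicated layernorm1p class, LN forced to FP32) boil down to a small normalization module. A minimal sketch of the idea, assuming the usual "weight + 1" parameterization; the class name and casting details are illustrative, not the PR's verbatim code:

      ```python
      import torch
      from torch import nn
      from torch.nn import functional as F

      class LayerNorm1P(nn.LayerNorm):
          """LayerNorm variant that scales by (1 + weight) and runs the math in FP32."""

          def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
              input_dtype = hidden_states.dtype
              # Normalize in FP32 for numerical stability, then cast back.
              out = F.layer_norm(
                  hidden_states.float(),
                  self.normalized_shape,
                  self.weight.float() + 1.0,  # "1p": the learned weight is an offset from 1
                  self.bias.float() if self.bias is not None else None,
                  self.eps,
              )
              return out.to(input_dtype)
      ```

      Keeping this as an nn.Module subclass (rather than a bare function) registers the parameters on the module itself, which is plausibly what the "meta device mismatch" item refers to: module-owned weights move correctly during meta-device/low-memory loading.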
  3. 18 Jul, 2024 1 commit
    • Add `sdpa` and FA2 for CLIP (#31940) · 1c37e8c1
      Pavel Iakubovskii authored
      
      
      * Squashed commit of the following:
      
      commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f
      Author: Pavel Iakubovskii <qubvel@gmail.com>
      Date:   Fri Jul 12 18:23:52 2024 +0000
      
          Add model-specific sdpa tests
      
      commit 60e4c88581abf89ec098da84ed8e92aa904c997d
      Author: Pavel Iakubovskii <qubvel@gmail.com>
      Date:   Fri Jul 12 18:20:53 2024 +0000
      
          Add fallback to eager (expensive operation)
      
      commit c29033d30e7ffde4327e8a15cbbc6bee37546f80
      Author: Pavel Iakubovskii <qubvel@gmail.com>
      Date:   Thu Jul 11 17:09:55 2024 +0000
      
          Fix attn_implementation propagation
      
      commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 09:05:27 2024 +0530
      
          style
      
      commit e77e703ca75d00447cda277eca6b886cd32bddc0
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 09:04:57 2024 +0530
      
          add comment to explain why I had to touch forbidden codebase.
      
      commit ab9d8849758e7773a31778ccba71588d18552623
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 09:03:02 2024 +0530
      
          fix: flax attribute access.
      
      commit c570fc0abf9d1bd58c291aae3c7e384f995996d2
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 08:23:54 2024 +0530
      
          fix tensorflow attribute name.
      
      commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 07:57:10 2024 +0530
      
          fix attribute access.
      
      commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Sat May 25 07:44:02 2024 +0530
      
          _from_config.
      
      commit 35aed64ff602422adcf41d7f677a0a24bd9eccae
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 24 18:46:52 2024 +0530
      
          propagation of attn_implementation.
      
      commit 4c25c19845438b1dc1d35a5adf9436151c8c5940
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 24 09:24:36 2024 +0530
      
          style again
      
      commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 24 09:19:05 2024 +0530
      
          use from_config.
      
      commit b70c409956d0359fa6ae5372275d2a20ba7e3389
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 24 09:13:43 2024 +0530
      
          quality
      
      commit a7b63beff53d0fc754c6564e2a7b51731ddee49d
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 10 14:35:10 2024 +0200
      
          add benchmark numbers
      
      commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 10 13:50:16 2024 +0200
      
          Revert "reflect feedback more"
      
          This reverts commit dc123e71eff60aae74d5f325f113d515d0d71117.
      
      commit ca674829d28787349c2a9593a14e0f1d41f04ea4
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 10 13:50:05 2024 +0200
      
          Revert "fix"
      
          This reverts commit 37a1cb35b87acdc4cf7528b8b1ed6da27d244e52.
      
      commit fab2dd8576c099eb1a3464958cb206a664d28247
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 10 13:47:46 2024 +0200
      
          fix
      
      commit fbc6ae50fd6f2d36294d31e191761631b701d696
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 10 13:38:30 2024 +0200
      
          reflect feedback more
      
      commit 87245bb020b2d60a89afe318a951df0159404fc9
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 3 08:54:34 2024 +0530
      
          fixes
      
      commit 1057cc26390ee839251e7f8b3326c4207595fb23
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 3 07:49:03 2024 +0530
      
          don't explicit set attn_implementation in tests
      
      commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 3 07:43:54 2024 +0530
      
          explicitly override attn_implementation in the towers.
      
      commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 3 07:38:42 2024 +0530
      
          import in one-line.
      
      commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri May 3 07:34:58 2024 +0530
      
          move sdpa mention to usage tips.
      
      commit 92884766c64dbb456926a3a84dd427be1349fa95
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Mon Apr 29 10:58:26 2024 +0530
      
          fix: memory allocation problem.
      
      commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Mon Apr 29 09:56:59 2024 +0530
      
          fix-copies
      
      commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Fri Apr 26 20:16:12 2024 +0530
      
          address arthur's comments.
      
      commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9
      Author: Sayak Paul <spsayakpaul@gmail.com>
      Date:   Fri Apr 26 20:08:15 2024 +0530
      
          Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      commit 46e04361f37ded5c522ff05e9f725b9f82dce40e
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Wed Apr 24 09:55:27 2024 +0530
      
          add to docs.
      
      commit 831629158ad40d34d8983f209afb2740ba041af2
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Wed Apr 24 09:33:10 2024 +0530
      
          styling.
      
      commit d263a119c77314250f4b4c8469caf42559197f22
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Wed Apr 24 09:15:20 2024 +0530
      
          up
      
      commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Tue Apr 23 18:40:42 2024 +0530
      
          handle causal and attention mask
      
      commit 122f1d60153df6666b634a94e38d073f3f260926
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Tue Apr 23 15:18:21 2024 +0530
      
          test fixes.
      
      commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Tue Apr 23 09:39:25 2024 +0530
      
          fix: scaling inside sdpa.
      
      commit 0f629989efc48b7315cf19405a81e02955efe7e5
      Author: Sayak Paul <spsayakpaul@gmail.com>
      Date:   Tue Apr 23 08:14:58 2024 +0530
      
          Update src/transformers/models/clip/modeling_clip.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      commit 14367316877dc27ea40f767ad1aee38bbc97e4ce
      Author: sayakpaul <spsayakpaul@gmail.com>
      Date:   Mon Apr 22 16:21:36 2024 +0530
      
          add: sdpa support to clip (see the usage sketch after this entry).
      
      * Remove fallback for empty attention mask (expensive operation)
      
      * Fix typing in copies
      
      * Add flash attention
      
      * Add flash attention tests
      
      * List CLIP in FA docs
      
      * Fix embeddings attributes and tf
      
      * [run-slow] clip
      
      * Update clip documentation
      
      * Remove commented code, skip compile dynamic for CLIPModel
      
      * Fix doc
      
      * Fix doc 2
      
      * Remove double transpose
      
      * Add torch version check for contiguous()
      
      * Add comment to test mixin
      
      * Fix copies
      
      * Add comment for mask
      
      * Update docs
      
      * [run-slow] clip
      1c37e8c1
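      For users, the net effect of this entry is that CLIP accepts the standard attn_implementation switch at load time. A usage sketch (standard transformers API; FA2 additionally requires the flash-attn package, a CUDA device, and fp16/bf16 weights):

      ```python
      import torch
      from transformers import CLIPModel

      # PyTorch's fused scaled_dot_product_attention backend
      model = CLIPModel.from_pretrained(
          "openai/clip-vit-base-patch32",
          attn_implementation="sdpa",
          torch_dtype=torch.float16,
      )

      # Flash Attention 2 backend
      model = CLIPModel.from_pretrained(
          "openai/clip-vit-base-patch32",
          attn_implementation="flash_attention_2",
          torch_dtype=torch.float16,
      )
      ```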
  4. 17 Jul, 2024 1 commit
  5. 08 Jul, 2024 1 commit
    • Add FA2 and `sdpa` support for SigLIP (#31499) · a177821b
      Pavel Iakubovskii authored
      * Rebase to main
      
      * Fix attention implementation autoset for text and vision configs
      
      * Fixup
      
      * Minor fixes
      
      * Fix copies
      
      * Fix attention_mask for FA2
      
      * Add equivalence tests for siglip (see the sketch after this entry)
      
      * Remove right padding test
      
      * Uncomment flaky
      
      * Fix import
      
      * Add to docs
      
      * Fix test message
      
      * Add sdpa
      
      * Add sdpa equivalence test
      
      * Add siglip sdpa to docs
      
      * Fix typing for attention output
      
      * Add sdpa tests
      
      * Fix signature of FA2
      
      * Autoset attn_implementation in config
      
      * Rename bsz -> batch_size
      
      * Move back autoset attn method
      
      * Mark as flaky
      
      * Correct attention mask padding
      
      * [run-slow] siglip
      
      * Add FA2 and sdpa docs
      
      * Style fix
      
      * Remove flaky for FA2 test
      
      * Change attention implementation set
      
      * Change attn_implementation propagation
      
      * Fix typos
      
      * Add modality to assert message
      
      * Add more sdpa backends in test
      
      * [run slow] siglip
      
      * Add math sdpa backend for all options
      
      * [run slow] siglip
      a177821b
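      The equivalence tests mentioned above compare attention backends on identical inputs. A sketch of the pattern with dummy inputs; the tolerances are illustrative, not the test suite's actual values:

      ```python
      import torch
      from transformers import SiglipModel

      checkpoint = "google/siglip-base-patch16-224"
      eager = SiglipModel.from_pretrained(checkpoint, attn_implementation="eager").eval()
      sdpa = SiglipModel.from_pretrained(checkpoint, attn_implementation="sdpa").eval()

      # Dummy batch: one 224x224 image and one short token sequence.
      pixel_values = torch.randn(1, 3, 224, 224)
      input_ids = torch.randint(0, eager.config.text_config.vocab_size, (1, 16))

      with torch.no_grad():
          out_eager = eager(input_ids=input_ids, pixel_values=pixel_values)
          out_sdpa = sdpa(input_ids=input_ids, pixel_values=pixel_values)

      # The two backends should agree up to numerical noise.
      torch.testing.assert_close(
          out_eager.logits_per_image, out_sdpa.logits_per_image, rtol=1e-4, atol=1e-4
      )
      ```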
  6. 27 Jun, 2024 1 commit
  7. 26 Jun, 2024 2 commits
    • Add LLaVa NeXT Video (#31252) · e71f2863
      Raushan Turganbay authored
      
      
      * squash into single commit
      
      * run diff once more
      
      * docstring
      
      * tests
      
      * minor changes and ready to go
      
      * Update src/transformers/models/llava_next_video/processing_llava_next_video.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/vipllava/test_modeling_vipllava.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * [run-slow] llava-next-video
      
      * [run-slow] llava-next-video
      
      * [run-slow] llava_next_video
      
      * fix two tests
      
      * fix slow tests
      
      * remove logit checks due to numeric errors
      
      * run test once more
      
      * [run-slow] llava_next_video
      
      * final try to pass the test
      
      * [run-slow] llava_next_video
      
      * [run-slow] llava_next_video
      
      * [run-slow] llava_next_video
      
      * style
      
      * fix
      
      * style
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      e71f2863
    • [`GPT-NeoX`] Add SDPA support (#31031) · b07770c5
      Anton Vlasjuk authored
      * starting support for sdpa in `gptneox` models (see the SDPA primitive sketch after this entry)
      
      * small comment on tests
      
      * fix dropout
      
      * documentation and style
      
      * clarify concrete paths for reference
      
      * generalise attn projections and rope application
      
      added head mask check to sdpa mask creation
      
      handle sdpa memory backend bug via own version flag
      
      * update docs and style
      
      * move dtype casting outside of general attn_projection_and_rope function
      
      fix flash_attn_2 stuff
      
      * more generic attn warning if output_attns or head_mask
      
      * simplify head mask check by moving head mask creation to a later point
      
      * remove copied llama artifact
      
      * remove padding_mask from attention function signature
      
      * removing unnecessary comments, only "save" attn implementation once
      
      * [run_slow] gpt_neox
      b07770c5
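      All of these SDPA ports ultimately dispatch to the same PyTorch primitive, torch.nn.functional.scaled_dot_product_attention. For reference, a minimal generic call (not gpt_neox's exact code; shapes follow PyTorch's documented (batch, heads, seq, head_dim) convention):

      ```python
      import torch
      import torch.nn.functional as F

      batch, heads, seq_len, head_dim = 2, 8, 16, 64
      q = torch.randn(batch, heads, seq_len, head_dim)
      k = torch.randn(batch, heads, seq_len, head_dim)
      v = torch.randn(batch, heads, seq_len, head_dim)

      # Fused attention: softmax(q @ k^T / sqrt(head_dim)) @ v, with an optional
      # additive mask or a built-in causal mask, in a single kernel where supported.
      out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=True)
      print(out.shape)  # torch.Size([2, 8, 16, 64])
      ```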
  8. 19 Jun, 2024 1 commit
    • [`GPT2`] Add SDPA support (#31172) · b275a410
      Anton Vlasjuk authored
      * `gpt2` sdpa support
      
      * fix (at least) one test, style, repo consistency
      
      * fix sdpa mask in forward --> fixes generation
      
      * test
      
      * test2
      
      * test3
      
      * test4
      
      * simplify shapes for attn mask creation and small comments
      
      * hub fail test
      
      * benchmarks
      
      * flash attn 2 mask should not be inverted on enc-dec setup
      
      * fix comment
      
      * apply some suggestion from code review
      
      - only save _attn_implementation once
      - remove unnecessary comment
      
      * change elif logic
      
      * [run-slow] gpt2
      
      * modify `test_gpt2_sample_max_time` to follow previous assertion patterns
      b275a410
  9. 30 May, 2024 1 commit
  10. 16 May, 2024 1 commit
    • add sdpa to ViT [follow up of #29325] (#30555) · 1c21f48a
      hyenal authored
      
      
      remove blank line (+1 squashed commit)
      Squashed commits:
      [24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
      Squashed commits:
      [08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
      [ec96a8db3] [run-slow]vit_msn
      [ead817eca] fix vit msn multi gpu
      [d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [3fdbfa88f] doc
      [a3ff33e4a] finish implementation
      [e20b7b7fb] Update test_modeling_common.py
      [e290c5810] Update test_modeling_flax_common.py
      [d3af86f46] comment
      [ff7dd32d8] more comments
      [59b137889] suggestion
      [7e2ba6d67] attn_implementation as attribute of the class
      [fe66ab71f] minor
      [38642b568] Apply suggestions from code review
      
      Accept comments
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [22cde7d52] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [48e137cc6] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [99f4c679f] Update tests/test_modeling_common.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      [00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [61f00ebb0] all tests are passing locally
      [e9e0b82b7] vision encoder/decoder
      [4d5076b56] test-vision (+20 squashed commits)
      Squashed commits:
      [d1add8db9] yolo
      [9fde65716] fix flax
      [986566c28] minor
      [ca2f21d1f] vit
      [3333efd7a] easy models change
      [ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
      [b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
      [48ecc7e26] all tests are passing locally
      [bff7fc366] minor
      [62f88306f] fix yolo and text_encoder tests
      [121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
      [b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [cffaa10dd] fix-copies
      [ef6c511c4] test vit hybrid
      [7d4ba8644] vit hybrid
      [66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
      [1fcc0a031] fixes
      [cfde6eb21] fixup
      [e77df1ed3] all except yolo end encoder decoder (+17 squashed commits)
      Squashed commits:
      [602913e22] vit + vit_mae are working
      [547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/  passes
      [61a97dfa9] it's the complete opposite...
      [aefab37d4] fix more tests
      [71802a1b9] fix all torch tests
      [40b12eb58] encoder - decoder tests
      [941552b69] slow decorator where appropriate
      [14d055d80] has_attentions to yolo and msn
      [3381fa19f] add correct name
      [e261316a7] repo consistency
      [31c6d0c08] fixup
      [9d214276c] minor fix
      [11ed2e1b7] chore
      [eca6644c4] add sdpa to vit-based models
      [cffbf390b] make fix-copies result
      [6468319b0] fix style
      [d324cd02a] add sdpa for vit
      Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>
      1c21f48a
  11. 15 May, 2024 1 commit
  12. 14 May, 2024 2 commits
  13. 13 May, 2024 1 commit
    • CI: update to ROCm 6.0.2 and test MI300 (#30266) · 37bba2a3
      fxmarty authored
      
      
      * update to ROCm 6.0.2 and test MI300
      
      * add callers for mi300
      
      * update dockerfile
      
      * fix trainer tests
      
      * remove apex
      
      * style
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * Update tests/trainer/test_trainer_seq2seq.py
      
      * update to torch 2.3
      
      * add workflow dispatch target
      
      * we may need branches: mi300-ci after all
      
      * nit
      
      * fix docker build
      
      * nit
      
      * add check runner
      
      * remove docker-gpu
      
      * fix issues
      
      * fix
      
      ---------
      Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      37bba2a3
  14. 26 Apr, 2024 2 commits
    • Fix GroundingDINO, DPR after BERT SDPA update (#30506) · e7d52a10
      amyeroberts authored
      Fix GroundingDINO, DPR after BERT SDPA update
      e7d52a10
    • [`BERT`] Add support for sdpa (#28802) · dfa7b580
      JB (Don) authored
      * Adding SDPA support for BERT
      
      * Using the proper input name for testing model input in inference()
      
      * Adding documentation for SDPA in BERT model page
      
      * Use the stable link for the documentation
      
      * Adding a gate to only call .contiguous() for torch < 2.2.0 (see the sketch after this entry)
      
      * Additions and fixes to the documentation
      
      * Minor updates to documentation
      
      * Adding extra requirements needed for the contiguous() bug
      
      * Adding "Adapted from" in plcae of the "Copied from"
      
      * Add benchmark speedup tables to the documentation
      
      * Minor fixes to the documentation
      
      * Use ClapText as a replacement for Bert in the Copied-From
      
      * Some more fixes for the fix-copies references
      
      * Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage
      
      [test all]
      
      * Undo changes to separate test
      
      * Refactored SDPA self attention code for KV projections
      
      * Change use_sdpa to attn_implementation
      
      * Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
      dfa7b580
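      The .contiguous() gate above is a plain version check around the SDPA call. A sketch of its shape — the helper names here are illustrative, not the PR's exact code:

      ```python
      import torch
      from packaging import version

      # Some torch < 2.2.0 builds miscompute SDPA on non-contiguous inputs,
      # so older versions get an explicit .contiguous() before the call.
      _is_torch_lt_2_2 = version.parse(torch.__version__) < version.parse("2.2.0")

      def maybe_contiguous(t: torch.Tensor) -> torch.Tensor:
          return t.contiguous() if _is_torch_lt_2_2 else t
      ```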
  15. 24 Apr, 2024 1 commit
    • Phi-3 (#30423) · c9693db2
      Gustavo de Rosa authored
      * chore(root): Initial commit of Phi-3 files.
      
      * fix(root): Fixes Phi-3 missing on readme.
      
      * fix(root): Ensures files are consistent.
      
      * fix(phi3): Fixes unit tests.
      
      * fix(tests): Fixes style of phi-3 test file.
      
      * chore(tests): Adds integration tests for Phi-3.
      
      * fix(phi3): Removes additional flash-attention usage, e.g., swiglu and rmsnorm.
      
      * fix(phi3): Fixes incorrect docstrings.
      
      * fix(phi3): Fixes docstring typos.
      
      * fix(phi3): Adds support for Su and Yarn embeddings.
      
      * fix(phi3): Improves according to first batch of reviews.
      
      * fix(phi3): Uses up_states instead of y in Phi3MLP.
      
      * fix(phi3): Uses gemma rotary embedding to support torch.compile.
      
      * fix(phi3): Improves how rotary embedding classes are defined.
      
      * fix(phi3): Fixes inv_freq not being re-computed for extended RoPE.
      
      * fix(phi3): Adds last suggestions to modeling file.
      
      * fix(phi3): Splits inv_freq calculation in two lines.
      c9693db2
  16. 22 Apr, 2024 1 commit
  17. 18 Apr, 2024 3 commits
    • Add DBRX Model (#29921) · 005b957f
      Abhi Venigalla authored
      
      
      * wip
      
      * fix __init__.py
      
      * add docs
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * address comments 1
      
      * work on make fixup
      
      * pass configs down
      
      * add sdpa attention
      
      * remove DbrxBlock
      
      * add to configuration_auto
      
      * docstring now passes formatting test
      
      * fix style
      
      * update READMEs
      
      * add dbrx to modeling_auto
      
      * make fix-copies generated this
      
      * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * config docstring passes formatting test
      
      * rename moe_loss_weight to router_aux_loss_coef (see the aux-loss sketch after this entry)
      
      * add to flash-attn documentation
      
      * fix model-path in tests
      
      * Explicitly make `"silu"` the default `ffn_act_fn`
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      
      * default to using router_aux_loss_coef over ffn_config[moe_loss_weight]
      
      * fix _flash_attn_uses_top_left_mask and is_causal
      
      * fix tests path
      
      * don't use token type IDs
      
      * follow Llama and remove token_type_ids from test
      
      * init ConfigTester differently so tests pass
      
      * remove multiple choice test
      
      * remove question + answer test
      
      * remove sequence classification test
      
      * remove token classification test
      
      * copy Llama tests and remove token_type_ids from test inputs
      
      * do not test pruning or headmasking; style code
      
      * add _tied_weights_keys parameter to pass test
      
      * add type hints
      
      * fix type check
      
      * update config tester
      
      * remove masked_lm test
      
      * remove encoder tests
      
      * initialize DbrxModelTester with correct params
      
      * style
      
      * torch_dtype does not rely on torch
      
      * run make fixup, fix-copies
      
      * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py
      
      
      
      * add copyright info
      
      * fix imports and DbrxRotaryEmbedding
      
      * update DbrxModel docstring
      
      * use copies
      
      * change model path in docstring
      
      * use config in DbrxFFN
      
      * fix flashattention2, sdpaattention
      
      * input config to DbrxAttention, DbrxNormAttentionNorm
      
      * more fixes
      
      * fix
      
      * fix again!
      
      * add informative comment
      
      * fix ruff?
      
      * remove print statement + style
      
      * change doc-test
      
      * fix doc-test
      
      * fix docstring
      
      * delete commented out text
      
      * make defaults match dbrx-instruct
      
      * replace `router_aux_loss_coef` with `moe_loss_weight`
      
      * is_decoder=True
      
      * remove is_decoder from configtester
      
      * implement sdpa properly
      
      * make is_decoder pass tests
      
      * start on the GenerationTesterMixin tests
      
      * add dbrx to sdpa documentation
      
      * skip weight typing test
      
      * style
      
      * initialize smaller model
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Add DBRX to toctree
      
      * skip test_new_cache_format
      
      * make config defaults smaller again
      
      * add pad_token_id
      
      * remove pad_token_id from config
      
      * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * Update src/transformers/models/dbrx/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/dbrx.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix typo
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update docs, fix configuration_auto.py
      
      * address pr comments
      
      * remove is_decoder flag
      
      * slice
      
      * fix requires grad
      
      * remove grad
      
      * disconnect differently
      
      * remove grad
      
      * enable grads
      
      * patch
      
      * detach expert
      
      * nissan al ghaib
      
      * Update modeling_dbrx.py
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      
      * replace "Gemma" with "Dbrx"
      
      * remove # type: ignore
      
      * don't hardcode vocab_size
      
      * remove ToDo
      
      * Re-add removed idefics2 line
      
      * Update test to use tiny-random!
      
      * Remove TODO
      
      * Remove one more case of loading the entire dbrx-instruct in the tests
      
      * Update src/transformers/models/dbrx/modeling_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * address some comments
      
      * small model
      
      * add dbrx to tokenization_auto
      
      * More docstrings with add_start_docstrings
      
      * Dbrx for now
      
      * add PipelineTesterMixin
      
      * Update src/transformers/models/dbrx/configuration_dbrx.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove flash-attn2 import error
      
      * fix docstring
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add usage example
      
      * put on one line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * fix ffn_act_fn
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * change "dbrx" to "DBRX" for display purposes.
      
      * fix __init__.py?
      
      * fix __init__.py
      
      * fix README
      
      * return the aux_loss
      
      * remove extra spaces
      
      * fix configuration_auto.py
      
      * fix format in tokenization_auto
      
      * remove new line
      
      * add more usage examples
      
      ---------
      Co-authored-by: Abhi Venigalla <abhi.venigalla@databricks.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Eitan Turok <eitan.turok@databricks.com>
      Co-authored-by: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
      Co-authored-by: Wing Lian <wing.lian@gmail.com>
      Co-authored-by: Eitan Turok <eitanturok@gmail.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      Co-authored-by: Matt <rocketknight1@gmail.com>
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      005b957f
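      The router_aux_loss_coef above scales the MoE load-balancing auxiliary loss before it is added to the language-modeling loss. A generic sketch of the Switch-Transformer-style formulation used across HF MoE models — a hedged illustration, not DBRX's verbatim code:

      ```python
      import torch
      import torch.nn.functional as F

      def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int) -> torch.Tensor:
          # router_logits: (num_tokens, num_experts)
          probs = F.softmax(router_logits, dim=-1)
          _, selected = torch.topk(probs, top_k, dim=-1)
          expert_mask = F.one_hot(selected, num_experts).float()   # (tokens, top_k, experts)
          tokens_per_expert = expert_mask.mean(dim=(0, 1))         # fraction of slots routed to each expert
          router_prob_per_expert = probs.mean(dim=0)               # mean router probability per expert
          # Minimized when routing is uniform across experts.
          return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)

      # total_loss = lm_loss + router_aux_loss_coef * load_balancing_loss(logits, E, k)
      ```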
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it (see the sketch after this entry). Also small change in low memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what its meant to override
      
      * draft
      
      * oups
      
      * nit
      
      * remove more complex logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache 🙃
      
      
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support its own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny tandom Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if the forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
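      The num_logits_to_keep change above generalizes the old calc_logits_for_entire_prompt boolean: the caller asks for logits over only the last N positions, which saves memory during generation. A simplified sketch of the slice (dimensions illustrative):

      ```python
      import torch
      from torch import nn

      hidden_size, vocab_size = 4096, 32000
      lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

      def compute_logits(hidden_states: torch.Tensor, num_logits_to_keep: int = 0) -> torch.Tensor:
          # 0 means "keep everything"; otherwise project only the last N positions.
          if num_logits_to_keep == 0:
              return lm_head(hidden_states)
          return lm_head(hidden_states[:, -num_logits_to_keep:, :])

      hidden = torch.randn(1, 128, hidden_size)
      print(compute_logits(hidden, num_logits_to_keep=1).shape)  # torch.Size([1, 1, 32000])
      ```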
    • Add Flash Attention 2 to M2M100 model (#30256) · b65df514
      Alexander Visheratin authored
      
      
      * Added flash attention 2.
      
      * Fixes.
      
      * Fix inheritance.
      
      * Fixed init.
      
      * Remove stuff.
      
      * Added documentation.
      
      * Add FA2 to M2M100 documentation.
      
      * Add test.
      
      * Fixed documentation.
      
      * Update src/transformers/models/m2m_100/modeling_m2m_100.py
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/nllb.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed variable name.
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b65df514
  18. 17 Apr, 2024 1 commit
    • Add OLMo model family (#29890) · e4ea19b9
      Shane A authored
      * Add OLMo using add-new-model-like with Llama
      
      * Fix incorrect tokenizer for OLMo
      
      * Copy-paste relevant OLMo methods and their imports
      
      * Add OLMo config
      
      * Modify OLMo config to follow HF conventions
      
      * Remove unneeded Llama code from OLMo model
      
      * Add ability for OLMo model to output attentions
      
      * Add OLMoPreTrainedModel and OLMoModel
      
      * Add OLMoForCausalLM
      
      * Minor fixes to OLMo model for style and missing functions
      
      * Implement OLMo tokenizer
      
      * Implement OLMo to HF conversion script
      
      * Add tests for OLMo model
      
      * Add tests for OLMo fast tokenizer
      
      * Add auto-generated dummy objects
      
      * Remove unimplemented OLMo classes from auto and init classes and re-format
      
      * Add README and associated auto-generated files
      
      * Use OLMo names for common properties
      
      * Run make fixup
      
      * Remove `|` from OLMo typing
      
      * Remove unneeded tokenization_olmo.py
      
      * Revert model, config and converter to add-new-model-like Llama
      
      * Move logic for adding bos/eos token into GPTNeoxTokenizerFast
      
      * Change OLMoConfig defaults to match OLMo-7B
      
      * Use GPTNeoXTokenizerFast in OLMo tokenizer tests
      
      * Modify auto-generated OLMoModelTests to work for OLMo
      
      * Add non-parametric layer norm OLMoLayerNorm
      
      * Update weight conversion script for OLMo
      
      * Fix __init__ and auto structure for OLMo
      
      * Fix errors from make fixup
      
      * Remove OLMoTokenizerFast from documentation
      
      * Add missing 'Copied from' for OLMoModel._update_causal_mask
      
      * Run make fix-copies
      
      * Rearrange string replacements in OLMoForCausalLM Copied from
      
      * Move OLMo and Llama CausalLM.forward example into global constants
      
      * Fix OLMO_GENERATION_EXAMPLE doc string typo
      
      * Add option for qkv clipping to OLMo (see the sketch after this entry)
      
      * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
      
      * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
      
      * Fix OLMo tokenization bug using conversion script
      
      * Keep model in full precision after conversion
      
      * Do not add eos token automatically
      
      * Update references to OLMo model in HF Hub
      
      * Do not add eos token during encoding by default
      
      * Fix Llama generation example
      
      * Run make fixup
      
      * OLMo 7B integration test fix
      
      * Remove unneeded special case for OLMoConfig
      
      * OLMo 7B Twin 2T integration test fix
      
      * Fix test_model_7b_greedy_generation
      
      * Remove test_compile_static_cache
      
      * Fix OLMo and Llama generation example
      
      * Run make fixup
      
      * Revert "OLMo 7B integration test fix"
      
      This reverts commit 4df56a4b150681bfa559846f40e9b7b7f97d7908.
      
      * Revert "OLMo 7B Twin 2T integration test fix"
      
      This reverts commit 9ff65a4a294ace89ab047b793ca55e623a9ceefc.
      
      * Ungate 7B integration tests and fix greedy generation test
      
      * Add retries for flaky test_eager_matches_sdpa_generate
      
      * Fix output of doc example for OLMoForCausalLM.forward
      
      * Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
      
      * Remove pretraining_tp from OLMo config and model
      
      * Add missing 'Copied from' instances
      
      * Remove unneeded causal_mask from OLMoModel
      
      * Revert Llama changes
      
      * Ignore copy for OLMoForCausalLM.forward
      
      * Change 'OLMo' to 'Olmo' in classes
      
      * Move minimal OLMo tokenization tests to model tests
      
      * Add missed 'Copied from' for repeat_kv
      e4ea19b9
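      The qkv clipping option above is an elementwise clamp on the attention projections before attention is computed. A sketch of the idea (the clip_qkv name follows OLMo's config field; the surrounding attention code is omitted):

      ```python
      import torch

      def clip_qkv_projections(q, k, v, clip_qkv):
          # OLMo-style stabilization: clamp each projection to [-clip_qkv, clip_qkv].
          if clip_qkv is not None:
              q = q.clamp(min=-clip_qkv, max=clip_qkv)
              k = k.clamp(min=-clip_qkv, max=clip_qkv)
              v = v.clamp(min=-clip_qkv, max=clip_qkv)
          return q, k, v

      q = torch.randn(1, 8, 16, 64) * 10
      q, k, v = clip_qkv_projections(q, q.clone(), q.clone(), clip_qkv=8.0)
      print(bool((q.abs() <= 8.0).all()))  # True
      ```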
  19. 15 Apr, 2024 1 commit
    • Add Idefics2 (#30253) · 6b78360e
      amyeroberts authored
      
      
      * Initial add model additions
      
      * Test
      
      * All weights loading
      
      * Can perform full forward pass
      
      * Local and remote the same
      
      * Matching local and remote
      
      * Fixup
      
      * Idefics2Model importable; fixup docstrings
      
      * Don't skip by default
      
      * Remove deprecated use_resampler arg
      
      * Remove self.config
      
      * DecoupledLinear takes config
      
      * Tidy up
      
      * Enable eager attention and tidy up
      
      * Most tests passing
      
      * Update for batch of processed images
      
      * Add image processor
      
      * Update doc pages
      
      * Update conversion script
      
      * Remove erroneous breakpoint
      
      * Remove accidental spelling change
      
      * Update to reflect changes on hub - make generate work
      
      * Fix up
      
      * Image processor tests
      
      * Update tests
      
      * Add a processor
      
      * Add a processor
      
      * Update convert script
      
      * Update modeling file - remove fixmes
      
      * Bug fix
      
      * Add processing test
      
      * Use processor
      
      * Fix up
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Fix test
      
      * Update config - PR comments and defaults align with checkpoint
      
      * Reviewer comments
      
      * Add copied froms for flash attention
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Remove qk_layer_norm and freeze_layers functionality
      
      * Fix
      
      * Remove freeze_layer options from config
      
      * Sync with upstream main
      
      * Fix attention shapes siglip
      
      * Remove Llava-next refs - TO REBASE
      
      * Use AutoModel for text model
      
      * Add comment to explain vision embeddings
      
      * Fix issue with tie_word_embeddings
      
      * Address review comments
      
      * Fix and fix up
      
      * Chat templates for idefics
      
      * Fix copies
      
      * Fix
      
      * Add layer norms to FA2
      
      * Fix tests
      
      * Apply suggestions from code review
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Fix
      
      * Review comments
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update inputs merger
      
      * Merge weights in correct order
      
      * Update convert script
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update template
      
      * Model code examples (fix idefics too)
      
      * More review comments
      
      * Tidy up
      
      * Update processing
      
      * Fix attention mask preparation
      
      * Update inputs_merger inputs
      
      * Vectorize inputs_merger
      
      * Update src/transformers/models/idefics2/__init__.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      
      * Review comments
      
      * saying bye to the `qk_layer_norms`
      
      * Simplify
      
      * Update latents
      
      * Remove erroneous readme changes
      
      * Return images when applying chat template
      
      * Fix bug - prompt images are for a single sample
      
      * Update src/transformers/models/idefics2/modeling_idefics2.py
      
      * image splitting
      
      * fix test
      
      * some more comment
      
      * some comment
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/idefics2/image_processing_idefics2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update processor
      
      * Update model tests
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Don't add BOS in template
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Remove index in examples
      
      * Update tests to reflect #13
      
      * Update src/transformers/models/idefics2/processing_idefics2.py
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * PR comment - consistent typing
      
      * Update readme and model doc
      
      * Update docs
      
      * Update checkpoint references
      
      * Update examples
      
      * Fix and update tests
      
      * Small addition
      
      * Update tests - remove copied from as no ignore placement copy could be found
      
      * Update example
      
      * small fixes
      
      * Update docs/source/en/model_doc/idefics2.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update docs/source/en/model_doc/idefics2.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Update README.md
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      
      * Connector model as bridge
      
      * Fix up
      
      * Fix up
      
      * Don't pass model inputs for generation kwargs update
      
      * IDEFICS-2 -> Idefics2
      
      * Remove config archive name
      
      * IDEFICS-2 -> Idefics2
      
      * Add back llava-next
      
      * Update readmes
      
      * Add requirements for processor tester
      
      * Use custom convert_to_rgb to avoid possible BC
      
      * Fix doc example
      
      * Fix doc example
      
      * Skip model doc tests - as model too large
      
      * More doc example - account for image splitting
      
      * Update src/transformers/image_transforms.py
      
      * Fix config doctest
      
      ---------
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: Victor SANH <victorsanh@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      6b78360e
  20. 02 Apr, 2024 1 commit
    • Add Flash Attention 2 support to Musicgen and Musicgen Melody (#29939) · 0d04b1e2
      Yoach Lacombe authored
      * add FA2 to o.g Musicgen
      
      * make style
      
      * add FA2 support to Musicgen Melody
      
      * add generation FA2 tests to o.g Musicgen
      
      * make style and fix copies
      
      * add Musicgen to FA2 docs + deprecate list
      
      * add sdpa supports to Musicgen's
      
      * make style and fix copies
      
      * refactor attention implementation arguments
      
      * add Copied from to sdpa tests
      
      * add copied form in sdpa tests melody
      
      * add copied for FA2 generation tests
      
      * add FA2 inference copied from
      
      * make style
      0d04b1e2
  21. 28 Mar, 2024 1 commit
  22. 27 Mar, 2024 1 commit
    • Add Qwen2MoE (#29377) · 1c39974a
      Bo Zheng authored
      
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * Update README.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * fixup
      
      * add archive back
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fixup
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * add archive back
      
      * fix integration test
      
      * fixup
      
      ---------
      Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      1c39974a
  23. 20 Mar, 2024 1 commit
    • Add LLaVa-1.6, bis (#29586) · d91fd7f9
      NielsRogge authored
      
      
      * First draft
      
      * Fix tests, add docs
      
      * Improve docstrings
      
      * Fix test
      
      * Address comments
      
      * Address comments
      
      * Remove vocab_size attribute
      
      * Remove batch_size
      
      * Address comment
      
      * Add image processor tests
      
      * Support fx
      
      * Update docstring
      
      * Add support for 34b
      
      * Convert 34b model
      
      * Add integration tests
      
      * Update checkpoints
      
      * Convert vicuna-13b, remove doc tests
      
      * Remove script
      
      * Remove file
      
      * Address comments
      
      * Improve docstrings
      
      * Deprecate vocab_size
      
      * Remove aspect_ratio_setting
      
      * Address comments
      
      * Update READMEs
      
      * Add tips about chat templates
      
      * Fix tests
      
      * Deprecate vocab_size safely
      
      * Update tests
      
      ---------
      Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
      d91fd7f9
  24. 15 Mar, 2024 1 commit
    • Cohere Model Release (#29622) · 0e4a1c34
      Saurabh Dash authored
      
      
      * Cohere Model Release (#1)
      
      Cohere Model Release
      
      * Remove unnecessary files and code (#2)
      
      Some cleanup
      
      * Delete cohere-model directory (#3)
      
      * Make Fix (#5)
      
      * Pr fixes (#6)
      
      * fixes for pr
      
      * pr fixes for the format
      
      * pr fixes for the format
      
      * src/transformers/models/auto/tokenization_auto.py
      
      * Tokenizer test (#8)
      
      * tokenizer test
      
      * format fix
      
      * Adding Docs and other minor changes (#7)
      
      * Add modeling tests (#9)
      
      * Smol Fix (#11)
      
      * tokenization tests are fixed
      
      * format fixes
      
      * fix pr doc tests
      
      * fix pr doc tests
      
      * fix pr doc tests
      
      * fix pr style check
      
      * small changes in cohere.md
      
      * FIX: Address final comments for transformers integration (#13)
      
      * fix modeling final nits and add proper test file
      
      * for now leave empty tests
      
      * add integration test
      
      * push new test
      
      * fix modeling cohere (#14)
      
      * Update chat templates to use the new API (#15) (see the sketch after this entry)
      
      ---------
      Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
      0e4a1c34
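      The chat-template item in the final bullet refers to the tokenizer-side API. A usage sketch with the released Command-R checkpoint (standard transformers API):

      ```python
      from transformers import AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
      messages = [{"role": "user", "content": "Hello, how are you?"}]

      # Renders the conversation with the model's chat template and appends the
      # assistant prefix so generation starts in the right place.
      input_ids = tokenizer.apply_chat_template(
          messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
      )
      ```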
  25. 13 Mar, 2024 1 commit
  26. 28 Feb, 2024 1 commit
  27. 22 Feb, 2024 1 commit
  28. 21 Feb, 2024 2 commits
  29. 20 Feb, 2024 1 commit
  30. 16 Feb, 2024 1 commit
  31. 14 Feb, 2024 1 commit
    • Add `StableLM` (#28810) · de6029a0
      Jonathan Tow authored
      * Add `StableLM`
      
      * fix(model): re-create from `huggingface-cli add-new-model-like persimmon`
      
      * fix: re-add changes to address comments
      
      * fix(readme): add links to paper
      
      * fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref
      
      * fix(tests): re-add `@slow` decorator to integration tests
      
      * fix(tests): import slow...
      
      * fix(readme_hd): remove whitespace edit
      
      * fix(tokenizer): auto tokenizer tuple
      
      * skip doctests for `modeling_stablelm`
      de6029a0
  32. 17 Jan, 2024 1 commit
    • Add qwen2 (#28436) · d6ffe74d
      Junyang Lin authored
      
      
      * add config, modeling, and tokenization
      
      * add auto and init
      
      * update readme
      
      * update readme
      
      * update team name
      
      * fixup
      
      * fixup
      
      * update config
      
      * update code style
      
      * update for fixup
      
      * update for fixup
      
      * update for fixup
      
      * update for testing
      
      * update for testing
      
      * fix bug for config and tokenization
      
      * fix bug for bos token
      
      * not doctest
      
      * debug tokenizer
      
      * not doctest
      
      * debug tokenization
      
      * debug init for tokenizer
      
      * fix style
      
      * update init
      
      * delete if in token auto
      
      * add tokenizer doc
      
      * add tokenizer in init
      
      * Update dummy_tokenizers_objects.py
      
      * update
      
      * update
      
      * debug
      
      * Update tokenization_qwen2.py
      
      * debug
      
      * Update convert_slow_tokenizer.py
      
      * add copies
      
      * add copied from and make style
      
      * update files map
      
      * update test
      
      * fix style
      
      * fix merge reading and update tests
      
      * fix tests
      
      * fix tests
      
      * fix style
      
      * debug a variable in readme
      
      * Update src/transformers/models/qwen2/configuration_qwen2.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update test and copied from
      
      * fix style
      
      * update qwen2 tokenization and tests
      
      * Update tokenization_qwen2.py
      
      * delete the copied from after property
      
      * fix style
      
      * update tests
      
      * update tests
      
      * add copied from
      
      * fix bugs
      
      * update doc
      
      * add warning for sliding window attention (see the sketch after this entry)
      
      * update qwen2 tokenization
      
      * fix style
      
      * Update src/transformers/models/qwen2/modeling_qwen2.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix tokenizer fast
      
      ---------
      Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
      Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      d6ffe74d
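      The sliding-window warning above fires when the config enables sliding-window attention but the selected backend cannot honor it. A hedged sketch of such a check — the function name and message wording are illustrative, not the PR's exact code:

      ```python
      import logging

      logger = logging.getLogger(__name__)

      def warn_if_sliding_window_unsupported(config, attn_implementation: str) -> None:
          # Only the flash_attention_2 path implements the sliding-window mask here;
          # other backends would silently attend over the full context.
          if getattr(config, "use_sliding_window", False) and attn_implementation != "flash_attention_2":
              logger.warning(
                  "Sliding window attention is enabled in the config but is not "
                  f"implemented for attn_implementation={attn_implementation!r}; it will be ignored."
              )
      ```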
  33. 22 Dec, 2023 1 commit
  34. 18 Dec, 2023 1 commit