- 08 Aug, 2024 1 commit
-
-
Wonseok Lee (Jack) authored
* Change `_supports_sdpa` to True * add phi3 to sdpa support list
-
- 06 Aug, 2024 1 commit
-
-
Ao Tang authored
* Add nemotron support * fix inference * add unit test * add layernorm1p as a class to avoid meta device mismatch * test fixed * Add copied_from statements * remove pretraining_tp args * remove nemotronlayernorm * force LN computation done in FP32 * remove nemotrontokenizer and use llamatokenizer * license update * add option for kv_channels for minitron8b * remove assert * o_proj fixed * o_proj reshape * add gated_proj option * typo * remove todos * fix broken test after merging latest main * remove nezha/nat after meging main * chnage default config to 15b model * add nemo conversion script * rename conversion script * remove gate_proj option * pr comment resolved * fix unit test * rename kv_channels to head_dim * resolve PR issue * add nemotron md * fix broken tests * refactor rope for nemotron * test fix * remove linearscaling * whitespace and import * fix some copied-from * code style fix * reformatted * add position_embedding to nemotronattention * rope refactor to only use config, copied-from fix * format * Run make fix-copies * nemotron md with autodoc * doc fix * fix order * pass check_config_docstrings.py * fix config_attributes * remove all llama BC related code * Use PreTrainedTokenizerFast * ruff check examples * conversion script update * add nemotron to toctree
-
- 18 Jul, 2024 1 commit
-
-
Pavel Iakubovskii authored
* Squashed commit of the following: commit 102842cd477219b9f9bcb23a0bca3a8b92bd732f Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri Jul 12 18:23:52 2024 +0000 Add model-specific sdpa tests commit 60e4c88581abf89ec098da84ed8e92aa904c997d Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri Jul 12 18:20:53 2024 +0000 Add fallback to eager (expensive operation) commit c29033d30e7ffde4327e8a15cbbc6bee37546f80 Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Thu Jul 11 17:09:55 2024 +0000 Fix attn_implementation propagation commit 783aed05f0f38cb2f99e758f81db6838ac55b9f8 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:05:27 2024 +0530 style commit e77e703ca75d00447cda277eca6b886cd32bddc0 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:04:57 2024 +0530 add comment to explain why I had to touch forbidden codebase. commit ab9d8849758e7773a31778ccba71588d18552623 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 09:03:02 2024 +0530 fix: flax attribute access. commit c570fc0abf9d1bd58c291aae3c7e384f995996d2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 08:23:54 2024 +0530 fix tensorflow attribute name. commit 32c812871cfdb268d8a6e3e2c61c5c925c8ed47e Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 07:57:10 2024 +0530 fix attribute access. commit 4f41a0138b6c417aed9c9332278f8bcd979cb7c2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Sat May 25 07:44:02 2024 +0530 _from_config. commit 35aed64ff602422adcf41d7f677a0a24bd9eccae Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 18:46:52 2024 +0530 propagation of attn_implementation. commit 4c25c19845438b1dc1d35a5adf9436151c8c5940 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:24:36 2024 +0530 style again commit 5f7dc5c5015c0f8116408f737e8c318d1802c80c Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:19:05 2024 +0530 use from_config. commit b70c409956d0359fa6ae5372275d2a20ba7e3389 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 24 09:13:43 2024 +0530 quality commit a7b63beff53d0fc754c6564e2a7b51731ddee49d Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 14:35:10 2024 +0200 add benchmark numbers commit 455b0eaea50862b8458c8f422b60fe60ae40fdcb Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:50:16 2024 +0200 Revert "reflect feedback more" This reverts commit dc123e71eff60aae74d5f325f113d515d0d71117. commit ca674829d28787349c2a9593a14e0f1d41f04ea4 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:50:05 2024 +0200 Revert "fix" This reverts commit 37a1cb35b87acdc4cf7528b8b1ed6da27d244e52. commit fab2dd8576c099eb1a3464958cb206a664d28247 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:47:46 2024 +0200 fix commit fbc6ae50fd6f2d36294d31e191761631b701d696 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 10 13:38:30 2024 +0200 reflect feedback more commit 87245bb020b2d60a89afe318a951df0159404fc9 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 08:54:34 2024 +0530 fixes commit 1057cc26390ee839251e7f8b3326c4207595fb23 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:49:03 2024 +0530 don't explicit set attn_implementation in tests commit e33f75916fc8a99f516b1cf449dbbe9d3aabda81 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:43:54 2024 +0530 explicitly override attn_implementation in the towers. commit 4cf41cb1bc885c39df7cb8f2a0694ebf23299235 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:38:42 2024 +0530 import in one-line. commit f2cc447ae9e74ccfacb448140cdf88259d4afc8c Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri May 3 07:34:58 2024 +0530 move sdpa mention to usage tips. commit 92884766c64dbb456926a3a84dd427be1349fa95 Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 29 10:58:26 2024 +0530 fix: memory allocation problem. commit d7ffbbfe12f7750b7d0a361420f35c13e0ea787d Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 29 09:56:59 2024 +0530 fix-copies commit 8dfc3731cedd02e36acd3fe56bb2e6d61efd25d8 Author: sayakpaul <spsayakpaul@gmail.com> Date: Fri Apr 26 20:16:12 2024 +0530 address arthur's comments. commit d2ed7b4ce4ff15ae9aa4d3d0500f1544e3dcd9e9 Author: Sayak Paul <spsayakpaul@gmail.com> Date: Fri Apr 26 20:08:15 2024 +0530 Apply suggestions from code review Co-authored-by:Arthur <48595927+ArthurZucker@users.noreply.github.com> commit 46e04361f37ded5c522ff05e9f725b9f82dce40e Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:55:27 2024 +0530 add to docs. commit 831629158ad40d34d8983f209afb2740ba041af2 Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:33:10 2024 +0530 styling.g commit d263a119c77314250f4b4c8469caf42559197f22 Author: sayakpaul <spsayakpaul@gmail.com> Date: Wed Apr 24 09:15:20 2024 +0530 up commit d44f9d3d7633d4c241a737a1bc317f791f6aedb3 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 18:40:42 2024 +0530 handle causal and attention mask commit 122f1d60153df6666b634a94e38d073f3f260926 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 15:18:21 2024 +0530 test fixes. commit 4382d8cff6fa1dee5dbcf0d06b3e2841231e36f5 Author: sayakpaul <spsayakpaul@gmail.com> Date: Tue Apr 23 09:39:25 2024 +0530 fix: scaling inside sdpa. commit 0f629989efc48b7315cf19405a81e02955efe7e5 Author: Sayak Paul <spsayakpaul@gmail.com> Date: Tue Apr 23 08:14:58 2024 +0530 Update src/transformers/models/clip/modeling_clip.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> commit 14367316877dc27ea40f767ad1aee38bbc97e4ce Author: sayakpaul <spsayakpaul@gmail.com> Date: Mon Apr 22 16:21:36 2024 +0530 add: sdpa support to clip. * Remove fallback for empty attention mask (expensive operation) * Fix typing in copies * Add flash attention * Add flash attention tests * List CLIP in FA docs * Fix embeddings attributes and tf * [run-slow] clip * Update clip documentation * Remove commented code, skip compile dynamic for CLIPModel * Fix doc * Fix doc 2 * Remove double transpose * Add torch version check for contiguous() * Add comment to test mixin * Fix copies * Add comment for mask * Update docs * [run-slow] clip
-
- 17 Jul, 2024 1 commit
-
-
Raushan Turganbay authored
* Chameleon model integration Co-authored-by:
Jacob Kahn <jacobkahn1@gmail.com> Co-authored-by:
Leonid Shamis <leonid.shamis@gmail.com> * fix 7B, again. mask away image tokens * Apply suggestions from code review Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * remove pretrained_config_map * make fixup passing up to utils/check_config_docstrings.py; vqgan moved to the modeling file * remove tokenizer (use llama's); remove codechameleon tests * a few copied from statements and minor changes * copied from in ChameleonModel * some copies in ChameleonForCausalLM * a few more copies * VQModel moved to ChameleonModel (as opposed to being in the processor) * ChameleonProcessor ready * Fix chameleon weights convert * update conversion script * clean-up processing * update modeling a bit * update * update (throws error...) * correct conversion ready * fix tests * fix docs * docs * ve swin norm * fix device for vocab map * add normalization * update * update script with rope rotations * final fix on model conversion * add slow tests * more info in docs * fix repo consistency tests * fix repo tests * fix-copies * hope this will make CI happy * fix for 30b model * Update docs/source/en/index.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/chameleon.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/auto/configuration_auto.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/image_processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/modeling_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/chameleon/processing_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/chameleon/test_modeling_chameleon.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments * remove assertion in conversion script * add image processor test * not copied * port changes for qk layernorm * fix-copies * read token decorator for tests * [run-slow] chameleon * one more read-token * address some comments * qk norm changes * tests and repo check * moved rope permutations to conversion, YAY! * fix past kv check * docs * layernorm done! * let's be consistent in naming * fix slow tests * weird thing with slow CI, but let's see * once more try * remove past-kv as tuple following llama * ignore * style --------- Co-authored-by:
Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by:
ArthurZucker <arthur.zucker@gmail.com> Co-authored-by:
jacobkahn <jacobkahn1@gmail.com> Co-authored-by:
Leonid Shamis <leonid.shamis@gmail.com> Co-authored-by:
Leonid Shamis <lshamis@meta.com> Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Joao Gante <joao@huggingface.co> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 08 Jul, 2024 1 commit
-
-
Pavel Iakubovskii authored
* Rebase to main * Fix attention implementation autoset for tex and vision configs * Fixup * Minor fixes * Fix copies * Fix attention_mask for FA2 * Add eqvivalence tests for siglip * Remove right padding test * Uncomment flaky * Fix import * Add to docs * Fix test message * Add sdpa * Add sdpa equivalence test * Add siglip sdpa to docs * Fix typing for attention output * Add sdpa tests * Fix signature of FA2 * Autoset attn_implementation in config * Rename bsz -> batch_size * Move back autoset attn method * Mark as flaky * Correct attention mask padding * [run-slow] siglip * Add FA2 and sdpa docs * Style fix * Remove flaky for FA2 test * Change attention implementation set * Change attn_implementaiton propogation * Fix typos * Add modality to assert message * Add more sdpa backends in test * [run slow] siglip * Add math sdpa backend for all options * [run slow] siglip
-
- 27 Jun, 2024 1 commit
-
-
Arthur authored
* nit * toctree issue * protect gemma2 tests as well * sdpa supported
-
- 26 Jun, 2024 2 commits
-
-
Raushan Turganbay authored
* squash into single commit * run diff once more * docstring * tests * minor chnages and ready to go * Update src/transformers/models/llava_next_video/processing_llava_next_video.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * [run-slow] llava-next-video * [run-slow] llava-next-video * [run-slow] llava_next_video * fix two tests * fix slow tests * remove logit checks due to numeric errors * run test once more * [run-slow] llava_next_video * final try to pass the test * [run-slow] llava_next_video * [run-slow] llava_next_video * [run-slow] llava_next_video * style * fix * style --------- Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
Anton Vlasjuk authored
* starting support for sdpa in `gptneox` models * small comment on tests * fix dropout * documentation and style * clarify concrete paths for reference * generalise attn projections and rope application added head mask check to sdpa mask creation handle sdpa memory backend bug via own version flag * update docs and style * move dtype casting outside of general attn_projection_and_rope function fix flash_attn_2 stuff * more generic attn warning if output_attns or head_mask * simplify head mask check by moving head mask creation to a later point * remove copied llama artifact * remove padding_mask from attention function signature * removing unnecessary comments, only "save" attn implementation once * [run_slow] gpt_neox
-
- 19 Jun, 2024 1 commit
-
-
Anton Vlasjuk authored
* `gpt2` sdpa support * fix (at least) one test, style, repo consistency * fix sdpa mask in forward --> fixes generation * test * test2 * test3 * test4 * simplify shapes for attn mask creation and small comments * hub fail test * benchmarks * flash attn 2 mask should not be inverted on enc-dec setup * fix comment * apply some suggestion from code review - only save _attn_implentation once - remove unnecessary comment * change elif logic * [run-slow] gpt2 * modify `test_gpt2_sample_max_time` to follow previous assertion patterns
-
- 30 May, 2024 1 commit
-
-
Younes Belkada authored
Replace all occurences of `load_in_8bit` with bnb config
-
- 16 May, 2024 1 commit
-
-
hyenal authored
remove blank line (+1 squashed commit) Squashed commits: [24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits) Squashed commits: [08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder [ec96a8db3] [run-slow]vit_msn [ead817eca] fix vit msn multi gpu [d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [3fdbfa88f] doc [a3ff33e4a] finish implementation [e20b7b7fb] Update test_modeling_common.py [e290c5810] Update test_modeling_flax_common.py [d3af86f46] comment [ff7dd32d8] more comments [59b137889] suggestion [7e2ba6d67] attn_implementation as attribute of the class [fe66ab71f] minor [38642b568] Apply suggestions from code review Accept comments Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [22cde7d52] Update tests/test_modeling_common.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [48e137cc6] Update tests/test_modeling_common.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [99f4c679f] Update tests/test_modeling_common.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> [00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [61f00ebb0] all tests are passing locally [e9e0b82b7] vision encoder/decoder [4d5076b56] test-vision (+20 squashed commits) Squashed commits: [d1add8db9] yolo [9fde65716] fix flax [986566c28] minor [ca2f21d1f] vit [3333efd7a] easy models change [ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos [b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [48ecc7e26] all tests are passing locally [bff7fc366] minor [62f88306f] fix yolo and text_encoder tests [121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos [b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [cffaa10dd] fix-copies [ef6c511c4] test vit hybrid [7d4ba8644] vit hybrid [66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae [1fcc0a031] fixes [cfde6eb21] fixup [e77df1ed3] all except yolo end encoder decoder (+17 squashed commits) Squashed commits: [602913e22] vit + vit_mae are working [547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes [61a97dfa9] it s the complete opposite... [aefab37d4] fix more tests [71802a1b9] fix all torch tests [40b12eb58] encoder - decoder tests [941552b69] slow decorator where appropriate [14d055d80] has_attentions to yolo and msn [3381fa19f] add correct name [e261316a7] repo consistency [31c6d0c08] fixup [9d214276c] minor fix [11ed2e1b7] chore [eca6644c4] add sdpa to vit-based models [cffbf390b] make fix-copies result [6468319b0] fix style [d324cd02a] add sdpa for vit Co-authored-by:
Liubov Yaronskaya <luba.yaronskaya@gmail.com>
-
- 15 May, 2024 1 commit
-
-
Raushan Turganbay authored
* add model draft * update docstring * add tests * support image and video as input * update for better handling of mixed input and clean-up a bit * bug when mixed inputs & add tests * Update README.md Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Merge remote-tracking branch 'upstream/main' into video_llava * link to abstract of paper in README * fix test * fix-copies * make tests happy * skip docstest for now * do not run doctest for now * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address review comments * failing tests * Fix vocab_size in common tests for VLMs * codestyle * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * PR suggestions * fix-copies * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add full example in docs * clean-up with new model-id * [run-slow] video_llava * update docstring * [run-slow] video_llava * remove all achive maps * fix some tests * test was supposed to be skipped for llava :) --------- Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 14 May, 2024 2 commits
-
-
Pablo Montalvo authored
* add new model like * add state dict slicing + new model config * update palma config and weights, passes vision activations * fix * update * reorder loading/unpacking * clean up * add debug statements * change device * fix * debugging * fix noncausal mask * fixup sdpa + causal mask * fix activation function * remove debug before changing modeling file * add variants * debug attention mask in generate * revert to non-debug sdpa * revert gemma modifications * add custom language modeling * use Processor * add language modeling file to init * try thin wrapper around generate * Update * update mask * breakpoints galore * remove conflict * switch to left-padding * add incomplete model doc * add paligemma global files * batch rename paligemma * make generation match outputs and captioning * style * style * remove copied from + doc * remove more copied from * remove copy from projector * minor fix * update config and style * add readme - dummy * CORRECT image captioning * moving to args * add siglip proper + fix merging image + text features * take update_causal_mask from upstream * remove breakpoint * leverage AutoModel * fix input_ids slicing * make siglip head conditional * remove encoder_decoder value * remove unneeded modeling file * add commented 4d attention mask * FIXED generation with 4D mask * Update src/transformers/models/siglip/modeling_siglip.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix left padding detection * shuffle order of verifications * fix missing labels for training * fix * vectorize merging of features, improve slicing * improve testing before conversion * handle merging in processor * image token index depends on checkpoint * add variants, save processor too * save processors, base tokenizer off spm file * expand model embeddings due to additional image token * pass image processing args * add convert rgb to siglip processor * add \n token separately * fix tokenizer and prompts * fix docstrings * change to camel * fix casing * debug pos_ids and sdpa * pass and use cache_position * add flag for newline tokenization * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by:
Merve Noyan <merveenoyan@gmail.com> * simplify conversion script * add copied from * add precision to conversion script * Update src/transformers/models/paligemma/modeling_paligemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * clean up * Shift attention mask from `1:` After discussion with @molbap * add docs, fix quality * quality, tied weights inheritance, and logits/label alignment * fix more tests * pass attn_implementation to language model correctly * add SiglipVisionTransformer to no split modules * skip paligemma test for sdpa dispatch to flash * skip incompatible tests * quality * [broken archive maps] * Apply suggestions - remove archive lists - style - take shape of inputs_embeds for batch Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/utils/dummy_pt_objects.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * simplify conversion script * add suggestions * add suggestions * add copied from * fix * move labels out * revert * fix * remove placeholder labels if None * use cache_position * fix quality + docstrings * fix quality * fix paligemma 4d gemma mask incompatibility * fix config docstring * fix query and attn_mask dtype --------- Co-authored-by:
ArthurZucker <arthur.zucker@gmail.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Merve Noyan <merveenoyan@gmail.com> Co-authored-by:
Pedro Cuenca <pedro@huggingface.co>
-
Yikang Shen authored
* init jetmoe code * update archive maps * remove flax import * fix import error * update README * ruff fix * update readme * fix * update config * fix issue * merge files * fix model bug * fix test * auto fix * model size * add comments * fix form * add flash attention support * fix attention head number * fix init * fix support list * sort auto mapping * fix test * fix docs * update test * fix test * fix test * change variable name * fix config * fix init * update format * clean code * fix config * fix config * change default config * update config * fix issues * update formate * update config argument * update format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * change to mixtral aux loss * change to cache_position * debug * fix bugs * debug * fix format * fix format * fix copy * fix format * fix format * fix sort * fix sort * fix sort * add copy comment * add copy from * remove debug code * revert readme update * add copy * debug * remove debug code * fix flash attention * add comments * clean code * clean format * fix format * fix format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * change variable name * add copied from * fix variable name * remove deprecated functinos * sync to llama implementation * fix format * fix copy * fix format * update format * remove repr * add comment for moe weight * fix copy * Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * add comments and reformat config * fix format * fix format * fix format * update test * update doc string in config * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * update config doc * update attention cache * fix format * fix copy --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
- 13 May, 2024 1 commit
-
-
fxmarty authored
* update to ROCm 6.0.2 and test MI300 * add callers for mi300 * update dockerfile * fix trainer tests * remove apex * style * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * update to torch 2.3 * add workflow dispatch target * we may need branches: mi300-ci after all * nit * fix docker build * nit * add check runner * remove docker-gpu * fix issues * fix --------- Co-authored-by:
Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
- 26 Apr, 2024 2 commits
-
-
amyeroberts authored
Fix GroundingDINO, DPR after BET SDPA update
-
JB (Don) authored
* Adding SDPA support for BERT * Using the proper input name for testing model input in inference() * Adding documentation for SDPA in BERT model page * Use the stable link for the documentation * Adding a gate to only call .contiguous() for torch < 2.2.0 * Additions and fixes to the documentation * Minor updates to documentation * Adding extra requirements needed for the contiguous() bug * Adding "Adapted from" in plcae of the "Copied from" * Add benchmark speedup tables to the documentation * Minor fixes to the documentation * Use ClapText as a replacemenet for Bert in the Copied-From * Some more fixes for the fix-copies references * Overriding the test_eager_matches_sdpa_generate in bert tests to not load with low_cpu_mem_usage [test all] * Undo changes to separate test * Refactored SDPA self attention code for KV projections * Change use_sdpa to attn_implementation * Fix test_sdpa_can_dispatch_on_flash by preparing input (required for MultipleChoice models)
-
- 24 Apr, 2024 1 commit
-
-
Gustavo de Rosa authored
* chore(root): Initial commit of Phi-3 files. * fix(root): Fixes Phi-3 missing on readme. * fix(root): Ensures files are consistent. * fix(phi3): Fixes unit tests. * fix(tests): Fixes style of phi-3 test file. * chore(tests): Adds integration tests for Phi-3. * fix(phi3): Removes additional flash-attention usage, .e.g, swiglu and rmsnorm. * fix(phi3): Fixes incorrect docstrings. * fix(phi3): Fixes docstring typos. * fix(phi3): Adds support for Su and Yarn embeddings. * fix(phi3): Improves according first batch of reviews. * fix(phi3): Uses up_states instead of y in Phi3MLP. * fix(phi3): Uses gemma rotary embedding to support torch.compile. * fix(phi3): Improves how rotary embedding classes are defined. * fix(phi3): Fixes inv_freq not being re-computed for extended RoPE. * fix(phi3): Adds last suggestions to modeling file. * fix(phi3): Splits inv_freq calculation in two lines.
-
- 22 Apr, 2024 1 commit
-
-
Kamil Akesbi authored
* add sdpa to wav2vec. Co-authored-by:
kamilakesbi <kamil@huggingface.co> Co-authored-by:
jp1924 <jp42maru@gmail.com> * add fa2 to wav2vec2 * add tests * fix attention_mask compatibility with fa2 * minor dtype fix * replace fa2 slow test * fix fa2 slow test * apply code review + add fa2 batch test * add sdpa and fa2 to hubert * sdpa and fa2 to data2vec_audio * sdpa and fa2 to Sew * sdpa to unispeech + unispeech sat * small fix * attention mask in tests Co-authored-by:
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * add_speedup_benchmark_to_doc --------- Co-authored-by:
kamil@huggingface.co <kamil.akesbi@gmail.com> Co-authored-by:
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
-
- 18 Apr, 2024 3 commits
-
-
Abhi Venigalla authored
* wip * fix __init__.py * add docs * Apply suggestions from code review Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * address comments 1 * work on make fixup * pass configs down * add sdpa attention * remove DbrxBlock * add to configuration_auto * docstring now passes formatting test * fix style * update READMEs * add dbrx to modeling_auto * make fix-copies generated this * add DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * config docstring passes formatting test * rename moe_loss_weight to router_aux_loss_coef * add to flash-attn documentation * fix model-path in tests * Explicitly make `"suli"` the default `ffn_act_fn` Co-authored-by:
Wing Lian <wing.lian@gmail.com> * default to using router_aux_loss_coef over ffn_config[moe_loss_weight] * fix _flash_attn_uses_top_left_mask and is_causal * fix tests path * don't use token type IDs * follow Llama and remove token_type_ids from test * init ConfigTester differently so tests pass * remove multiple choice test * remove question + answer test * remove sequence classification test * remove token classification test * copy Llama tests and remove token_type_ids from test inputs * do not test pruning or headmasking; style code * add _tied_weights_keys parameter to pass test * add type hints * fix type check * update config tester * remove masked_lm test * remove encoder tests * initialize DbrxModelTester with correct params * style * torch_dtype does not rely on torch * run make fixup, fix-copies * use https://huggingface.co/v2ray/dbrx-base-fixed/blob/main/modeling_dbrx.py * add copyright info * fix imports and DbrxRotaryEmbedding * update DbrxModel docstring * use copies * change model path in docstring * use config in DbrxFFN * fix flashattention2, sdpaattention * input config to DbrXAttention, DbrxNormAttentionNorm * more fixes * fix * fix again! * add informative comment * fix ruff? * remove print statement + style * change doc-test * fix doc-test * fix docstring * delete commented out text * make defaults match dbrx-instruct * replace `router_aux_loss_coef` with `moe_loss_weight` * is_decoder=True * remove is_decoder from configtester * implement sdpa properly * make is_decoder pass tests * start on the GenerationTesterMixin tests * add dbrx to sdpa documentation * skip weight typing test * style * initialize smaller model Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com> * Add DBRX to toctree * skip test_new_cache_format * make config defaults smaller again * add pad_token_id * remove pad_token_id from config * Remove all references to DBRX_PRETRAINED_CONFIG_ARCHIVE_MAP * Update src/transformers/models/dbrx/__init__.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/dbrx.md Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/dbrx.md Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix typo * Apply suggestions from code review Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * update docs, fix configuration_auto.py * address pr comments * remove is_decoder flag * slice * fix requires grad * remove grad * disconnect differently * remove grad * enable grads * patch * detach expert * nissan al ghaib * Update modeling_dbrx.py * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com> * replace "Gemma" with "Dbrx" * remove # type: ignore * don't hardcode vocab_size * remove ToDo * Re-add removed idefics2 line * Update test to use tiny-random! * Remove TODO * Remove one more case of loading the entire dbrx-instruct in the tests * Update src/transformers/models/dbrx/modeling_dbrx.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address some comments * small model * add dbrx to tokenization_auto * More docstrings with add_start_docstrings * Dbrx for now * add PipelineTesterMixin * Update src/transformers/models/dbrx/configuration_dbrx.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * remove flash-attn2 import error * fix docstring Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add useage example * put on one line Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix ffn_act_fn Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change "dbrx" to "DBRX" for display purposes. * fix __init__.py? * fix __init__.py * fix README * return the aux_loss * remove extra spaces * fix configuration_auto.py * fix format in tokenization_auto * remove new line * add more useage examples --------- Co-authored-by:
Abhi Venigalla <abhi.venigalla@databricks.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Eitan Turok <eitan.turok@databricks.com> Co-authored-by:
Eitan Turok <150733043+eitanturok@users.noreply.github.com> Co-authored-by:
Wing Lian <wing.lian@gmail.com> Co-authored-by:
Eitan Turok <eitanturok@gmail.com> Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com> Co-authored-by:
Matt <rocketknight1@gmail.com> Co-authored-by:
Your Name <you@example.com> Co-authored-by:
Mihir Patel <mihir.v.patel7@gmail.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
tomeras91 authored
* Add jamba arch * apply "make fix-copies" changes * fix link to model in JambaConfig docstring * Add n_ctx in modeling file because repo-consistency wants that * Add jamba to flash attention and sdpa documentation * mamba dt_proj quant fix now works for LoRA as well * override test_left_padding_compatibility and use a more permissive tolerance. left padding numerical difference are accentuated by mamba layers * add jamba to tokenization auto * fix comments of shape (PR #24 in the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24) * simple PR fixes * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530) * Add copied comment on JambaMLP (it's the same as MixtralMLP) * remove padding_mask warnings. It's not supported anymore * fix docstring. Float instead of int * A few more minor PR fixes * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass * Return None attention weights from mamba layers. Append to all attentions only if not None. * remove some leftover jamba archive lists * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers * Add Jamba paper on READMEs * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes) * Add copied from comment * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms * clearer docstring for _convert_to_standard_cache * style fixes * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code tp use it. Also small change in low memory beam search decoding path to support this new int value in model_inputs * rename test so it still overrides what its meant to override * draft * oups * nit * remove more complexe logic * fix names used in config * fix fix fix * style * fix some more failing tests * generate did not init the cache
馃檭 * more small nits * typo * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes * fix init of pkv with torch.tensor() * empty tensor * fix some init issues * stupid changes required by generate because it does not even support it's own DynamicCache class * more fixes * fix general assisted gen cache_position bug * tests passing * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore * fix docstrings and typehints for past_key_values * style fixes * fix docs * change typehint due to copy from Mixtral * forgot import * import order * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward) * Add integration test with tiny tandom Jamba model on hub * fix flash attention cache shapes * bring back forgotten hidden states * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model * align integration test after modeling fixes * bugfix - mamba can use precomputed states only of forward pass is on a single token * bugfix - mamba can use precomputed states only if they match the batch size * typo * remove making _prepare_4d_causal_attention_mask a leaf function * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly --------- Co-authored-by:Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by:
Joao Gante <joao@huggingface.co>
-
Alexander Visheratin authored
* Added flash attention 2. * Fixes. * Fix inheritance. * Fixed init. * Remove stuff. * Added documentation. * Add FA2 to M2M100 documentation. * Add test. * Fixed documentation. * Update src/transformers/models/m2m_100/modeling_m2m_100.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fixed variable name. --------- Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 17 Apr, 2024 1 commit
-
-
Shane A authored
* Add OLMo using add-new-model-like with Llama * Fix incorrect tokenizer for OLMo * Copy-paste relevant OLMo methods and their imports * Add OLMo config * Modify OLMo config to follow HF conventions * Remove unneeded Llama code from OLMo model * Add ability for OLMo model to output attentions * Add OLMoPreTrainedModel and OLMoModel * Add OLMoForCausalLM * Minor fixes to OLMo model for style and missing functions * Implement OLMo tokenizer * Implement OLMo to HF conversion script * Add tests for OLMo model * Add tests for OLMo fast tokenizer * Add auto-generated dummy objects * Remove unimplemented OLMo classes from auto and init classes and re-format * Add README and associated auto-generated files * Use OLMo names for common properties * Run make fixup * Remove `|` from OLMo typing * Remove unneeded tokenization_olmo.py * Revert model, config and converter to add-new-model-like Llama * Move logic for adding bos/eos token into GPTNeoxTokenizerFast * Change OLMoConfig defaults to match OLMo-7B * Use GPTNeoXToknizerFast in OLMo tokenizer tests * Modify auto-generated OLMoModelTests to work for OLMo * Add non-parametric layer norm OLMoLayerNorm * Update weight conversion script for OLMo * Fix __init__ and auto structure for OLMo * Fix errors from make fixup * Remove OLMoTokenizerFast from documentation * Add missing 'Copied from' for OLMoModel._update_causal_mask * Run make fix-copies * Rearrange string replacements in OLMoForCausalLM Copied from * Move OLMo and Llama CausalLM.forward example into global constants * Fix OLMO_GENERATION_EXAMPLE doc string typo * Add option for qkv clipping to OLMo * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf * Fix OLMo tokenization bug using conversion script * Keep model in full precision after conversion * Do not add eos token automatically * Update references to OLMo model in HF Hub * Do not add eos token during encoding by default * Fix Llama generation example * Run make fixup * OLMo 7B integration test fix * Remove unneeded special case for OLMoConfig * OLMo 7B Twin 2T integration test fix * Fix test_model_7b_greedy_generation * Remove test_compile_static_cache * Fix OLMo and Llama generation example * Run make fixup * Revert "OLMo 7B integration test fix" This reverts commit 4df56a4b150681bfa559846f40e9b7b7f97d7908. * Revert "OLMo 7B Twin 2T integration test fix" This reverts commit 9ff65a4a294ace89ab047b793ca55e623a9ceefc. * Ungate 7B integration tests and fix greedy generation test * Add retries for flaky test_eager_matches_sdpa_generate * Fix output of doc example for OLMoForCausalLM.forward * Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model * Try fix incorrect characters in OLMoForCausalLM.forward doct test * Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes * Remove pretraining_tp from OLMo config and model * Add missing 'Copied from' instances * Remove unneeded causal_mask from OLMoModel * Revert Llama changes * Ignore copy for OLMoForCausalLM.forward * Change 'OLMo' to 'Olmo' in classes * Move minimal OLMo tokenization tests to model tests * Add missed 'Copied from' for repeat_kv
-
- 15 Apr, 2024 1 commit
-
-
amyeroberts authored
* Initial add model additions * Test * All weights loading * Can perform full forward pass * Local and remote the same * Matching local and remote * Fixup * Idefics2Model importable; fixup docstrings * Don't skip by default * Remove deprecated use_resampler arg * Remove self.config * DecoupledLinear takes config * Tidy up * Enable eager attention and tidy up * Most tests passing * Update for batch of processed images * Add image processor * Update doc pages * Update conversion script * Remove erroneous breakpoint * Remove accidendtal spelling change * Update to reflect changes on hub - make generate work * Fix up * Image processor tests * Update tests * Add a processor * Add a processor * Update convert script * Update modeling file - remove fixmes * Bug fix * Add processing test * Use processor * Fix up * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Fix test * Update config - PR comments and defaults align with checkpoint * Reviewer comments * Add copied froms for flahs attention * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Apply suggestions from code review Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Remove qk_layer_norm and freeze_layers functionality * Fix * Remove freeze_layer options from config * Sync with upstream main * Fix attention shapes siglip * Remove Llava-next refs - TO REBASE * Use AutoModel for text model * Add comment to explain vision embeddings * Fix issue with tie_word_embeddings * Address review comments * Fix and fix up * Chat templates for idefics * Fix copies * Fix * Add layer norms to FA2 * Fix tests * Apply suggestions from code review Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Fix * Review comments * Update src/transformers/models/idefics2/modeling_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update inputs merger * Merge weights in correct order * Update convert script * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update template * Model code examples (fix idefics too) * More review comments * Tidy up * Update processing * Fix attention mask preparation * Update inputs_merger inputs * Vectorize inputs_merger * Update src/transformers/models/idefics2/__init__.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/idefics2/modeling_idefics2.py * Review comments * saying bye to the `qk_layer_norms` * Simplify * Update latents * Remove erroneuous readme changes * Return images when applying chat template * Fix bug - prompt images are for a single sample * Update src/transformers/models/idefics2/modeling_idefics2.py * image splitting * fix test * some more comment * some comment * Apply suggestions from code review Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/idefics2/image_processing_idefics2.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update processor * Update model tests * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Don't add BOS in template * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Remove index in examples * Update tests to reflect #13 * Update src/transformers/models/idefics2/processing_idefics2.py Co-authored-by:
Victor SANH <victorsanh@gmail.com> * PR comment - consistent typing * Update readme and model doc * Update docs * Update checkpoint references * Update examples * Fix and update tests * Small addition * Update tests - remove copied from as no ignore placement copy could be found * Update example * small fixes * Update docs/source/en/model_doc/idefics2.md Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update docs/source/en/model_doc/idefics2.md Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Update README.md Co-authored-by:
Victor SANH <victorsanh@gmail.com> * Connector model as bridge * Fix up * Fix up * Don't pass model inputs for generation kwargs update * IDEFICS-2 -> Idefics2 * Remove config archive name * IDEFICS-2 -> Idefics2 * Add back llava-next * Update readmes * Add requirements for processor tester * Use custom convert_to_rgb to avoid possible BC * Fix doc example * Fix doc example * Skip model doc tests - as model to large * More doc example - account for image splitting * Update src/transformers/image_transforms.py * Fix config doctest --------- Co-authored-by:
Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by:
ArthurZucker <arthur.zucker@gmail.com> Co-authored-by:
Victor SANH <victorsanh@gmail.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 02 Apr, 2024 1 commit
-
-
Yoach Lacombe authored
* add FA2 to o.g Musicgen * make style * add FA2 support to Musicgen Melody * add generation FA2 tests to o.g Musicgen * make style and fix copies * add Musicgen to FA2 docs + deprecate list * add sdpa supports to Musicgen's * make style and fix copies * refactor attention implementation arguments * add Copied from to sdpa tests * add copied form in sdpa tests melody * add copied for FA2 generation tests * add FA2 inference copied from * make style
-
- 28 Mar, 2024 1 commit
-
-
Eduardo Pacheco authored
* First commit to add flash attention 2 for GPT-2 * more improvements * Make GPT2 pass tests and fixed Decison Transformers copies * Fixed missing arg * fix copies * Added expected speedup * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Added test * Fixed attn attribute * Update docs/source/en/model_doc/gpt2.md Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update docs/source/en/model_doc/gpt2.md Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update Decision transformer attentions * More updates * Passing tests * Fix copies * Fix copies part 2 * Decision transformer updates * Update src/transformers/models/gpt2/modeling_gpt2.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Fix copies * Decision transformer not supporting flash attn * Addressed comments * Addressed comments * Addressed comments --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 27 Mar, 2024 1 commit
-
-
Bo Zheng authored
* add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by:
bozheng-hit <dsoul0621@gmail.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 20 Mar, 2024 1 commit
-
-
NielsRogge authored
* First draft * Fix tests, add docs * Improve docstrings * Fix test * Address comments * Address comments * Remove vocab_size attribute * Remove batch_size * Address comment * Add image processor tests * Support fx * Update docstring * Add support for 34b * Convert 34b model * Add integration tests * Update checkpoints * Convert vicuna-13b, remove doc tests * Remove script * Remove file * Address comments * Improve docstrings * Deprecate vocab_size * Remove aspect_ratio_setting * Address comments * Update READMEs * Add tips about chat templates * Fix tests * Deprecate vocab_size safely * Update tests --------- Co-authored-by:Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
-
- 15 Mar, 2024 1 commit
-
-
Saurabh Dash authored
* Cohere Model Release (#1) Cohere Model Release * Remove unnecessary files and code (#2) Some cleanup * Delete cohere-model directory (#3) * Make Fix (#5) * Pr fixes (#6) * fixes for pr * pr fixes for the format * pr fixes for the format * src/transformers/models/auto/tokenization_auto.py * Tokenizer test (#8) * tokenizer test * format fix * Adding Docs and other minor changes (#7) * Add modeling tests (#9) * Smol Fix (#11) * tokenization tests are fixed * format fixes * fix pr doc tests * fix pr doc tests * fix pr doc tests * fix pr style check * small changes in cohere.md * FIX: Address final comments for transformers integration (#13) * fix modeling final nits and add proper test file * for now leave empty tests * add integration test * push new test * fix modeling cohere (#14) * Update chat templates to use the new API (#15) --------- Co-authored-by:
ahmetustun <ahmetustun89@gmail.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com>
-
- 13 Mar, 2024 1 commit
-
-
bytebarde authored
* initial implementation of flash attention for gptj * modify flash attention and overwrite test_flash_attn_2_generate_padding_right * update flash attention support list * remove the copy line in the `CodeGenBlock` * address copy mechanism * Update src/transformers/models/gptj/modeling_gptj.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add GPTJ attention classes * add expected outputs in the gptj test * Ensure repo consistency with 'make fix-copies' --------- Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 28 Feb, 2024 1 commit
-
-
RaymondLi0 authored
* Copy model * changes * misc * fixes * add embed and residual dropout (#30) * misc * remove rms norm and gated MLP * remove copied mentions where its not a copy anymore * remove unused _shape * copied from mistral instead * fix copies * fix copies * add not doctested * fix * fix copyright * Update docs/source/en/model_doc/starcoder2.md Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix doc * revert some changes * add fa2 tests * fix styling nit * fix * push dummy docs --------- Co-authored-by:
Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com> Co-authored-by:
younesbelkada <younesbelkada@gmail.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 22 Feb, 2024 1 commit
-
-
fxmarty authored
* fix torch.export.export for llama * do not change doc title * make fix copies
-
- 21 Feb, 2024 2 commits
-
-
Arthur authored
* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing
馃 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by:Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by:
Lysandre Debut <hi@lysand.re> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by:
Lysandre Debut <hi@lysand.re> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs
馃殌 * update all readmes --------- Co-authored-by:younesbelkada <younesbelkada@gmail.com> Co-authored-by:
Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by:
sanchit-gandhi <sanchit@huggingface.co> Co-authored-by:
Lysandre Debut <hi@lysand.re>
-
Ekaterina Aidova authored
* support SDPA Attention in stablelm * add integration test * add fallback for output_attentions * Update src/transformers/models/stablelm/modeling_stablelm.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update tests/models/stablelm/test_modeling_stablelm.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/stablelm/modeling_stablelm.py Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * handle non-contiguous states --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by:
Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
- 20 Feb, 2024 1 commit
-
-
JB (Don) authored
-
- 16 Feb, 2024 1 commit
-
-
Lysandre Debut authored
* Script & Manual edition * Update
-
- 14 Feb, 2024 1 commit
-
-
Jonathan Tow authored
* Add `StableLM` * fix(model): re-create from `huggingface-cli add-new-model-like persimmon` * fix: re-add changes to address comments * fix(readme): add links to paper * fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref * fix(tests): re-add `@slow` decorator to integration tests * fix(tests): import slow... * fix(readme_hd): remove whitespace edit * fix(tokenizer): auto tokenizer tuple * skip doctests for `modeling_stablelm`
-
- 17 Jan, 2024 1 commit
-
-
Junyang Lin authored
* add config, modeling, and tokenization * add auto and init * update readme * update readme * update team name * fixup * fixup * update config * update code style * update for fixup * update for fixup * update for fixup * update for testing * update for testing * fix bug for config and tokenization * fix bug for bos token * not doctest * debug tokenizer * not doctest * debug tokenization * debug init for tokenizer * fix style * update init * delete if in token auto * add tokenizer doc * add tokenizer in init * Update dummy_tokenizers_objects.py * update * update * debug * Update tokenization_qwen2.py * debug * Update convert_slow_tokenizer.py * add copies * add copied from and make style * update files map * update test * fix style * fix merge reading and update tests * fix tests * fix tests * fix style * debug a variable in readme * Update src/transformers/models/qwen2/configuration_qwen2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * update test and copied from * fix style * update qwen2 tokenization and tests * Update tokenization_qwen2.py * delete the copied from after property * fix style * update tests * update tests * add copied from * fix bugs * update doc * add warning for sliding window attention * update qwen2 tokenization * fix style * Update src/transformers/models/qwen2/modeling_qwen2.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix tokenizer fast --------- Co-authored-by:
Ren Xuancheng <jklj077@users.noreply.github.com> Co-authored-by:
renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com> Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 22 Dec, 2023 1 commit
-
-
Yih-Dar authored
fix Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 18 Dec, 2023 1 commit
-
-
Steven Liu authored
* doc fix friday * deprecated objects * update not_doctested * update toctree
-