1. 10 Apr, 2024 1 commit
    • Add recurrent gemma (#30143) · 0fe44059
      Arthur authored
      
      
      * Fork.
      
      * RecurrentGemma initial commit.
      
      * Updating __init__.py.
      
      * Minor modification to how we initialize the cache.
      Changing how the config specifies the architecture.
      
      * Reformat code to 4 spaces.
      Fixed a few typos.
      
      * Fixed the forward pass.
      Still unclear on the cache?
      
      * Fixed the RecurrentGemmaForCausalLM
      
      * Minor comment that we might not need attention_mask and output_attention arguments.
      
      * Now cache should work as well.
      
      * Adding a temporary example to check whether the model generation works.
      
      * Adding the tests and updating imports.
      
      * Adding the example file missing in the previous commit.
      
      * First working example.
      
      * Removing .gitignore and reverting parts of __init__.
      
      * Re-add .gitignore.
      
      * Addressing comments for configuration.
      
      * Move mask creation to `_prepare_inputs_for_generation`.
      
      * First try at integration tests:
      1. AttributeError: 'GriffinCausalLMOutput' object has no attribute 'attentions'.
      2. `cache_position` not passed
      
      * Transferring between machines.
      
      * Running normal tests.
      
      * Minor fix.
      
      * More fixes.
      
      * Addressing more comments.
      
      * Minor fixes.
      
      * first stab at cleanup
      
      * more refactoring
      
      * fix copies and else
      
      * renaming and get init to work
      
      * fix causal mask creation
      
      * update
      
      * nit
      
      * fix a hell lot of things
      
      * updates
      
      * update conversion script
      
      * make all keys importable
      
      * nits
      
      * add auto mappings
      
      * properly convert ffw_up and down
      
      * add scaling
      
      * fix generations
      
      * for recurrent dtype
      
      * update
      
      * fix going beyond window
      
      * fixup
      
      * add missing files
      
      * current updates to remove last einops
      
      * finish modeling refactor
      
      * TADA
      
      * fix compile
      
      * fix most failing tests?
      
      * update tests
      
      * refactor and update
      
      * update
      
      * nits, fixup and update tests
      
      * more fixup
      
      * nits
      
      * fix imports
      
      * test format
      
      * fixups
      
      * nits
      
      * tuple typing
      
      * fix code quality
      
      * add model card
      
      * fix doc
      
      * skip most generation tests
      
      * nits
      
      * style
      
      * doc fixes
      
      * fix pr and check_copies?
      
      * last nit
      
      * oupsy
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * update
      
      * Update src/transformers/models/recurrent_gemma/convert_recurrent_gemma_to_hf.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/recurrent_gemma/test_modeling_recurrent_gemma.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * update based on review
      
      * doc nit
      
      * fix quality
      
      * quality
      
      * fix slow test model path
      
      * update default dtype
      
      * ignore attributes that can be safely ignored in check config attributes
      
      * 0lallalala come on
      
      * save nit
      
      * style
      
      * remove to dict update
      
      * make sure we can also run in float16
      
      * style
      
      ---------
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: Aleksandar Botev <botev@google.com>
      Co-authored-by: Leonard Berrada <lberrada@users.noreply.github.com>
      Co-authored-by: anushanf <anushanf@google.com>
      Co-authored-by: botev <botevmg@gmail.com>
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  2. 30 Mar, 2024 1 commit
  3. 27 Mar, 2024 1 commit
    • Add Qwen2MoE (#29377) · 1c39974a
      Bo Zheng authored
      
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * Update README.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * fixup
      
      * add archive back
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fixup
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * add archive back
      
      * fix integration test
      
      * fixup
      
      ---------
      Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
  4. 20 Mar, 2024 2 commits
    • Add LLaVa-1.6, bis (#29586) · d91fd7f9
      NielsRogge authored
      
      
      * First draft
      
      * Fix tests, add docs
      
      * Improve docstrings
      
      * Fix test
      
      * Address comments
      
      * Address comments
      
      * Remove vocab_size attribute
      
      * Remove batch_size
      
      * Address comment
      
      * Add image processor tests
      
      * Support fx
      
      * Update docstring
      
      * Add support for 34b
      
      * Convert 34b model
      
      * Add integration tests
      
      * Update checkpoints
      
      * Convert vicuna-13b, remove doc tests
      
      * Remove script
      
      * Remove file
      
      * Address comments
      
      * Improve docstrings
      
      * Deprecate vocab_size
      
      * Remove aspect_ratio_setting
      
      * Address comments
      
      * Update READMEs
      
      * Add tips about chat templates
      
      * Fix tests
      
      * Deprecate vocab_size safely
      
      * Update tests
      
      ---------
      Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
    • v4.40.0.dev.0 · 1248f092
      Arthur Zucker authored
  5. 19 Mar, 2024 2 commits
  6. 18 Mar, 2024 1 commit
    • Add MusicGen Melody (#28819) · c43b380e
      Yoach Lacombe authored
      
      
      * first modeling code
      
      * make repository
      
      * still WIP
      
      * update model
      
      * add tests
      
      * add latest change
      
      * clean docstrings and copied from
      
      * update docstrings md and readme
      
      * correct chroma function
      
      * correct copied from and remove unrelated test
      
      * add doc to toctree
      
      * correct imports
      
      * add convert script to notdoctested
      
      * Add suggestion from Sanchit
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * correct get_unconditional_inputs docstrings
      
      * modify README according to SANCHIT feedback
      
      * add chroma to audio utils
      
      * clean librosa and torchaudio hard dependencies
      
      * fix FE
      
      * refactor audio decoder -> audio encoder for consistency with previous musicgen
      
      * refactor conditional -> encoder
      
      * modify sampling rate logics
      
      * modify license at the beginning
      
      * refactor all_self_attns->all_attentions
      
      * remove ignore copy from causallm generate
      
      * add copied from for from_sub_models
      
      * fix make copies
      
      * add warning if audio is truncated
      
      * add copied from where relevant
      
      * remove artefact
      
      * fix convert script
      
      * fix torchaudio and FE
      
      * modify chroma method according to feedback-> better naming
      
      * refactor input_values->input_features
      
      * refactor input_values->input_features and fix import fe
      
      * add input_features to docstrings
      
      * correct inputs_embeds logics
      
      * remove dtype conversion
      
      * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
      
      * change warning for chroma length
      
      * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * change way to save wav, using soundfile
      
      * correct docs and change to soundfile
      
      * fix import
      
      * fix init proj layers
      
      * remove line breaks from md
      
      * fix issue with docstrings
      
      * add FE suggestions
      
      * improve is in logics and remove useless imports
      
      * remove custom from_pretrained
      
      * simplify docstring code
      
      * add suggestions for modeling tests
      
      * make style
      
      * update converting script with sanity check
      
      * remove encoder attention mask from conditional generation
      
      * replace musicgen melody checkpoints with official orga
      
      * rename ylacombe->facebook in checkpoints
      
      * fix copies
      
      * remove unnecessary warning
      
      * add shape in code docstrings
      
      * add files to slow doc tests
      
      * fix md bug and add md to not_tested
      
      * make fix-copies
      
      * fix hidden states test and batching
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
  7. 15 Mar, 2024 1 commit
    • Cohere Model Release (#29622) · 0e4a1c34
      Saurabh Dash authored
      
      
      * Cohere Model Release (#1)
      
      Cohere Model Release
      
      * Remove unnecessary files and code (#2)
      
      Some cleanup
      
      * Delete cohere-model directory (#3)
      
      * Make Fix (#5)
      
      * Pr fixes (#6)
      
      * fixes for pr
      
      * pr fixes for the format
      
      * pr fixes for the format
      
      * src/transformers/models/auto/tokenization_auto.py
      
      * Tokenizer test (#8)
      
      * tokenizer test
      
      * format fix
      
      * Adding Docs and other minor changes (#7)
      
      * Add modeling tests (#9)
      
      * Smol Fix (#11)
      
      * tokenization tests are fixed
      
      * format fixes
      
      * fix pr doc tests
      
      * fix pr doc tests
      
      * fix pr doc tests
      
      * fix pr style check
      
      * small changes in cohere.md
      
      * FIX: Address final comments for transformers integration (#13)
      
      * fix modeling final nits and add proper test file
      
      * for now leave empty tests
      
      * add integration test
      
      * push new test
      
      * fix modeling cohere (#14)
      
      * Update chat templates to use the new API (#15)
      
      ---------
      Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
  8. 14 Mar, 2024 1 commit
  9. 11 Mar, 2024 2 commits
  10. 26 Feb, 2024 1 commit