"tests/models/vscode:/vscode.git/clone" did not exist on "12313838d33373d06d35b48c3c501fa832f16443"
  1. 13 Mar, 2024 1 commit
    • Nate Cibik's avatar
      Add PvT-v2 Model (#26812) · 1fc505b8
      Nate Cibik authored
      
      
      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/end/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Reverted batch eval changes for PR
      
      * Updated index.md
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Applied code fixup
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/end/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Applied code fixup
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Ran fix-copies and fixup. All checks passed
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Fixed config docstring. Added channels property
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Ran fix-copies and fixup. All checks passed
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Fixed issue from rebase
      
      * Fixed issue from rebase
      
      * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
      
      * Fixed issue from rebase
      
      * Fixed issue from rebase
      
      * Changed model name in docs
      
      * Removed duplicate PvtV2Backbone
      
      * Work around type switching issue in tests
      
      * Fix model name in config comments
      
      * Update docs/source/en/model_doc/pvt_v2.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed old code
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Fixed Class names to be more descriptive
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed outdated code
      
      * Moved paper abstract to single line in pvt_v2.md
      
      * Added usage tips to pvt_v2.md
      
      * Simplified module inits by passing layer_idx
      
      * Fixed typing for hidden_act in PvtV2Config
      
      * Removed unusued import
      
      * Add pvt_v2 to docs/source/en/_toctree.yml
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Move function parameters to single line
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Update year of copyright to 2024
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Make code more explicit
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated sr_ratio to be more explicit spatial_reduction_ratio
      
      * Removed excess type hints in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Move params to single line in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Removed needless comment in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update copyright date in pvt_v2.md
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Moved params to single line in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated copyright date in configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Cleaned comments in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Renamed spatial_reduction Conv2D operation
      
      * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      "
      
      This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538.
      
      * Updated conversion script to reflect module name change
      
      * Deprecated reshape_last_stage option in config
      
      * Removed unused imports
      
      * Code formatting
      
      * Fixed outdated decorators on test_inference_fp16
      
      * Added "Copied from" comments in test_modeling_pvt_v2.py
      
      * Fixed import listing
      
      * Updated model name
      
      * Force empty commit for PR refresh
      
      * Fixed linting issue
      
      * Removed # Copied from comments
      
      * Added PVTv2 to README_fr.md
      
      * Ran make fix-copies
      
      * Replace all FoamoftheSea hub references with OpenGVLab
      
      * Fixed out_indices and out_features logic in configuration_pvt_v2.py
      
      * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
      
      * Ran code fixup
      
      * Fixed order of parent classes in PvtV2Config to fix the to_dict method override
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      1fc505b8
  2. 11 Mar, 2024 2 commits
  3. 05 Mar, 2024 1 commit
    • Arthur's avatar
      [`Add Mamba`] Adds support for the `Mamba` models (#28094) · fb1c62e9
      Arthur authored
      
      
      * initial-commit
      
      * start cleaning
      
      * small nits
      
      * small nits
      
      * current updates
      
      * add kernels
      
      * small refactoring little step
      
      * add comments
      
      * styling
      
      * nit
      
      * nits
      
      * Style
      
      * Small changes
      
      * Push dummy mambda simple slow
      
      * nit
      
      * Use original names
      
      * Use original names and remove norm
      
      * Updates for inference params
      
      * Style nd updates
      
      * nits
      
      * Match logits
      
      * Add a test
      
      * Add expected generated text
      
      * nits doc, imports and styling
      
      * style
      
      * oups
      
      * dont install kernels, invite users to install the required kernels
      
      * let use use the original packages
      
      * styling
      
      * nits
      
      * fix some copieds
      
      * update doc
      
      * fix-copies
      
      * styling done
      
      * nits
      
      * fix import check
      
      * run but wrong cuda ress
      
      * mamba CUDA works :)
      
      * fix the fast path
      
      * config naming nits
      
      * conversion script is not required at this stage
      
      * finish fixing the fast path: generation make sense now!
      
      * nit
      
      * Let's start working on the CIs
      
      * style
      
      * better style
      
      * more nits
      
      * test nit
      
      * quick fix for now
      
      * nits
      
      * nit
      
      * nit
      
      * nit
      
      * nits
      
      * update test rest
      
      * fixup
      
      * update test
      
      * nit
      
      * some fixes
      
      * nits
      
      * update test values
      
      * fix styling
      
      * nit
      
      * support peft
      
      * integrations tests require torchg
      
      * also add slow markers
      
      * styling
      
      * chose forward wisely
      
      * nits
      
      * update tests
      
      * fix gradient checkpointing
      
      * fixup
      
      * nit
      
      * fix doc
      
      * check copies
      
      * fix the docstring
      
      * fix some more tests
      
      * style
      
      * fix beam search
      
      * add init schene
      
      * update
      
      * nit
      
      * fix
      
      * fixup the doc
      
      * fix the doc
      
      * fixup
      
      * tentative update but slow is no longer good
      
      * nit
      
      * should we always use float32?
      
      * nits
      
      * revert wrong changes
      
      * res in float32
      
      * cleanup
      
      * skip fmt for now
      
      * update generation values
      
      * update test values running original model
      
      * fixup
      
      * update tests + rename inference_params to cache_params + make sure training does not use cache_params
      
      * small nits
      
      * more nits
      
      * fix final CIs
      
      * style
      
      * nit doc
      
      * I hope final doc nits
      
      * nit
      
      * 馃珷
      
      * final touch!
      
      * fix torch import
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * Apply suggestions from code review
      
      * fix fix and fix
      
      * fix base model prefix!
      
      * nit
      
      * Update src/transformers/models/mamba/__init__.py
      
      * Update docs/source/en/model_doc/mamba.md
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * nit
      
      ---------
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      fb1c62e9
  4. 04 Mar, 2024 2 commits
    • NielsRogge's avatar
      Add UDOP (#22940) · 836921fd
      NielsRogge authored
      
      
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More fixes
      
      * Fix copies
      
      * More improvements
      
      * More fixes
      
      * More improvements
      
      * Convert checkpoint
      
      * More improvements, set up tests
      
      * Fix more tests
      
      * Add UdopModel
      
      * More improvements
      
      * Fix equivalence test
      
      * More fixes
      
      * Redesign model
      
      * Extend conversion script
      
      * Use real inputs for conversion script
      
      * Add image processor
      
      * Improve conversion script
      
      * Add UdopTokenizer
      
      * Add fast tokenizer
      
      * Add converter
      
      * Update README's
      
      * Add processor
      
      * Add fully fledged tokenizer
      
      * Add fast tokenizer
      
      * Use processor in conversion script
      
      * Add tokenizer tests
      
      * Fix one more test
      
      * Fix more tests
      
      * Fix tokenizer tests
      
      * Enable fast tokenizer tests
      
      * Fix more tests
      
      * Fix additional_special_tokens of fast tokenizer
      
      * Fix tokenizer tests
      
      * Fix more tests
      
      * Fix equivalence test
      
      * Rename image to pixel_values
      
      * Rename seg_data to bbox
      
      * More renamings
      
      * Remove vis_special_token
      
      * More improvements
      
      * Add docs
      
      * Fix copied from
      
      * Update slow tokenizer
      
      * Update fast tokenizer design
      
      * Make text input optional
      
      * Add first draft of processor tests
      
      * Fix more processor tests
      
      * Fix decoder_start_token_id
      
      * Fix test_initialization
      
      * Add integration test
      
      * More improvements
      
      * Improve processor, add test
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Remove print statement
      
      * Update README and auto mapping
      
      * Delete files
      
      * Delete another file
      
      * Remove code
      
      * Fix test
      
      * Fix docs
      
      * Remove asserts
      
      * Add doc tests
      
      * Include UDOP in exotic model tests
      
      * Add expected tesseract decodings
      
      * Add sentencepiece
      
      * Use same design as T5
      
      * Add UdopEncoderModel
      
      * Add UdopEncoderModel to tests
      
      * More fixes
      
      * Fix fast tokenizer
      
      * Fix one more test
      
      * Remove parallelisable attribute
      
      * Fix copies
      
      * Remove legacy file
      
      * Copy from T5Tokenizer
      
      * Fix rebase
      
      * More fixes, copy from T5
      
      * More fixes
      
      * Fix init
      
      * Use ArthurZ/udop for tests
      
      * Make all model tests pass
      
      * Remove UdopForConditionalGeneration from auto mapping
      
      * Fix more tests
      
      * fixups
      
      * more fixups
      
      * fix the tokenizers
      
      * remove un-necessary changes
      
      * nits
      
      * nits
      
      * replace truncate_sequences_boxes with truncate_sequences for fix-copies
      
      * nit current path
      
      * add a test for input ids
      
      * ids that we should get taken from c9f7a32f57440d90ff79890270d376a1cc0acb68
      
      * nits converting
      
      * nits
      
      * apply ruff
      
      * nits
      
      * nits
      
      * style
      
      * fix slow order of addition
      
      * fix udop fast range as well
      
      * fixup
      
      * nits
      
      * Add docstrings
      
      * Fix gradient checkpointing
      
      * Update code examples
      
      * Skip tests
      
      * Update integration test
      
      * Address comment
      
      * Make fixup
      
      * Remove extra ids from tokenizer
      
      * Skip test
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update year
      
      * Address comment
      
      * Address more comments
      
      * Address comments
      
      * Add copied from
      
      * Update CI
      
      * Rename script
      
      * Update model id
      
      * Add AddedToken, skip tests
      
      * Update CI
      
      * Fix doc tests
      
      * Do not use Tesseract for the doc tests
      
      * Remove kwargs
      
      * Add original inputs
      
      * Update casting
      
      * Fix doc test
      
      * Update question
      
      * Update question
      
      * Use LayoutLMv3ImageProcessor
      
      * Update organization
      
      * Improve docs
      
      * Update forward signature
      
      * Make images optional
      
      * Remove deprecated device argument
      
      * Add comment, add add_prefix_space
      
      * More improvements
      
      * Remove kwargs
      
      ---------
      Co-authored-by: default avatarArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      836921fd
    • NielsRogge's avatar
      Convert SlimSAM checkpoints (#28379) · 5e4b69dc
      NielsRogge authored
      
      
      * First commit
      
      * Improve conversion script
      
      * Convert more checkpoints
      
      * Update src/transformers/models/sam/convert_sam_original_to_hf_format.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Rename file
      
      * More updates
      
      * Update docstring
      
      * Update script
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      5e4b69dc
  5. 28 Feb, 2024 2 commits
  6. 26 Feb, 2024 2 commits
  7. 21 Feb, 2024 1 commit
    • Arthur's avatar
      [ `gemma`] Adds support for Gemma 馃拵 (#29167) · 594c1277
      Arthur authored
      * inital commit
      
      * update
      
      * update conversion checkpoint
      
      * update conversion script
      
      * nits
      
      * some fixes
      
      * nits
      
      * merge
      
      * fix permute
      
      * nits
      
      * fix
      
      * nits
      
      * nits
      
      * nits
      
      * fix rope
      
      * fix both rope
      
      * nites
      
      * style
      
      * make sure flax works
      
      * fix flax init code
      
      * fix foward
      
      * nits
      
      * print flax generation out
      
      * current code
      
      * nits
      
      * SIIIIIIIIIIIIIIIIIII
      
      * update
      
      * add new tokenizer
      
      * correct fast tokenizer
      
      * fix conversion
      
      * more comments
      
      * fix modeling and conversion
      
      * nits and nits
      
      * nits testing
      
      * add some tokenization tests
      
      * add some edge cases
      
      * add slow tests and fix them
      
      * fixup
      
      * fix copies for modeling
      
      * fix copies
      
      * add 7B slow tests
      
      * fix
      
      * fix
      
      * fix tests
      
      * make tokenizer cis go green
      
      * styling
      
      * last tokenizer nits
      
      * update jax tests
      
      * fix flax for 7b
      
      * add jit testing 馃
      
      
      
      * cleanups
      
      * isolated nit, inv_freq for rotary_emb.inv_freq
      
      * propagate to jax
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * adjust test
      
      * fix conversion script
      
      * change name
      
      * correct file names
      
      * update conversion script
      
      * Fix bos and eos token ids in the model configuration (#3)
      
      * update modelling
      
      * update conversion script
      
      * add static cache for gemma
      
      * fix sdpa generate
      
      * fix batched
      
      * multiple fixes
      
      * fix FA2
      
      * final fix
      
      * Rename a few missing strings and filenames (#4)
      
      * merge with upstream main
      
      * fix copies
      
      * fix copies
      
      * fix fixup
      
      * fix fixup
      
      * fix
      
      * fix
      
      * final tests
      
      * fix fx gemma tests
      
      * fix fx bf16/fp16 tests
      
      * update slow fx tests
      
      * fx slow tests: one logits, one generation
      
      * move jit test standalone
      
      * Apply suggestions from code review
      
      * nits
      
      * tokenizer updates
      
      * more tokenization updates: custom GemmaSentencepieceExtrator
      
      * style
      
      * Update src/transformers/cache_utils.py
      
      * Update src/transformers/models/gemma/__init__.py
      
      * Update tests/models/gemma/test_modeling_flax_gemma.py
      
      * small nits
      
      * style
      
      * update tokenization test
      
      * fix the rotary embedding
      
      * with style
      
      * fix slow tests
      
      * WARNING this commit might be very important for precisions
      
      * Update tests/models/gemma/test_modeling_flax_gemma.py
      
      * Update src/transformers/models/gemma/configuration_gemma.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * Update src/transformers/models/gemma/modeling_flax_gemma.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * small nits here and there!
      
      * forgotten nit
      
      * remove on the fly computation of inv_freq
      
      * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_flax_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_tokenization_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_tokenization_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_tokenization_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_tokenization_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * Update tests/models/gemma/test_modeling_gemma.py
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      
      * nit conversion script link
      
      * fix some tests
      
      * add not doctest and pr doctest
      
      * repo consistency
      
      * fix last CIs 馃殌
      
      
      
      * update all readmes
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarsanchit-gandhi <sanchit@huggingface.co>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      594c1277
  8. 20 Feb, 2024 1 commit
  9. 16 Feb, 2024 2 commits
  10. 14 Feb, 2024 2 commits
    • amyeroberts's avatar
      Backbone kwargs in config (#28784) · 0199a484
      amyeroberts authored
      
      
      * Enable instantiating model with pretrained backbone weights
      
      * Clarify pretrained import
      
      * Use load_backbone instead
      
      * Add backbone_kwargs to config
      
      * Pass kwargs to constructors
      
      * Fix up
      
      * Input verification
      
      * Add tests
      
      * Tidy up
      
      * Update tests/utils/test_backbone_utils.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      0199a484
    • Jonathan Tow's avatar
      Add `StableLM` (#28810) · de6029a0
      Jonathan Tow authored
      * Add `StableLM`
      
      * fix(model): re-create from `huggingface-cli add-new-model-like persimmon`
      
      * fix: re-add changes to address comments
      
      * fix(readme): add links to paper
      
      * fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref
      
      * fix(tests): re-add `@slow` decorator to integration tests
      
      * fix(tests): import slow...
      
      * fix(readme_hd): remove whitespace edit
      
      * fix(tokenizer): auto tokenizer tuple
      
      * skip doctests for `modeling_stablelm`
      de6029a0
  11. 05 Feb, 2024 1 commit
  12. 31 Jan, 2024 3 commits
    • Yih-Dar's avatar
      Split daily CI using 2 level matrix (#28773) · 47358661
      Yih-Dar authored
      
      
      * update / add new workflow files
      
      * Add comment
      
      * Use env.NUM_SLICES
      
      * use scripts
      
      * use scripts
      
      * use scripts
      
      * Fix
      
      * using one script
      
      * Fix
      
      * remove unused file
      
      * update
      
      * fail-fast: false
      
      * remove unused file
      
      * fix
      
      * fix
      
      * use matrix
      
      * inputs
      
      * style
      
      * update
      
      * fix
      
      * fix
      
      * no model name
      
      * add doc
      
      * allow args
      
      * style
      
      * pass argument
      
      ---------
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      47358661
    • Yih-Dar's avatar
      Add artifact name in job step to maintain job / artifact correspondence (#28682) · 95346e9d
      Yih-Dar authored
      
      
      * avoid using job name
      
      * apply to other files
      
      ---------
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      95346e9d
    • Kian Sierra McGettigan's avatar
      Flax mistral (#26943) · f7076cd3
      Kian Sierra McGettigan authored
      * direct copy from llama work
      
      * mistral modules forward pass working
      
      * flax mistral forward pass with sliding window
      
      * added tests
      
      * added layer collection approach
      
      * Revert "added layer collection approach"
      
      This reverts commit 0e2905bf2236ec323163fc1a9f0c016b21aa8b8f.
      
      * Revert "Revert "added layer collection approach""
      
      This reverts commit fb17b6187ac5d16da7c461e1130514dc3d137a43.
      
      * fixed attention outputs
      
      * added mistral to init and auto
      
      * fixed import name
      
      * fixed layernorm weight dtype
      
      * freeze initialized weights
      
      * make sure conversion consideres bfloat16
      
      * added backend
      
      * added docstrings
      
      * added cache
      
      * fixed sliding window causal mask
      
      * passes cache tests
      
      * passed all tests
      
      * applied make style
      
      * removed commented out code
      
      * applied fix-copies ignored other model changes
      
      * applied make fix-copies
      
      * removed unused functions
      
      * passed generation integration test
      
      * slow tests pass
      
      * fixed slow tests
      
      * changed default dtype from jax.numpy.float32 to float32 for docstring check
      
      * skip cache test  for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
      
      * updated checkpoint since from_pt not included
      
      * applied black style
      
      * removed unused args
      
      * Applied styling and fixup
      
      * changed checkpoint for doc back
      
      * fixed rf after adding it to hf hub
      
      * Add dummy ckpt
      
      * applied styling
      
      * added tokenizer to new ckpt
      
      * fixed slice format
      
      * fix init and slice
      
      * changed ref for placeholder TODO
      
      * added copies from Llama
      
      * applied styling
      
      * applied fix-copies
      
      * fixed docs
      
      * update weight dtype reconversion for sharded weights
      
      * removed Nullable input ids
      
      * Removed unnecessary output attentions in Module
      
      * added embedding weight initialziation
      
      * removed unused past_key_values
      
      * fixed deterministic
      
      * Fixed RMS Norm and added copied from
      
      * removed input_embeds
      
      * applied make style
      
      * removed nullable input ids from sequence classification model
      
      * added copied from GPTJ
      
      * added copied from Llama on FlaxMistralDecoderLayer
      
      * added copied from to FlaxMistralPreTrainedModel methods
      
      * fix test deprecation warning
      
      * freeze gpt neox random_params and fix copies
      
      * applied make style
      
      * fixed doc issue
      
      * skipped docstring test to allign # copied from
      
      * applied make style
      
      * removed FlaxMistralForSequenceClassification
      
      * removed unused padding_idx
      
      * removed more sequence classification
      
      * removed sequence classification
      
      * applied styling and consistency
      
      * added copied from in tests
      
      * removed sequence classification test logic
      
      * applied styling
      
      * applied make style
      
      * removed freeze and fixed copies
      
      * undo test change
      
      * changed repeat_kv to tile
      
      * fixed to key value groups
      
      * updated copyright year
      
      * split casual_mask
      
      * empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
      
      * went back to 2023 for tests_pr_documentation_tests
      
      * went back to 2024
      
      * changed tile to repeat
      
      * applied make style
      
      * empty for retry on Wav2Vec2
      f7076cd3
  13. 30 Jan, 2024 2 commits
  14. 29 Jan, 2024 1 commit
  15. 23 Jan, 2024 1 commit
  16. 18 Jan, 2024 1 commit
    • Yoach Lacombe's avatar
      Add new meta w2v2-conformer BERT-like model (#28165) · d2cdefb9
      Yoach Lacombe authored
      
      
      * first commit
      
      * correct default value non causal
      
      * update config and modeling code
      
      * update converting checkpoint
      
      * clean modeling and fix tests
      
      * make style
      
      * add new config parameters to docstring
      
      * fix copied from statements
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * make position_embeddings_type docstrings clearer
      
      * clean converting script
      
      * remove function not used
      
      * clean modeling file
      
      * apply suggestion for test file + add convert script to not_doctested
      
      * modify tests according to review - cleaner logic and more tests
      
      * Apply nit suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add checker of valid position embeddings type
      
      * instantiate new layer norm layer with the right eps
      
      * fix freeze_feature_encoder since it can be None in some cases
      
      * add test same output in convert script
      
      * restore wav2vec2conformer and add new model
      
      * create processor and FE + clean
      
      * add new model code
      
      * fix convert script and set default config parameters
      
      * correct model id paths
      
      * make style
      
      * make fix-copies and cleaning files
      
      * fix copied from statements
      
      * complete .md and fixe copies
      
      * clean convert script argument defaults
      
      * fix config parameters docstrings
      
      * fix config docstring
      
      * add copied from and enrich FE tests
      
      * fix copied from and repo-consistency
      
      * add autotokenizer
      
      * make test input length shorter and change docstring code
      
      * fix docstrings and copied from
      
      * add add_adapter to ASR training example
      
      * make testing of adapters more robust
      
      * adapt to multi adapter layers
      
      * refactor input_values->input_features and remove w2v2-bert feature extractor
      
      * remove pretraining model
      
      * remove depreciated features and useless lines
      
      * add copied from and ignore statements to modeling tests
      
      * remove pretraining model #2
      
      * change import in convert script
      
      * change default in convert script
      
      * update readme and remove useless line
      
      * Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * refactor BERT to Bert for consistency
      
      * remove useless ignore copy statement
      
      * add persistent to buffer in rotary
      
      * add eps in LayerNorm init and remove copied from
      
      * add adapter activation parameters and add copied from statements
      
      * Fix copied statements and add unitest.skip reasons
      
      * add copied statement in test_processor
      
      * refactor processor
      
      * make style
      
      * replace numpy random by torch rand
      
      * remove expected output CTC
      
      * improve converting script with processor class
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove gumbel class
      
      * remove tests related to previously deleted class
      
      * Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * correct typos
      
      * remove uused parameters
      
      * update processor to takes both text and audio
      
      * update checkpoints
      
      * update expected output and add ctc expected output
      
      * add label_attention_mask
      
      * replace pt with np in processor tests
      
      * fix typo
      
      * revert to behaviour with labels_attention_mask
      
      ---------
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      d2cdefb9
  17. 17 Jan, 2024 1 commit
    • Junyang Lin's avatar
      Add qwen2 (#28436) · d6ffe74d
      Junyang Lin authored
      
      
      * add config, modeling, and tokenization
      
      * add auto and init
      
      * update readme
      
      * update readme
      
      * update team name
      
      * fixup
      
      * fixup
      
      * update config
      
      * update code style
      
      * update for fixup
      
      * update for fixup
      
      * update for fixup
      
      * update for testing
      
      * update for testing
      
      * fix bug for config and tokenization
      
      * fix bug for bos token
      
      * not doctest
      
      * debug tokenizer
      
      * not doctest
      
      * debug tokenization
      
      * debug init for tokenizer
      
      * fix style
      
      * update init
      
      * delete if in token auto
      
      * add tokenizer doc
      
      * add tokenizer in init
      
      * Update dummy_tokenizers_objects.py
      
      * update
      
      * update
      
      * debug
      
      * Update tokenization_qwen2.py
      
      * debug
      
      * Update convert_slow_tokenizer.py
      
      * add copies
      
      * add copied from and make style
      
      * update files map
      
      * update test
      
      * fix style
      
      * fix merge reading and update tests
      
      * fix tests
      
      * fix tests
      
      * fix style
      
      * debug a variable in readme
      
      * Update src/transformers/models/qwen2/configuration_qwen2.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * update test and copied from
      
      * fix style
      
      * update qwen2 tokenization  and tests
      
      * Update tokenization_qwen2.py
      
      * delete the copied from after property
      
      * fix style
      
      * update tests
      
      * update tests
      
      * add copied from
      
      * fix bugs
      
      * update doc
      
      * add warning for sliding window attention
      
      * update qwen2 tokenization
      
      * fix style
      
      * Update src/transformers/models/qwen2/modeling_qwen2.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix tokenizer fast
      
      ---------
      Co-authored-by: default avatarRen Xuancheng <jklj077@users.noreply.github.com>
      Co-authored-by: default avatarrenxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      d6ffe74d
  18. 15 Jan, 2024 1 commit
  19. 12 Jan, 2024 2 commits
  20. 11 Jan, 2024 1 commit
  21. 10 Jan, 2024 1 commit
  22. 08 Jan, 2024 1 commit
    • NielsRogge's avatar
      Add SigLIP (#26522) · 3b742ea8
      NielsRogge authored
      
      
      * Add first draft
      
      * Use appropriate gelu function
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Convert checkpoint
      
      * More improvements
      
      * Improve docs, remove print statements
      
      * More improvements
      
      * Add link
      
      * remove unused masking function
      
      * begin tokenizer
      
      * do_lower_case
      
      * debug
      
      * set split_special_tokens=True
      
      * Remove script
      
      * Fix style
      
      * Fix rebase
      
      * Use same design as CLIP
      
      * Add fast tokenizer
      
      * Add SiglipTokenizer to init, remove extra_ids
      
      * Improve conversion script
      
      * Use smaller inputs in conversion script
      
      * Update conversion script
      
      * More improvements
      
      * Add processor to conversion script
      
      * Add tests
      
      * Remove print statements
      
      * Add tokenizer tests
      
      * Fix more tests
      
      * More improvements related to weight initialization
      
      * More improvements
      
      * Make more tests pass
      
      * More improvements
      
      * More improvements
      
      * Add copied from
      
      * Add canonicalize_text
      
      * Enable fast tokenizer tests
      
      * More improvements
      
      * Fix most slow tokenizer tests
      
      * Address comments
      
      * Fix style
      
      * Remove script
      
      * Address some comments
      
      * Add copied from to tests
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Remove is_flax_available
      
      * More updates
      
      * Address comment
      
      * Remove SiglipTokenizerFast for now
      
      * Add caching
      
      * Remove umt5 test
      
      * Add canonicalize_text inside _tokenize, thanks Arthur
      
      * Fix image processor tests
      
      * Skip tests which are not applicable
      
      * Skip test_initialization
      
      * More improvements
      
      * Compare pixel values
      
      * Fix doc tests, add integration test
      
      * Add do_normalize
      
      * Remove causal mask and leverage ignore copy
      
      * Fix attention_mask
      
      * Fix remaining tests
      
      * Fix dummies
      
      * Rename temperature and bias
      
      * Address comments
      
      * Add copied from to tokenizer tests
      
      * Add SiglipVisionModel to auto mapping
      
      * Add copied from to image processor tests
      
      * Improve doc
      
      * Remove SiglipVisionModel from index
      
      * Address comments
      
      * Improve docs
      
      * Simplify config
      
      * Add first draft
      
      * Make it like mistral
      
      * More improvements
      
      * Fix attention_mask
      
      * Fix output_attentions
      
      * Add note in docs
      
      * Convert multilingual model
      
      * Convert large checkpoint
      
      * Convert more checkpoints
      
      * Add pipeline support, correct image_mean and image_std
      
      * Use padding=max_length by default
      
      * Make processor like llava
      
      * Add code snippet
      
      * Convert more checkpoints
      
      * Set keep_punctuation_string=None as in OpenCLIP
      
      * Set normalized=False for special tokens
      
      * Fix doc test
      
      * Update integration test
      
      * Add figure
      
      * Update organization
      
      * Happy new year
      
      * Use AutoModel everywhere
      
      ---------
      Co-authored-by: default avatarpatil-suraj <surajp815@gmail.com>
      3b742ea8
  23. 03 Jan, 2024 1 commit
    • Connor Henderson's avatar
      Add FastSpeech2Conformer (#23439) · d83ff5ee
      Connor Henderson authored
      * start - docs, SpeechT5 copy and rename
      
      * add relevant code from FastSpeech2 draft, have tests pass
      
      * make it an actual conformer, demo ex.
      
      * matching inference with original repo, includes debug code
      
      * refactor nn.Sequentials, start more desc. var names
      
      * more renaming
      
      * more renaming
      
      * vocoder scratchwork
      
      * matching vocoder outputs
      
      * hifigan vocoder conversion script
      
      * convert model script, rename some config vars
      
      * replace postnet with speecht5's implementation
      
      * passing common tests, file cleanup
      
      * expand testing, add output hidden states and attention
      
      * tokenizer + passing tokenizer tests
      
      * variety of updates and tests
      
      * g2p_en pckg setup
      
      * import structure edits
      
      * docstrings and cleanup
      
      * repo consistency
      
      * deps
      
      * small cleanup
      
      * forward signature param order
      
      * address comments except for masks and labels
      
      * address comments on attention_mask and labels
      
      * address second round of comments
      
      * remove old unneeded line
      
      * address comments part 1
      
      * address comments pt 2
      
      * rename auto mapping
      
      * fixes for failing tests
      
      * address comments part 3 (bart-like, train loss)
      
      * make style
      
      * pass config where possible
      
      * add forward method + tests to WithHifiGan model
      
      * make style
      
      * address arg passing and generate_speech comments
      
      * address Arthur comments
      
      * address Arthur comments pt2
      
      * lint  changes
      
      * Sanchit comment
      
      * add g2p-en to doctest deps
      
      * move up self.encoder
      
      * onnx compatible tensor method
      
      * fix is symbolic
      
      * fix paper url
      
      * move models to espnet org
      
      * make style
      
      * make fix-copies
      
      * update docstring
      
      * Arthur comments
      
      * update docstring w/ new updates
      
      * add model architecture images
      
      * header size
      
      * md wording update
      
      * make style
      d83ff5ee
  24. 22 Dec, 2023 2 commits
  25. 18 Dec, 2023 1 commit
  26. 13 Dec, 2023 1 commit
    • Younes Belkada's avatar
      Adds VIP-llava to transformers (#27932) · c7f076a0
      Younes Belkada authored
      * v1
      
      * add-new-model-like
      
      * revert
      
      * fix forward and conversion script
      
      * revert
      
      * fix copies
      
      * fixup
      
      * fix
      
      * Update docs/source/en/index.md
      
      * Apply suggestions from code review
      
      * push
      
      * fix
      
      * fixes here and there
      
      * up
      
      * fixup and fix tests
      
      * Apply suggestions from code review
      
      * add docs
      
      * fixup
      
      * fixes
      
      * docstring
      
      * add docstring
      
      * fixup
      
      * docstring
      
      * fixup
      
      * nit
      
      * docs
      
      * more copies
      
      * fix copies
      
      * nit
      
      * update test
      c7f076a0
  27. 11 Dec, 2023 1 commit
    • Arthur's avatar
      [`Add Mixtral`] Adds support for the Mixtral MoE (#27942) · accccdd0
      Arthur authored
      
      
      * up
      
      * up
      
      * test
      
      * logits ok
      
      * up
      
      * up
      
      * few fixes
      
      * conversion script
      
      * up
      
      * nits
      
      * nits
      
      * update
      
      * nuke
      
      * more updates
      
      * nites
      
      * fix many issues
      
      * nit
      
      * scatter
      
      * nit
      
      * nuke megablocks
      
      * nits
      
      * fix conversion script
      
      * nit
      
      * remove
      
      * nits
      
      * nit
      
      * update
      
      * oupsssss
      
      * change
      
      * nits device
      
      * nits
      
      * fixup
      
      * update
      
      * merge
      
      * add copied from
      
      * fix the copy mentions
      
      * update tests
      
      * more fixes
      
      * nits
      
      * conversion script
      
      * add parts of the readme
      
      * Update tests/models/mixtral/test_modeling_mixtral.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * new test + conversion script
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Apply suggestions from code review
      
      * fix
      
      * fix copies
      
      * fix copies
      
      * ooops
      
      * fix config
      
      * Apply suggestions from code review
      
      * fix nits
      
      * nit
      
      * add copies
      
      * add batched tests
      
      * docs
      
      * fix flash attention
      
      * let's add more verbose
      
      * add correct outputs
      
      * support router ouptus
      
      * ignore copies where needed
      
      * fix
      
      * cat list if list is given for now
      
      * nits
      
      * Update docs/source/en/model_doc/mixtral.md
      
      * finish router refactoring
      
      * fix forward
      
      * fix expected values
      
      * nits
      
      * fixup
      
      * fix
      
      * fix bug
      
      * fix
      
      * fix dtype mismatch
      
      * fix
      
      * grrr grrr I support item assignment
      
      * fix CI
      
      * docs
      
      * fixup
      
      * remove some copied form
      
      * fix weird diff
      
      * skip doctest fast on the config and modeling
      
      * mark that is supports flash attention in the doc
      
      * update
      
      * Update src/transformers/models/mixtral/modeling_mixtral.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * Update docs/source/en/model_doc/mixtral.md
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * revert router logits config issue
      
      * update doc accordingly
      
      * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
      
      * nits
      
      * use torch testing asssert close
      
      * fixup
      
      * doc nits
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      accccdd0
  28. 08 Dec, 2023 1 commit
  29. 07 Dec, 2023 1 commit