1. 02 Apr, 2024 2 commits
    • Yoach Lacombe's avatar
      Add Flash Attention 2 support to Musicgen and Musicgen Melody (#29939) · 0d04b1e2
      Yoach Lacombe authored
      * add FA2 to o.g Musicgen
      
      * make style
      
      * add FA2 support to Musicgen Melody
      
      * add generation FA2 tests to o.g Musicgen
      
      * make style and fix copies
      
      * add Musicgen to FA2 docs + deprecate list
      
      * add sdpa supports to Musicgen's
      
      * make style and fix copies
      
      * refactor attention implementation arguments
      
      * add Copied from to sdpa tests
      
      * add copied form in sdpa tests melody
      
      * add copied for FA2 generation tests
      
      * add FA2 inference copied from
      
      * make style
      0d04b1e2
    • Steven Liu's avatar
      [docs] Big model loading (#29920) · 096f3046
      Steven Liu authored
      * update
      
      * feedback
      096f3046
  2. 30 Mar, 2024 1 commit
  3. 29 Mar, 2024 1 commit
  4. 28 Mar, 2024 4 commits
  5. 27 Mar, 2024 1 commit
    • Bo Zheng's avatar
      Add Qwen2MoE (#29377) · 1c39974a
      Bo Zheng authored
      
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * Update README.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * fixup
      
      * add archive back
      
      * add support for qwen2 MoE models
      
      * update docs
      
      * update model name & test
      
      * update readme
      
      * update class names & readme & model_doc of Qwen2MoE.
      
      * update architecture name
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * update modeling_qwen2_moe.py
      
      * fix model architecture
      
      * fixup
      
      * fix qwen2_moe tests
      
      * use Qwen2Tokenizer instead of Qwen2MoeTokenizer
      
      * fix style
      
      * fix test when there are sparse and non sparse layers
      
      * fixup
      
      * add archive back
      
      * fix integration test
      
      * fixup
      
      ---------
      Co-authored-by: default avatarbozheng-hit <dsoul0621@gmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      1c39974a
  6. 26 Mar, 2024 2 commits
  7. 25 Mar, 2024 1 commit
  8. 24 Mar, 2024 1 commit
    • gamepad_coder's avatar
      model_summary.md - Restore link to Harvard's Annotated Transformer. (#29702) · 76a33a10
      gamepad_coder authored
      * model_summary.md - Add link to Harvard's Annotated Transformer.
      
      * model_summary.md - slight wording change + capitalize name of the paper
      
      * model_summary.md - moves the Annotated Transformer link in a praenthesis next to the link to the original paper (great idea, stevhliu!)
      
      * model_summary.md - moves the Annotated Transformer link in a praenthesis next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)
      76a33a10
  9. 23 Mar, 2024 1 commit
  10. 21 Mar, 2024 1 commit
  11. 20 Mar, 2024 2 commits
    • NielsRogge's avatar
      Add LLaVa-1.6, bis (#29586) · d91fd7f9
      NielsRogge authored
      
      
      * First draft
      
      * Fix tests, add docs
      
      * Improve docstrings
      
      * Fix test
      
      * Address comments
      
      * Address comments
      
      * Remove vocab_size attribute
      
      * Remove batch_size
      
      * Address comment
      
      * Add image processor tests
      
      * Support fx
      
      * Update docstring
      
      * Add support for 34b
      
      * Convert 34b model
      
      * Add integration tests
      
      * Update checkpoints
      
      * Convert vicuna-13b, remove doc tests
      
      * Remove script
      
      * Remove file
      
      * Address comments
      
      * Improve docstrings
      
      * Deprecate vocab_size
      
      * Remove aspect_ratio_setting
      
      * Address comments
      
      * Update READMEs
      
      * Add tips about chat templates
      
      * Fix tests
      
      * Deprecate vocab_size safely
      
      * Update tests
      
      ---------
      Co-authored-by: default avatarAmy Roberts <22614925+amyeroberts@users.noreply.github.com>
      d91fd7f9
    • amyeroberts's avatar
      3c17c529
  12. 19 Mar, 2024 2 commits
  13. 18 Mar, 2024 2 commits
    • Abubakar Abid's avatar
      Update the pipeline tutorial to include `gradio.Interface.from_pipeline` (#29684) · 838b87ab
      Abubakar Abid authored
      
      
      * Update pipeline_tutorial.md to include gradio
      
      * Update pipeline_tutorial.md
      
      * Update docs/source/en/pipeline_tutorial.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update docs/source/en/pipeline_tutorial.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update docs/source/en/pipeline_tutorial.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update docs/source/en/pipeline_tutorial.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update pipeline_tutorial.md
      
      * Update docs/source/en/pipeline_tutorial.md
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      838b87ab
    • Yoach Lacombe's avatar
      Add MusicGen Melody (#28819) · c43b380e
      Yoach Lacombe authored
      
      
      * first modeling code
      
      * make repository
      
      * still WIP
      
      * update model
      
      * add tests
      
      * add latest change
      
      * clean docstrings and copied from
      
      * update docstrings md and readme
      
      * correct chroma function
      
      * correct copied from and remove unreleated test
      
      * add doc to toctree
      
      * correct imports
      
      * add convert script to notdoctested
      
      * Add suggestion from Sanchit
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * correct get_uncoditional_inputs docstrings
      
      * modify README according to SANCHIT feedback
      
      * add chroma to audio utils
      
      * clean librosa and torchaudio hard dependencies
      
      * fix FE
      
      * refactor audio decoder -> audio encoder for consistency with previous musicgen
      
      * refactor conditional -> encoder
      
      * modify sampling rate logics
      
      * modify license at the beginning
      
      * refactor all_self_attns->all_attentions
      
      * remove ignore copy from causallm generate
      
      * add copied from for from_sub_models
      
      * fix make copies
      
      * add warning if audio is truncated
      
      * add copied from where relevant
      
      * remove artefact
      
      * fix convert script
      
      * fix torchaudio and FE
      
      * modify chroma method according to feedback-> better naming
      
      * refactor input_values->input_features
      
      * refactor input_values->input_features and fix import fe
      
      * add input_features to docstrigs
      
      * correct inputs_embeds logics
      
      * remove dtype conversion
      
      * refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_prepare_encoder_hidden_states_kwargs_for_generation
      
      * change warning for chroma length
      
      * Update src/transformers/models/musicgen_melody/convert_musicgen_melody_transformers.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * change way to save wav, using soundfile
      
      * correct docs and change to soundfile
      
      * fix import
      
      * fix init proj layers
      
      * remove line breaks from md
      
      * fix issue with docstrings
      
      * add FE suggestions
      
      * improve is in logics and remove useless imports
      
      * remove custom from_pretrained
      
      * simplify docstring code
      
      * add suggestions for modeling tests
      
      * make style
      
      * update converting script with sanity check
      
      * remove encoder attention mask from conditional generation
      
      * replace musicgen melody checkpoints with official orga
      
      * rename ylacombe->facebook in checkpoints
      
      * fix copies
      
      * remove unecessary warning
      
      * add shape in code docstrings
      
      * add files to slow doc tests
      
      * fix md bug and add md to not_tested
      
      * make fix-copies
      
      * fix hidden states test and batching
      
      ---------
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      c43b380e
  14. 15 Mar, 2024 3 commits
  15. 13 Mar, 2024 6 commits
    • Aaron Jimenez's avatar
      [docs] Remove broken ChatML format link from chat_templating.md (#29643) · f738ab3b
      Aaron Jimenez authored
      * remove ChatML link from en/
      
      * remove ChatML link in ja/
      
      * remove ChatML link in zh/
      f738ab3b
    • Nate Cibik's avatar
      Add PvT-v2 Model (#26812) · 1fc505b8
      Nate Cibik authored
      
      
      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/end/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Reverted batch eval changes for PR
      
      * Updated index.md
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Applied code fixup
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/end/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * PvT-v2 now works in AutoModel
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Added pytests for pvt-v2, all passed
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Applied code fixup
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Ran fix-copies and fixup. All checks passed
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Fixed config docstring. Added channels property
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Ran fix-copies and fixup. All checks passed
      
      * Allowed for batching of eval metrics
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Fixed config backbone compat
      
      * Ran fix-copies
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Fixed issue from rebase
      
      * Fixed issue from rebase
      
      * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
      
      * Fixed issue from rebase
      
      * Fixed issue from rebase
      
      * Changed model name in docs
      
      * Removed duplicate PvtV2Backbone
      
      * Work around type switching issue in tests
      
      * Fix model name in config comments
      
      * Update docs/source/en/model_doc/pvt_v2.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed old code
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Fixed Class names to be more descriptive
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed outdated code
      
      * Moved paper abstract to single line in pvt_v2.md
      
      * Added usage tips to pvt_v2.md
      
      * Simplified module inits by passing layer_idx
      
      * Fixed typing for hidden_act in PvtV2Config
      
      * Removed unusued import
      
      * Add pvt_v2 to docs/source/en/_toctree.yml
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Move function parameters to single line
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Update year of copyright to 2024
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Make code more explicit
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated sr_ratio to be more explicit spatial_reduction_ratio
      
      * Removed excess type hints in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Move params to single line in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Removed needless comment in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update copyright date in pvt_v2.md
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Moved params to single line in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated copyright date in configuration_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Cleaned comments in modeling_pvt_v2.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Renamed spatial_reduction Conv2D operation
      
      * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      "
      
      This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538.
      
      * Updated conversion script to reflect module name change
      
      * Deprecated reshape_last_stage option in config
      
      * Removed unused imports
      
      * Code formatting
      
      * Fixed outdated decorators on test_inference_fp16
      
      * Added "Copied from" comments in test_modeling_pvt_v2.py
      
      * Fixed import listing
      
      * Updated model name
      
      * Force empty commit for PR refresh
      
      * Fixed linting issue
      
      * Removed # Copied from comments
      
      * Added PVTv2 to README_fr.md
      
      * Ran make fix-copies
      
      * Replace all FoamoftheSea hub references with OpenGVLab
      
      * Fixed out_indices and out_features logic in configuration_pvt_v2.py
      
      * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
      
      * Ran code fixup
      
      * Fixed order of parent classes in PvtV2Config to fix the to_dict method override
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      1fc505b8
    • njackman-2344's avatar
      [docs] Spanish translate chat_templating.md & yml addition (#29559) · d3801aae
      njackman-2344 authored
      * torchscript and trainer md es translation
      
      * corrected md es files and even corrected spelling in en md
      
      * made es corrections to trainer.md
      
      * deleted entrenamiento... title on yml
      
      * placed entrenamiento in right place
      
      * translated es chat_templating.md w/ yml addition
      
      * requested es changes to md and yml
      
      * last es changes to md
      d3801aae
    • Dries Verachtert's avatar
      62478857
    • Lysandre Debut's avatar
      Warn about tool use (#29628) · 38bff8c8
      Lysandre Debut authored
      
      
      * Warn against remote tool use
      
      * Additional disclaimer
      
      * Update docs/source/en/custom_tools.md
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      38bff8c8
    • bytebarde's avatar
      [Flash Attention 2] Add flash attention 2 for GPT-J (#28295) · be3fd8a2
      bytebarde authored
      
      
      * initial implementation of flash attention for gptj
      
      * modify flash attention and overwrite test_flash_attn_2_generate_padding_right
      
      * update flash attention support list
      
      * remove the copy line in the `CodeGenBlock`
      
      * address copy mechanism
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add GPTJ attention classes
      
      * add expected outputs in the gptj test
      
      * Ensure repo consistency with 'make fix-copies'
      
      ---------
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      be3fd8a2
  16. 12 Mar, 2024 3 commits
  17. 11 Mar, 2024 5 commits
  18. 07 Mar, 2024 1 commit
  19. 06 Mar, 2024 1 commit