1. 15 Mar, 2024 3 commits
  2. 14 Mar, 2024 3 commits
  3. 13 Mar, 2024 8 commits
    • Add PvT-v2 Model (#26812) · 1fc505b8
      Nate Cibik authored
      
      
      * Added pytests for pvt-v2, all passed
      
      * Added pvt_v2 to docs/source/en/model_doc
      
      * Ran fix-copies and fixup. All checks passed
      
      * Added additional ReLU for linear attention mode
      
      * pvt_v2_b2_linear converted and working
      
      * copied models/pvt to adapt to pvt_v2
      
      * First commit of pvt_v2
      
      * PvT-v2 now works in AutoModel
      
      * Reverted batch eval changes for PR
      
      * Expanded type support for Pvt-v2 config
      
      * Fixed config docstring. Added channels property
      
      * Fixed model names in tests
      
      * Fixed config backbone compat. Added additional type support for image size in config
      
      * Fixed config backbone compat
      
      * Allowed for batching of eval metrics
      
      * Set key and value layers to use separate linear modules. Fixed pruning function
      
      * Set AvgPool to 7
      
      * Fixed issue in init
      
      * Successful conversion of pretrained weights for PVT-v2
      
      * Successful conversion of pretrained weights for PVT-v2 models
      
      * Updated index.md
      
      * Ran fix-copies
      
      * Fixed PvtV2Backbone tests
      
      * Added TFRegNet to OBJECTS_TO_IGNORE in check_docstrings.py
      
      * Fixed backbone stuff and fixed tests: all passing
      
      * Ran make fixup
      
      * Made modifications for code checks
      
      * Remove ONNX config from configuration_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Use explicit image size dict in test_modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Make image_size optional in test_modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove _ntuple use in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Remove reference to fp16_enabled
      
      * Model modules now take config as first argument even when not used
      
      * Replaced abbreviations for "SR" and "AP" with explicit "spatialreduction" and "averagepooling"
      
      * All LayerNorm now instantiates with config.layer_norm_eps
      
      * Added docstring for depth-wise conv layer
      
      * PvtV2Config now only takes Union[int, Tuple[int, int]] for image size
      
      * Refactored PVTv2 in prep for gradient checkpointing
      
      * Gradient checkpointing ready to test
      
      * Removed override of _set_gradient_checkpointing
      
      * Cleaned out old code
      
      * Applied code fixup
      
      * Began debug of pvt_v2 tests
      
      * Leave handling of num_labels to base pretrained config class
      
      * Deactivated gradient checkpointing tests until it is fixed
      
      * Removed PvtV2ImageProcessor which duped PvtImageProcessor
      
      * Fixed issue from rebase
      
      * Set tests for gradient checkpointing to skip those using reentrant since it isn't supported
      
      * Changed model name in docs
      
      * Removed duplicate PvtV2Backbone
      
      * Work around type switching issue in tests
      
      * Fix model name in config comments
      
      * Update docs/source/en/model_doc/pvt_v2.md
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Changed name of variable from 'attn_reduce' to 'sr_type'
      
      * Changed from using 'sr_type' to 'linear_attention' for clarity
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed old code
      
      * Fixed Class names to be more descriptive
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Removed outdated code
      
      * Moved paper abstract to single line in pvt_v2.md
      
      * Added usage tips to pvt_v2.md
      
      * Simplified module inits by passing layer_idx
      
      * Fixed typing for hidden_act in PvtV2Config
      
      * Removed unused import
      
      * Add pvt_v2 to docs/source/en/_toctree.yml
      
      * Updated documentation in docs/source/en/model_doc/pvt_v2.md to be more comprehensive.
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Move function parameters to single line
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Update year of copyright to 2024
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      
      Make code more explicit
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated sr_ratio to be more explicit spatial_reduction_ratio
      
      * Removed excess type hints in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Move params to single line in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Removed needless comment in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update copyright date in pvt_v2.md
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Moved params to single line in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Updated copyright date in configuration_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Cleaned comments in modeling_pvt_v2.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Renamed spatial_reduction Conv2D operation
      
      * Revert "Update src/transformers/models/pvt_v2/modeling_pvt_v2.py
      "
      
      This reverts commit c4a04416dde8f3475ab405d1feb368600e0f8538.
      
      * Updated conversion script to reflect module name change
      
      * Deprecated reshape_last_stage option in config
      
      * Removed unused imports
      
      * Code formatting
      
      * Fixed outdated decorators on test_inference_fp16
      
      * Added "Copied from" comments in test_modeling_pvt_v2.py
      
      * Fixed import listing
      
      * Updated model name
      
      * Force empty commit for PR refresh
      
      * Fixed linting issue
      
      * Removed # Copied from comments
      
      * Added PVTv2 to README_fr.md
      
      * Ran make fix-copies
      
      * Replace all FoamoftheSea hub references with OpenGVLab
      
      * Fixed out_indices and out_features logic in configuration_pvt_v2.py
      
      * Made ImageNet weight conversion verification optional in convert_pvt_v2_to_pytorch.py
      
      * Ran code fixup
      
      * Fixed order of parent classes in PvtV2Config to fix the to_dict method override
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
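
      For readers skimming this log, here is a minimal sketch of the attention knobs these
      commits keep renaming (`spatial_reduction_ratio`, the `linear_attention` mode, the
      AvgPool set to 7, and the extra ReLU). Class and argument names are illustrative,
      not the merged transformers code.

      import torch
      from torch import nn

      class SpatialReductionSketch(nn.Module):
          """Shrinks the key/value token grid before attention, PVTv2-style."""

          def __init__(self, hidden_size, spatial_reduction_ratio, linear_attention=False):
              super().__init__()
              if linear_attention:
                  # Linear-complexity mode: pool to a fixed 7x7 grid ("Set AvgPool to 7"),
                  # re-project with a 1x1 conv, then the "additional ReLU" from the commits.
                  self.pool = nn.AdaptiveAvgPool2d(7)
                  self.spatial_reduction = nn.Conv2d(hidden_size, hidden_size, kernel_size=1)
                  self.act = nn.ReLU()
              else:
                  # Standard mode: a strided conv shrinks height/width by the reduction ratio.
                  self.pool = nn.Identity()
                  self.spatial_reduction = nn.Conv2d(
                      hidden_size, hidden_size,
                      kernel_size=spatial_reduction_ratio, stride=spatial_reduction_ratio,
                  )
                  self.act = nn.Identity()
              self.layer_norm = nn.LayerNorm(hidden_size)

          def forward(self, hidden_states, height, width):
              batch_size, _, channels = hidden_states.shape
              # (batch, seq, channels) -> (batch, channels, height, width)
              grid = hidden_states.permute(0, 2, 1).reshape(batch_size, channels, height, width)
              grid = self.spatial_reduction(self.pool(grid))
              # Flatten back to a (much shorter) key/value token sequence.
              reduced = grid.reshape(batch_size, channels, -1).permute(0, 2, 1)
              return self.act(self.layer_norm(reduced))

      tokens = torch.randn(2, 56 * 56, 64)
      print(SpatialReductionSketch(64, 8, linear_attention=True)(tokens, 56, 56).shape)  # (2, 49, 64)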
    • Fix batching tests for new models (Mamba and SegGPT) (#29633) · 5ac264d8
      Raushan Turganbay authored
      
      
      * fix batching tests for new models
      
      * Update tests/models/seggpt/test_modeling_seggpt.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
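
      A hedged sketch of what such a batching test asserts (helper name and tolerances are
      illustrative): a sample run on its own and the same sample inside a padded batch
      should produce matching outputs.

      import torch

      def check_batching_equivalence(model, single_inputs, batched_inputs, atol=1e-5, rtol=1e-4):
          # Row 0 of the batch holds the same sample as `single_inputs`.
          with torch.no_grad():
              single = model(**single_inputs).logits
              batched = model(**batched_inputs).logits
          torch.testing.assert_close(batched[:1], single, atol=atol, rtol=rtol)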
    • [tests] make `test_trainer_log_level_replica` run on accelerators with more than 2 devices (#29609) · a7e5e154
      Fanli Lin authored

      add new arg
    • Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA (#29587) · 350c5d15
      Sourab Mangrulkar authored
      
      
      * fsdp+qlora related changes
      
      * fixes
      
      * Update quantization_config.py
      
      * support fsdp+qlora and dsz3+qlora
      
      * Update quantization_config.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * handle fsdp+qlora and dsz3+qlora correctly while model loading
      
      * fix param count
      
      * quality
      
      * fsdp related changes
      
      * fsdp changes only when using LoRA/QLoRA
      
      * add accelerate version check
      
      * refactor, update min accelerate version and add tests
      
      1. Update minimum accelerate version to 0.26.0
      2. Clean the trainer wrt accelerate version checks
      3. FSDP refactor and test for fsdp config
      4. use `itemsize` instead of `dtype2bytes` dict
      
      * fix test
      
      * Address comments
      Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * fix the conditional flag
      
      * fix conditional flag
      
      * address comments
      Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
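
      A hedged usage sketch of what the PR enables: QLoRA-style 4-bit loading whose packed
      weights FSDP or DeepSpeed ZeRO3 can shard, via the new `bnb_4bit_quant_storage` option.
      The checkpoint name is illustrative.

      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig

      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
          bnb_4bit_quant_storage=torch.bfloat16,  # keep shards in one uniform dtype for FSDP
      )
      model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint
          quantization_config=bnb_config,
          torch_dtype=torch.bfloat16,
      )

      # The refactor also counts parameter memory from each dtype's own byte width
      # rather than a hand-written dtype-to-bytes dict:
      param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())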
    • Llama: allow custom 4d masks (#29618) · 1e21c4fb
      Joao Gante authored
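
      A hedged sketch of the new capability (checkpoint name illustrative): a 4D mask in the
      (batch, 1, query_len, kv_len) layout is now passed through instead of being rebuilt.

      import torch
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
      input_ids = torch.tensor([[1, 2, 3, 4]])
      seq_len = input_ids.shape[1]

      # Hand-built causal mask; a custom pattern (e.g. packing several short
      # sequences into one row) would just use a different 0/1 layout.
      mask_4d = torch.tril(torch.ones(1, 1, seq_len, seq_len))
      outputs = model(input_ids=input_ids, attention_mask=mask_4d)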
    • Adds pretrained IDs directly in the tests (#29534) · 11bbb505
      Lysandre Debut authored
      * Adds pretrained IDs directly in the tests
      
      * Fix tests
      
      * Fix tests
      
      * Review!
    • [Flash Attention 2] Add flash attention 2 for GPT-J (#28295) · be3fd8a2
      bytebarde authored
      
      
      * initial implementation of flash attention for gptj
      
      * modify flash attention and overwrite test_flash_attn_2_generate_padding_right
      
      * update flash attention support list
      
      * remove the copy line in the `CodeGenBlock`
      
      * address copy mechanism
      
      * Update src/transformers/models/gptj/modeling_gptj.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Add GPTJ attention classes
      
      * add expected outputs in the gptj test
      
      * Ensure repo consistency with 'make fix-copies'
      
      ---------
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
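
      A hedged usage sketch of the new backend: opting GPT-J into Flash Attention 2 at load
      time. This assumes the flash-attn package and a supported GPU; FA2 runs in fp16/bf16.

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
      model = AutoModelForCausalLM.from_pretrained(
          "EleutherAI/gpt-j-6b",
          torch_dtype=torch.float16,               # FA2 requires half precision
          attn_implementation="flash_attention_2",
      ).to("cuda")

      inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
      print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))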
  4. 12 Mar, 2024 2 commits
  5. 11 Mar, 2024 2 commits
  6. 08 Mar, 2024 6 commits
  7. 07 Mar, 2024 6 commits
  8. 06 Mar, 2024 3 commits
  9. 05 Mar, 2024 7 commits
    • Automatic safetensors conversion when lacking these files (#29390) · a69cbf4e
      Lysandre Debut authored
      * Automatic safetensors conversion when lacking these files
      
      * Remove debug
      
      * Thread name
      
      * Typo
      
      * Ensure that raises do not affect the main thread
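
      A minimal sketch of the pattern the commits describe, with illustrative helper and
      thread names: the conversion runs on a named daemon thread, and any exception it
      raises is contained so it cannot affect the main thread doing the model load.

      import threading

      def _convert_to_safetensors(repo_id):
          # Illustrative placeholder for opening a conversion PR on the Hub.
          print(f"converting {repo_id} to safetensors")

      def schedule_conversion(repo_id):
          def target():
              try:
                  _convert_to_safetensors(repo_id)
              except Exception:
                  # A failed background conversion must never break the caller.
                  pass

          threading.Thread(target=target, name="Thread-autoconversion", daemon=True).start()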
    • [`Add Mamba`] Adds support for the `Mamba` models (#28094) · fb1c62e9
      Arthur authored
      
      
      * initial-commit
      
      * start cleaning
      
      * small nits
      
      * small nits
      
      * current updates
      
      * add kernels
      
      * small refactoring little step
      
      * add comments
      
      * styling
      
      * nit
      
      * nits
      
      * Style
      
      * Small changes
      
      * Push dummy mamba simple slow
      
      * nit
      
      * Use original names
      
      * Use original names and remove norm
      
      * Updates for inference params
      
      * Style and updates
      
      * nits
      
      * Match logits
      
      * Add a test
      
      * Add expected generated text
      
      * nits doc, imports and styling
      
      * style
      
      * oups
      
      * dont install kernels, invite users to install the required kernels
      
      * let users use the original packages
      
      * styling
      
      * nits
      
      * fix some copies
      
      * update doc
      
      * fix-copies
      
      * styling done
      
      * nits
      
      * fix import check
      
      * runs, but wrong cuda results
      
      * mamba CUDA works :)
      
      * fix the fast path
      
      * config naming nits
      
      * conversion script is not required at this stage
      
      * finish fixing the fast path: generation make sense now!
      
      * nit
      
      * Let's start working on the CIs
      
      * style
      
      * better style
      
      * more nits
      
      * test nit
      
      * quick fix for now
      
      * nits
      
      * nit
      
      * nit
      
      * nit
      
      * nits
      
      * update test rest
      
      * fixup
      
      * update test
      
      * nit
      
      * some fixes
      
      * nits
      
      * update test values
      
      * fix styling
      
      * nit
      
      * support peft
      
      * integration tests require torch
      
      * also add slow markers
      
      * styling
      
      * chose forward wisely
      
      * nits
      
      * update tests
      
      * fix gradient checkpointing
      
      * fixup
      
      * nit
      
      * fix doc
      
      * check copies
      
      * fix the docstring
      
      * fix some more tests
      
      * style
      
      * fix beam search
      
      * add init scheme
      
      * update
      
      * nit
      
      * fix
      
      * fixup the doc
      
      * fix the doc
      
      * fixup
      
      * tentative update but slow is no longer good
      
      * nit
      
      * should we always use float32?
      
      * nits
      
      * revert wrong changes
      
      * res in float32
      
      * cleanup
      
      * skip fmt for now
      
      * update generation values
      
      * update test values running original model
      
      * fixup
      
      * update tests + rename inference_params to cache_params + make sure training does not use cache_params
      
      * small nits
      
      * more nits
      
      * fix final CIs
      
      * style
      
      * nit doc
      
      * I hope final doc nits
      
      * nit
      
      * 🫠
      
      * final touch!
      
      * fix torch import
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * Apply suggestions from code review
      
      * fix fix and fix
      
      * fix base model prefix!
      
      * nit
      
      * Update src/transformers/models/mamba/__init__.py
      
      * Update docs/source/en/model_doc/mamba.md
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * nit
      
      ---------
      Co-authored-by: Lysandre Debut <hi@lysand.re>
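
      A hedged usage sketch for the new architecture (checkpoint name illustrative). Note
      the commits rename `inference_params` to `cache_params`, which generate() manages
      internally.

      from transformers import AutoTokenizer, MambaForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
      model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

      inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
      generated = model.generate(**inputs, max_new_tokens=10)
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))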
    • [`Udop imports`] Processor tests were not run. (#29456) · 4d892b72
      Arthur authored
      * fix udop imports
      
      * sort imports
    • Revert-commit 0d52f9f5 (#29455) · 57d007b9
      Arthur authored
      * style
      
      * revert with PR
      
      * nit
      
      * exact revert
    • more fix · 0d52f9f5
      Arthur Zucker authored
    • [`UdopTokenizer`] Fix post merge imports (#29451) · 13285220
      Arthur authored
      * update
      
      * ...
      
      * nits
      
      * arf
      
      * 🧼
      
      * beat the last guy
      
      * style everyone
    • [tests] enable test_pipeline_accelerate_top_p on XPU (#29309) · fa7f3cf3
      Fanli Lin authored
      
      
      * use torch_device
      
      * Update tests/pipelines/test_pipelines_text_generation.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix style
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
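
      A hedged sketch of the device-agnostic pattern behind the fix: the test resolves
      whatever accelerator is present (CUDA, XPU, ...) through `torch_device` instead of
      hard-coding a CUDA index. Model choice is illustrative.

      from transformers import pipeline
      from transformers.testing_utils import torch_device

      generator = pipeline("text-generation", model="gpt2", device=torch_device)
      print(generator("Hello", do_sample=True, top_p=0.9, max_new_tokens=5)[0]["generated_text"])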