- 28 Mar, 2024 1 commit
-
-
Minseo Kang authored
-
- 27 Mar, 2024 9 commits
-
-
Lorenzo Verardo authored
This commit adds optional gate jitter to the input of MixtralSparseMoeBlock before it is passed through the MoE layer.
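A minimal sketch of what the jitter does, assuming the flag is exposed on the model config as `router_jitter_noise` (the exact name may differ): during training, the block's input is scaled element-wise by uniform noise.
```python
import torch

def apply_gate_jitter(hidden_states: torch.Tensor, jitter_noise: float, training: bool) -> torch.Tensor:
    # During training only: scale each input element by uniform noise in
    # [1 - jitter_noise, 1 + jitter_noise] before routing through the MoE layer.
    if training and jitter_noise > 0:
        hidden_states = hidden_states * torch.empty_like(hidden_states).uniform_(
            1.0 - jitter_noise, 1.0 + jitter_noise
        )
    return hidden_states
```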
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
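Illustrative device selection only; this assumes Cambricon's out-of-tree `torch_mlu` extension registers an `mlu` backend with a `torch.mlu.is_available()` check, which is not part of stock PyTorch.
```python
import torch

try:
    import torch_mlu  # noqa: F401  # Cambricon extension; import name is an assumption
    device = torch.device("mlu" if torch.mlu.is_available() else "cpu")
except ImportError:
    device = torch.device("cpu")

# Per the commit, fp16 and bf16 are now supported on MLU devices.
x = torch.ones(2, 2, dtype=torch.bfloat16, device=device)
```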
-
Raushan Turganbay authored
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update tests/generation/test_utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update tests/generation/test_utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* Update src/transformers/generation/__init__.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/stopping_criteria.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py (Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* lets fix this typo in docs
* Update src/transformers/generation/utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/generation/utils.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* update PR
* empty commit

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
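A hedged usage sketch, assuming the new criteria class is exported as `EosTokenCriteria` from `transformers.generation` (the commit touches that package's `__init__.py`):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import EosTokenCriteria, StoppingCriteriaList

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stop generation as soon as the EOS token is produced.
criteria = StoppingCriteriaList([EosTokenCriteria(eos_token_id=tok.eos_token_id)])
out = model.generate(**tok("Hello", return_tensors="pt"), stopping_criteria=criteria, max_new_tokens=20)
```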
-
Marc Sun authored
fix forward
-
Lysandre Debut authored
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
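A generic sketch of the threading pattern described above ("Thread name", "Ensure that raises do not affect the main thread", "Catch all errors"); function and thread names are illustrative, not the actual hub internals.
```python
import threading

def _convert_to_safetensors(model_id: str) -> None:
    raise RuntimeError("conversion failed")  # stand-in for the real conversion

def start_background_conversion(model_id: str) -> None:
    def target() -> None:
        try:
            _convert_to_safetensors(model_id)
        except Exception:
            pass  # swallow everything so the main thread is never affected
    threading.Thread(target=target, name="safetensors-conversion", daemon=True).start()
```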
-
Hovnatan Karapetyan authored
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
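For context, a minimal sketch of the kind of fixed sinusoidal table being generated (close to the DistilBERT-style helper, simplified here); marking it `requires_grad = False` is what lets the new check in `_init_weights` skip re-initializing it.
```python
import math
import torch

def create_sinusoidal_embeddings(n_pos: int, dim: int, out: torch.Tensor) -> None:
    # Fixed (non-learned) positional table: sin on even dims, cos on odd dims.
    position_enc = torch.tensor(
        [[pos / math.pow(10000, 2 * (j // 2) / dim) for j in range(dim)] for pos in range(n_pos)]
    )
    with torch.no_grad():
        out[:, 0::2] = torch.sin(position_enc[:, 0::2])
        out[:, 1::2] = torch.cos(position_enc[:, 1::2])
    out.requires_grad = False  # signals _init_weights not to overwrite this table
```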
-
Anton Vlasjuk authored
* FIX: Cached slow forward in mamba
  - additionally added mamba cached test
  - added unused test (mamba causal lm forward and backward)
  - fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
-
Bo Zheng authored
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
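An illustrative loading sketch (the checkpoint id is an assumption); note that, per the commit, Qwen2MoE reuses `Qwen2Tokenizer` rather than shipping a separate `Qwen2MoeTokenizer`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)    # resolves to Qwen2Tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
```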
-
Benjamin Minixhofer authored
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
-
- 26 Mar, 2024 4 commits
-
-
Yanyi Liu authored
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
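A sketch of selecting the new scheduler through `TrainingArguments`; per the updated error message, one of `min_lr` or `min_lr_rate` is expected via `lr_scheduler_kwargs`.
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    lr_scheduler_type="cosine_with_min_lr",
    # Either an absolute floor (min_lr) or a fraction of the initial LR (min_lr_rate).
    lr_scheduler_kwargs={"min_lr": 1e-6},
)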
-
Zhihao Lin authored
* update
* add ut
* update
-
yunxiangtang authored
* replace 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff

Co-authored-by: wanqiancheng <13541261013@163.com>
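Usage is unchanged; only the decoding backend moves from decord to PyAV (`pip install av`). The model id and file path below are illustrative.
```python
from transformers import pipeline

video_cls = pipeline("video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")
predictions = video_cls("archery.mp4")  # video is now decoded with av instead of decord
```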
-
Jonathan Flynn authored
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling

Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
-
- 25 Mar, 2024 5 commits
-
-
Arthur Zucker authored
-
Arthur Zucker authored
-
Yuki Watanabe authored
* Populate torch_dtype from model to pipeline
* use property
* lint
* Remove default handling

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
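A sketch of the behavior, assuming the dtype is surfaced through a `torch_dtype` property on the pipeline object:
```python
import torch
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
pipe = pipeline("text-generation", model=model, tokenizer="gpt2")
assert pipe.torch_dtype == torch.float16  # populated from the wrapped model
```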
-
yhuang authored
Fix the behavior of collecting 'num_input_tokens_seen'. See https://github.com/huggingface/transformers/issues/28791 for more details.
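A rough sketch of the corrected bookkeeping: count tokens on every process and reduce across devices rather than counting only locally. Names are illustrative, not the exact Trainer internals.
```python
import torch

def update_num_input_tokens_seen(state, accelerator, input_ids: torch.Tensor) -> None:
    local = torch.tensor([input_ids.numel()], device=input_ids.device, dtype=torch.int64)
    # Gather the per-device counts and sum them so the total covers all ranks.
    state.num_input_tokens_seen += accelerator.gather(local).sum().item()
```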
-
Lysandre Debut authored
* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* Update src/transformers/models/deprecated/_archive_maps.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
- 22 Mar, 2024 7 commits
-
-
amyeroberts authored
[SuperPoint] Fix doc example
-
Arthur authored
nit
-
igeni authored
replaced string concatenation with f-strings to improve readability and to unify with the rest of the code
-
Joao Gante authored
remove unused attrs
-
jiqing-feng authored
* rm input dtype change in CPU
* add warning when use CPU low-precision
* rm useless logging
-
fxmarty authored
* correct llava mask
* fix vipllava as well
* mask out embedding for padding tokens
* add test
* fix style
* add setter
* fix test on suggestion
-
Steven Madere authored
Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 (#29738)
* Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option.
* make fixup
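The relevant slice of the signature, simplified to a self-contained sketch:
```python
from typing import Optional, Union

from torch.utils.data import Dataset, IterableDataset

class Trainer:
    # train_dataset now also admits IterableDataset, matching what the
    # training loop already supported at runtime.
    def __init__(self, train_dataset: Optional[Union[Dataset, IterableDataset]] = None):
        self.train_dataset = train_dataset
```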
-
- 21 Mar, 2024 12 commits
-
-
Raushan Turganbay authored
* change in-place -> out-of-place
* add tests
* add more tests
* naming consistency
* fix doctest
* forgot min-length processors
* empty
* Revert "fix doctest" (reverts commit 4772768457f9bc057f1d4d9d67ea94eb7224eb8d)
* revert change in docstring
* Update tests/generation/test_logits_process.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/generation/test_logits_process.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
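A sketch of the out-of-place convention: a processor returns a fresh tensor instead of mutating `scores`. The processor below is illustrative, not one from the library.
```python
import torch
from transformers import LogitsProcessor

class BanTokenProcessor(LogitsProcessor):
    def __init__(self, token_id: int):
        self.token_id = token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores = scores.clone()  # out-of-place: leave the caller's tensor untouched
        scores[:, self.token_id] = -float("inf")
        return scores
```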
-
Raushan Turganbay authored
* prepend "bos" to blip generation * minor changes * Update src/transformers/models/blip_2/modeling_blip_2.py Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/models/instructblip/modeling_instructblip.py Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add generation tester mixin --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by:
amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
Joao Gante authored
* always convert the mask
* rebase and fix copies
-
Joao Gante authored
-
Zach Mueller authored
* Add deterministic config
* Add note on slowdown
* English fails me again
-
Zach Mueller authored
* Remove deprecations
* Clean
-
Matt authored
* Cast bfloat16 to float32 for Numpy conversions
* Add test
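The underlying constraint: NumPy has no bfloat16 dtype, so tensors are upcast first. A minimal illustration:
```python
import torch

t = torch.ones(2, 2, dtype=torch.bfloat16)
arr = t.float().numpy()  # a direct t.numpy() would fail: NumPy lacks bfloat16
```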
-
Arthur authored
* path llava-next
* styling
* styling
-
théo gigant authored
fix issue with logit processor in beam search in Flax
-
Matthias Dittrich authored
Fixes:
```
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 987, in <module>
    class AutoConfig:
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in AutoConfig
    @replace_list_option_in_docstrings()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 966, in docstring_decorator
    lines = docstrings.split("\n")
            ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
```
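A simplified guess at the shape of the fix: under `python -OO`, docstrings are stripped and `__doc__` becomes `None`, so the decorator has to guard before splitting. Names below mirror the traceback, but the body is illustrative.
```python
def docstring_decorator(fn):
    docstrings = fn.__doc__
    if docstrings is None:  # e.g. running under `python -OO`
        return fn
    lines = docstrings.split("\n")
    fn.__doc__ = "\n".join(lines)  # placeholder for the real list substitution
    return fn
```
-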
Rahul Vinod Vishwakarma authored
* Calculating box_bias at the start once, then reusing it at inference
* Updating the compute_box_bias function for backwards compatibility
* Caching compute_box_bias function
* Bug fix
* Update owlv2 accordingly to ensure repo consistency
* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>
* Fixup changes
* Made copied code consistent
* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>

Co-authored-by: Nguyen Van Binh <>
Co-authored-by: Nguyen Van Binh <binh.pdc01@gmail.com>
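An illustrative sketch of the caching idea only (the real OWLv2 bias math differs): compute the bias once per grid size and serve it from a cache at inference.
```python
from functools import lru_cache

import torch

@lru_cache(maxsize=2)
def compute_box_bias(num_patches: int) -> torch.Tensor:
    # Inverse sigmoid of normalized box-center coordinates, computed once
    # per feature-map size and then reused across forward passes.
    coords = (torch.arange(num_patches, dtype=torch.float32) + 0.5) / num_patches
    return torch.log(coords) - torch.log1p(-coords)
```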
-
Ash Kuroki authored
Update quantization_config.py: fixed a typo for clarity and correctness ("input time" changed to "input type").
-
- 20 Mar, 2024 2 commits
-
-
Arthur authored
* attempt to fix
* the actual fix that works with compilation!
* this?
* temporary update
* nit?
* dispatch to memory efficient?
* update both models that have static cache support
* fix copies, fix compile
* make sure fix
* fix cohere and gemma
* fix beams?
* nit
* slipped through the cracks
* nit
* nits
* update
* fix-copies
* skip failing tests
* nits
-
Benjamin Ye authored
[`BitsAndBytesConfig`] Warning for unused `kwargs` & safety checkers for `load_in_4bit` and `load_in_8bit` (#29761)
* added safety checkers for load_in_4bit and load_in_8bit on init, as well as their setters
* Update src/transformers/utils/quantization_config.py: typo correction for load_in_8bit setter checks (Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>)

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
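A sketch of the new guard rails, assuming the setters raise `ValueError` when 4-bit and 8-bit loading would both be enabled:
```python
from transformers import BitsAndBytesConfig

config = BitsAndBytesConfig(load_in_4bit=True)
try:
    config.load_in_8bit = True  # mutually exclusive with load_in_4bit
except ValueError as err:
    print(err)
```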
-