- 29 Mar, 2024 1 commit
-
Yih-Dar authored
* fix
* revert for qwen2
* revert for qwen2
* update
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 28 Mar, 2024 18 commits
-
MariaHei authored
Trainer with PyTorch now requires accelerate to be installed. Partly resolves huggingface/transformers#29174
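For context, a minimal sketch of what this requirement looks like from user code (the availability checker is the real `transformers.utils` helper; the install hints are assumptions about the recommended route):

```python
# transformers.Trainer with the PyTorch backend now depends on accelerate;
# install it with `pip install accelerate` (or `pip install "transformers[torch]"`).
from transformers.utils import is_accelerate_available

if not is_accelerate_available():
    raise ImportError("Trainer requires accelerate: pip install accelerate")
```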
-
Arthur authored
* fix
* fix test
* style
* nit
* rather rely on convert token to id
* fix quality
* Update src/transformers/convert_slow_tokenizer.py
-
VINAYAKK GARG authored
Fix doc issue in DebertaV2Config class
Co-authored-by: Vinayakk Garg <vigar@akamai.com>
-
Arthur authored
* fix bc?
* nit
-
Yu Chin Fabian Lim authored
* add gradient_accumulation_kwargs to AcceleratorConfig
* add suggestions from @muellerzr to docstrings, new behavior and tests
* Documentation suggestions from @muellerzr
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* addressed @muellerzr comments regarding tests and test utils
* moved accelerate version to top of file
* @muellerzr's variable fix
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* address @amyeroberts: fix tests and docstrings
* address @amyeroberts additional suggestions
---------
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
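A hedged sketch of the new knob: `accelerator_config` on `TrainingArguments` takes a `gradient_accumulation_kwargs` dict that is forwarded to accelerate's `GradientAccumulationPlugin`; the specific option below is illustrative.

```python
from transformers import TrainingArguments

# Sketch: forward extra gradient-accumulation options to accelerate.
# The step count itself still comes from `gradient_accumulation_steps`;
# `sync_with_dataloader` is one GradientAccumulationPlugin option.
args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=4,
    accelerator_config={
        "gradient_accumulation_kwargs": {"sync_with_dataloader": False},
    },
)
```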
-
Arthur authored
[`TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix (#29453)
* nit
* update test and fix test
* fixup
-
Arthur authored
* nit
* update
* oups
* Update src/transformers/models/mamba/modeling_mamba.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
-
Joao Gante authored
* add hard rope scaling test
* make fixup
* quick rope scaling tests
* add copy statements
-
Christopher Keibel authored
* add functions to get number of params which require grad, get optimizer group for parameters and get learning rates of param groups to trainer.py
* add tests and raise ValueError when optimizer is None
* add second layer to test and freeze its weights
* check if torch is available before running tests
* use decorator to check if torch is available
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix test indentation
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
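A sketch of the new `Trainer` helpers named in the summary above (method names taken from the commit summary; exact signatures may differ, and the optimizer-backed ones raise `ValueError` while the optimizer is still None):

```python
from transformers import Trainer

def inspect_optimization(trainer: Trainer) -> None:
    # Count of parameters with requires_grad=True:
    print(trainer.get_num_trainable_parameters())
    # Current learning rate of each optimizer param group:
    print(trainer.get_learning_rates())
    # Optimizer group that holds a given parameter:
    first_param = next(trainer.model.parameters())
    print(trainer.get_optimizer_group(first_param))
```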
-
amyeroberts authored
* Safe import of LRScheduler
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix up
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
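The "safe import" here is a version guard; a minimal sketch of the pattern (the actual guard in `trainer_pt_utils.py` may be written differently):

```python
# LRScheduler became a public name in torch 2.0; older releases only expose
# the private _LRScheduler, so fall back to keep both versions working.
try:
    from torch.optim.lr_scheduler import LRScheduler
except ImportError:  # torch < 2.0
    from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
```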
-
Aymeric Roucher authored
-
Joao Gante authored
* replace torch.testing.assert_allclose by torch.testing.assert_close
* missing atol rtol
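The migration in a nutshell: `torch.testing.assert_allclose` is deprecated upstream in favor of `torch.testing.assert_close`, which expects `rtol` and `atol` to be passed together when given explicitly.

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0 + 1e-6])

# Before (deprecated): torch.testing.assert_allclose(a, b)
torch.testing.assert_close(a, b, rtol=1e-5, atol=1e-5)
```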
-
Fanli Lin authored
fix typo
-
Eduardo Pacheco authored
* First commit to add flash attention 2 for GPT-2
* more improvements
* Make GPT2 pass tests and fixed Decision Transformer copies
* Fixed missing arg
* fix copies
* Added expected speedup
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Added test
* Fixed attn attribute
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update docs/source/en/model_doc/gpt2.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update Decision transformer attentions
* More updates
* Passing tests
* Fix copies
* Fix copies part 2
* Decision transformer updates
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix copies
* Decision transformer not supporting flash attn
* Addressed comments
* Addressed comments
* Addressed comments
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
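Opting into the new backend follows the standard pattern; a sketch (assumes the `flash-attn` package is installed and a supported CUDA GPU is available):

```python
import torch
from transformers import AutoModelForCausalLM

# GPT-2 with FlashAttention-2; half precision is required by the kernel.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```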
-
Arthur authored
* add doc warning
* fix build pr
-
Arthur authored
don't gather on pkv when using the trainer
-
Arthur authored
* add some help
* style
-
Minseo Kang authored
-
- 27 Mar, 2024 9 commits
-
Lorenzo Verardo authored
When enabled, this commit applies gate jitter to MixtralSparseMoeBlock's input data before it is passed through the MoE layer.
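Presumably this is toggled through the model config; a sketch assuming the new parameter is called `router_jitter_noise` (0.0 disables it) and is applied only in training mode:

```python
from transformers import MixtralConfig

# Assumption: jitter magnitude on the gate input is controlled here; a
# non-zero value scales inputs by uniform noise around 1.0 during training.
config = MixtralConfig(router_jitter_noise=0.01)
```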
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
-
Raushan Turganbay authored
* add eos stopping criteria
* minor fix
* Update tests/generation/test_stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* check eos is not None and fix tests
* make style and fixup
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/generation/test_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/generation/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/stopping_criteria.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* camel case everywhere
* call stopping criteria list for candidate ids
* make style and fixup
* Empty commit
* Empty commit to pass flaky test
* set max length in PromptLookupCandidateGenerator
* Update src/transformers/generation/utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* lets fix this typo in docs
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/generation/utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update PR
* empty commit
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
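The new criterion can also be used standalone through a `StoppingCriteriaList`; a sketch (assuming the class is exported as `EosTokenCriteria` from the generation module, per the `__init__.py` update above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteriaList
from transformers.generation import EosTokenCriteria

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello", return_tensors="pt")
criteria = StoppingCriteriaList([EosTokenCriteria(eos_token_id=tok.eos_token_id)])
out = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=20)
print(tok.decode(out[0]))
```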
-
Marc Sun authored
fix forward
-
Lysandre Debut authored
* Automatic safetensors conversion when lacking these files (#29390)
* Automatic safetensors conversion when lacking these files
* Remove debug
* Thread name
* Typo
* Ensure that raises do not affect the main thread
* Catch all errors
-
Hovnatan Karapetyan authored
* Check for requires_grad when initing weights
* Add unit test
* Move sinusoidal positional encoding generation after post_init()
* Add modules to skip init list
* Move create_sinusoidal_embeddings to _init_weights
-
Anton Vlasjuk authored
* FIX: Cached slow forward in mamba
  - additionally added mamba cached test
  - added unused test (mamba causal lm forward and backward)
  - fixed typo: "causl" --> "causal"
* formatting
* fix: use real `slow_forward` call instead of torch module's
* add shape assertion for mixer block test
* adjust shape assertion
-
Bo Zheng authored
* add support for qwen2 MoE models
* update docs
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup
* fixup
* add archive back
* add support for qwen2 MoE models
* update docs
* update model name & test
* update readme
* update class names & readme & model_doc of Qwen2MoE.
* update architecture name
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* update modeling_qwen2_moe.py
* fix model architecture
* fixup
* fix qwen2_moe tests
* use Qwen2Tokenizer instead of Qwen2MoeTokenizer
* fix style
* fix test when there are sparse and non sparse layers
* fixup
* add archive back
* fix integration test
* fixup
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
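Loading goes through the standard auto classes; a sketch (the hub id is an assumption, and note that the plain `Qwen2Tokenizer` is reused rather than a dedicated Qwen2MoeTokenizer):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"  # assumed Qwen2MoE checkpoint id
tok = AutoTokenizer.from_pretrained(model_id)           # resolves to Qwen2Tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)  # Qwen2MoeForCausalLM
```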
-
Benjamin Minixhofer authored
* fix tinyllama flax modelling
* rename vars to minimize changes
* move
* formatting
* remove unused var
-
- 26 Mar, 2024 8 commits
-
Lucain authored
-
Ilyas Moutawwakil authored
* remove py3nvml to skip amd memory benchmarks
* uninstall pynvml from docker images
-
Yanyi Liu authored
* Add cosine_with_min_lr scheduler
* Update error message for missing min_lr or min_lr_rate
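The scheduler is selected through `TrainingArguments`; per the error message mentioned above, one of `min_lr` or `min_lr_rate` must be supplied via `lr_scheduler_kwargs`:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr": 1e-6},  # or {"min_lr_rate": 0.1}
)
```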
-
Zhihao Lin authored
* update
* add ut
* update
-
Michael authored
-
Merve Noyan authored
Update image_feature_extraction.md
-
yunxiangtang authored
* replace 'decord' with 'av' in VideoClassificationPipeline
* fix the check of backend in VideoClassificationPipeline
* adjust the order of imports
* format 'video_classification.py'
* format 'video_classification.py' with ruff
---------
Co-authored-by: wanqiancheng <13541261013@163.com>
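Usage is unchanged; only the decoding backend moved from `decord` to PyAV (`pip install av`). A sketch with an assumed checkpoint and local file:

```python
from transformers import pipeline

# Checkpoint and video path are illustrative.
clf = pipeline("video-classification", model="MCG-NJU/videomae-base-finetuned-kinetics")
print(clf("clip.mp4"))  # frames are now decoded with PyAV instead of decord
```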
-
Jonathan Flynn authored
* add warnings if training args differ from checkpoint args stored in trainer_state.json
* run formatting and styling
* add a test
* format and styling
---------
Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>
-
- 25 Mar, 2024 4 commits
-
Johannes Kolbe authored
Co-authored-by: Johannes <johannes.kolbe@tech.better.team>
-
Arthur Zucker authored
-
Arthur Zucker authored
-
Yuki Watanabe authored
* Populate torch_dtype from model to pipeline
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* use property
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* lint
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
* Remove default handling
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
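After this change the pipeline reads its dtype back from the underlying model; a sketch:

```python
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="gpt2", torch_dtype=torch.float16)
# `torch_dtype` is now a property populated from the model:
assert pipe.torch_dtype == torch.float16
```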
-