- 14 Jun, 2024 1 commit

Dmitry Rogozhkin authored

* xpu: support xpu backend from stock pytorch (>=2.4)
  Fixes: https://github.com/huggingface/transformers/issues/31237
  The XPU backend is available in stock PyTorch starting from version 2.4 (see: https://github.com/pytorch/pytorch/issues/114842). This commit extends huggingface transformers to support XPU from both IPEX and stock PyTorch; IPEX is tried first.
  Requires: https://github.com/huggingface/accelerate/pull/2825
  Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
* xpu: enable gpt2 and decision_transformer tests for the xpu pytorch backend
  Note that running xpu tests requires passing TRANSFORMERS_TEST_DEVICE_SPEC=spec.py to the test runner, where spec.py defines DEVICE_NAME, MANUAL_SEED_FN, EMPTY_CACHE_FN and DEVICE_COUNT_FN (see the spec sketch below).
  Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

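The device spec referenced above, reconstructed from the commit message (a minimal spec.py for the XPU device):

```python
# spec.py -- device spec passed via TRANSFORMERS_TEST_DEVICE_SPEC=spec.py,
# as described in the commit message above.
import torch

DEVICE_NAME = "xpu"
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
```
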
- 07 Jun, 2024 1 commit

조준래 authored

* Implement JSON dump conversion for torch_dtype in TrainingArguments
* Add unit test for converting torch_dtype in TrainingArguments to JSON
* Move the unit test for converting torch_dtype into the TrainerIntegrationTest class
* Reformat using ruff
* Convert dict_torch_dtype_to_str to the private method _dict_torch_dtype_to_str
---------
Co-authored-by: jun.4 <jun.4@kakaobrain.com>

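A rough sketch of the conversion this adds; the helper name comes from the commit message, but the body here is illustrative rather than the exact upstream implementation:

```python
import json

import torch


def _dict_torch_dtype_to_str(d: dict) -> None:
    """Replace torch.dtype values (e.g. torch.bfloat16) with plain strings
    ("bfloat16") so the arguments dict can be serialized to JSON."""
    for k, v in d.items():
        if isinstance(v, torch.dtype):
            d[k] = str(v).split(".")[-1]
        elif isinstance(v, dict):
            _dict_torch_dtype_to_str(v)


args = {"torch_dtype": torch.bfloat16, "learning_rate": 5e-5}
_dict_torch_dtype_to_str(args)
print(json.dumps(args))  # {"torch_dtype": "bfloat16", "learning_rate": 5e-05}
```
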
- 03 Jun, 2024 2 commits

miivanov90 authored

* update to not(endswith(loss))
* ruff formatting

Qubitium authored

* Rename sanity_evaluation to eval_on_start
* move arg back to last

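A usage sketch of the renamed flag, assuming a transformers release that includes it (the evaluation strategy settings are illustrative):

```python
from transformers import TrainingArguments

# Run one evaluation pass before the first training step as a sanity check.
args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",
    eval_steps=500,
    eval_on_start=True,
)
```
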
- 31 May, 2024 1 commit

Marc Sun authored

* add sanity evaluation
* fix
* Apply suggestions from code review
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

- 28 May, 2024 1 commit

Hengwen Tong authored

* Remove backend checks in training_args.py
* Explicitly initialize the device
---------
Co-authored-by: tonghengwen <tonghengwen@cambricon.com>

- 23 May, 2024 1 commit

Yasmin Moslem authored

* Add a check that warmup_steps is either 0 or >= 1
  Update training_args.py to add a check that warmup_steps is either 0 or >= 1; otherwise, raise an error.
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

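An illustrative sketch of the described check (not the exact upstream code); fractional values in (0, 1) are rejected because they are almost certainly a warmup ratio passed to the wrong argument:

```python
warmup_steps = 0.1  # hypothetical user input

if warmup_steps != 0 and warmup_steps < 1:
    raise ValueError(
        "warmup_steps must be either 0 or >= 1 (a whole number of optimizer steps); "
        "use warmup_ratio to specify warmup as a fraction of total training steps."
    )
```
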
- 21 May, 2024 2 commits

Zach Mueller authored

* Enforce saving at end of training
* Fix test
* Rework test
* Fixup tests
* Update comment based on sourab feedback
* Clean

Younes Belkada authored

* add V1 - adalomo not working yet
* add todo docs + refactor from comments
* adjust LR
* add docs
* add more elaborated test
* Apply suggestions from code review
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix
* push
* add accelerate check
* fix DDP case
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* init kwargs
* safely add attribute
* revert to enum logic
* Update src/transformers/trainer.py
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

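A usage sketch for the AdaLomo integration this adds; the optim value is assumed from the commit message, and AdaLomo itself requires the separate lomo-optim package:

```python
from transformers import TrainingArguments

# Select the AdaLomo optimizer through the standard optim argument.
args = TrainingArguments(
    output_dir="out",
    optim="adalomo",
    learning_rate=1e-3,
)
```
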
- 20 May, 2024 1 commit

Zach Mueller authored

* Introduce configured_state
* Include note on tuning
* Allow for users to have defined a state already
* Include tests
* Add note on hpam tune
* Guard a bit better
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Finish rebase
* Finish rebase
* Guard carefully
* Fixup test
* Refactor
* Fin refactor
* Comment
* Update wrt feedback
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

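A sketch of what "allow for users to have defined a state already" looks like in practice; the use_configured_state key is my reading of the feature and should be treated as an assumption:

```python
from accelerate import PartialState
from transformers import TrainingArguments

# Configure the distributed state yourself (e.g. once per process during
# hyperparameter tuning), then ask TrainingArguments to reuse it instead of
# creating a fresh one.
state = PartialState()

args = TrainingArguments(
    output_dir="out",
    accelerator_config={"use_configured_state": True},
)
```
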
- 13 May, 2024 1 commit

fxmarty authored

* update to ROCm 6.0.2 and test MI300
* add callers for mi300
* update dockerfile
* fix trainer tests
* remove apex
* style
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* Update tests/trainer/test_trainer_seq2seq.py
* update to torch 2.3
* add workflow dispatch target
* we may need branches: mi300-ci after all
* nit
* fix docker build
* nit
* add check runner
* remove docker-gpu
* fix issues
* fix
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

- 06 May, 2024 1 commit

Nate Cibik authored

* Added cache clearing for GPU efficiency.
* Added cache clearing for GPU efficiency.
* Added batch_eval_metrics capability
* Ran make fixup
* Fixed bug
* Fixed whitespace issue
* Fixed outdated condition
* Updated docstrings with instructions for batch_eval_metrics. Updated end of dataloader logic
* Added first version of batch_eval_metrics Trainer test
* Fixed batch_eval_metrics Trainer tests for both eval and predict
* Fixed batch_eval_metrics behavior for new Trainer variables
* Fixed batch_eval_metrics Trainer tests
* Ran fixup

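A sketch of the batch-wise metrics flow this enables, assuming (per my reading of the feature) that with batch_eval_metrics=True the Trainer calls compute_metrics on every batch and passes compute_result=True on the final one:

```python
import numpy as np
from transformers import TrainingArguments

batch_accuracies = []


def compute_metrics(eval_pred, compute_result: bool = False):
    # Called once per evaluation batch instead of once on the full dataset,
    # so intermediate logits never have to be accumulated in memory.
    logits, labels = eval_pred
    batch_accuracies.append(float((np.argmax(logits, axis=-1) == labels).mean()))
    if compute_result:  # last batch: aggregate and reset
        result = {"accuracy": float(np.mean(batch_accuracies))}
        batch_accuracies.clear()
        return result
    return {}


args = TrainingArguments(output_dir="out", batch_eval_metrics=True)
```
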
- 03 May, 2024 1 commit

Pavel Iakubovskii authored

* Remove comparison to output_dir
* Update docs for `run_name`
* Add warning

- 02 May, 2024 1 commit

Michael Benayoun authored

- 29 Apr, 2024 1 commit

Howard Liberty authored

* Allow boolean FSDP options in fsdp_config
* Use lower() to be safe

- 25 Apr, 2024 1 commit

Zach Mueller authored

* Introduce saveable callbacks
* Add note
* Test for non-present and flag
* Support early stopping and refusing to train further
* Update docstring
* More saving
* Import oopsie
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make it go through TrainingArguments
* Document
* Fix test
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Rework to allow for duplicates
* Clean
* Fix failing tests
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

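A usage sketch of the resulting behavior; the restore_callback_states_from_checkpoint name is my recollection of the flag this routes through TrainingArguments and should be treated as an assumption:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Save callback state (e.g. early-stopping patience counters) alongside each
# checkpoint and restore it when resuming, so "refusing to train further"
# survives a restart.
args = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    restore_callback_states_from_checkpoint=True,
)
callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
```
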
- 24 Apr, 2024 1 commit

Zach Mueller authored

* Check removing flag for torch
* LLM oops
* Getting there...
* More discoveries
* Change
* Clean up and prettify
* Logic check
* Not

- 22 Apr, 2024 1 commit

Howard Liberty authored

* Add FSDP config for CPU RAM efficient loading
* Style fix
* Update src/transformers/training_args.py
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Add sync_module_states and cpu_ram_efficient_loading validation logic
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Style
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

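A configuration sketch of the options this adds; the fsdp_config key names follow the commit message, while the surrounding arguments are illustrative:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        # Load the full model on rank 0 only and stream weights to the other
        # ranks, which is why it must be paired with sync_module_states.
        "cpu_ram_efficient_loading": True,
        "sync_module_states": True,
    },
)
```
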
- 18 Apr, 2024 1 commit

Zach Mueller authored

* Alias
* Note alias
* Tests and src
* Rest
* Clean
* Change typing?
* Fix tests
* Deprecation versions

- 17 Apr, 2024 1 commit

Pavel Iakubovskii authored

* Add evaluation loop container for intermediate results
* Add tests for EvalLoopContainer
* Formatting
* Fix padding_index in test and typo
* Move EvalLoopContainer to pt_utils to avoid additional imports
* Fix `eval_do_concat_batches` arg description
* Fix EvalLoopContainer import

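A minimal standalone illustration of the pattern such a container implements (a sketch, not the actual EvalLoopContainer API): accumulate per-batch tensors during evaluation, move them off the GPU immediately, and concatenate at the end.

```python
import numpy as np
import torch


class BatchResultContainer:
    """Collects per-batch eval tensors and concatenates them at the end."""

    def __init__(self, do_concat: bool = True):
        self.do_concat = do_concat
        self.arrays = []

    def add(self, tensors: torch.Tensor) -> None:
        # Detach and move to CPU right away so GPU memory is not held
        # across the whole evaluation loop.
        self.arrays.append(tensors.detach().cpu().numpy())

    def get_arrays(self):
        return np.concatenate(self.arrays, axis=0) if self.do_concat else self.arrays


container = BatchResultContainer()
for _ in range(3):  # stand-in for the evaluation dataloader
    container.add(torch.randn(8, 2))  # e.g. per-batch logits
print(container.get_arrays().shape)  # (24, 2)
```
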
- 16 Apr, 2024 2 commits

Zach Mueller authored

* Raise relevant err
* Use type instead

Zach Mueller authored

* Bookmark, initial implementation. Need to test
* Clean
* Working fully, woop woop
* I think working version now, testing
* Fin!
* rm cast, could keep None
* Fix typing issue
* rm typehint
* Add test
* Add tests and make more rigid

- 10 Apr, 2024 1 commit

Matthew Hoffman authored

* Add str to TrainingArguments report_to type hint
* Swap order in Union
* Merge Optional into Union
  https://github.com/huggingface/transformers/pull/30078#issuecomment-2042227546

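The resulting annotation, as described by the bullets above (a sketch; the default value is illustrative):

```python
from typing import List, Union

# report_to accepts None, a single integration name such as "wandb",
# or a list of names; Optional is folded into the Union.
report_to: Union[None, str, List[str]] = None
```
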
- 03 Apr, 2024 1 commit

Zach Mueller authored

* Docstring to note about zero init
* Check for accelerate
* Change conditional return
* Tweak
* Add new accelerate-specific zero3 check
* Fix import
* Revert to RTFM
* Update src/transformers/modeling_utils.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

- 27 Mar, 2024 1 commit

huismiling authored

* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker

- 19 Mar, 2024 1 commit

Younes Belkada authored

* add galore v1
* add import
* add tests and doc
* fix doctest
* forward contrib credits from discussions
* forward contrib credits from discussions
* Apply suggestions from code review
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix failing tests
* switch to `optim_target_modules` and clarify docs
* more clarification
* enhance lookup logic
* update a test to add peak memory
* add regex, all-linear and single string support
* add layer-wise optimization through DummyOptimizers and LRSchedulers
* forward contrib credits from discussions and original idea
* add a section about DDP not supported in layerwise
* Update src/transformers/trainer.py
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* fix self
* check only if layer_wise
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* oops
* make use of intervals
* clarify comment
* add matching tests
* GaLoRe -> GaLore
* move to `get_scheduler`
* add note on docs
* add a warning
* adapt a bit the docs
* update docstring
* support original API
* Update docs/source/en/trainer.md
* slightly refactor
* Update docs/source/en/trainer.md
  Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix args parsing and add tests
* remove warning for regex
* fix type hint
* add note about extra args
* make `is_regex` return optional
---------
Co-authored-by: Maxime <maximegmd@users.noreply.github.com>
Co-authored-by: Wing Lian <winglian@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

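A usage sketch of the GaLore integration described above, assuming the separate galore-torch package is installed; the optim value and the optim_target_modules forms (substrings, regex, or "all-linear") follow the bullets but should be treated as illustrative:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",                  # a layer-wise variant exists but does not support DDP
    optim_target_modules=["attn", "mlp"],  # substrings/regex; "all-linear" is also accepted
    learning_rate=2e-4,
)
```
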
- 13 Mar, 2024 1 commit

Sourab Mangrulkar authored

* fsdp+qlora related changes
* fixes
* Update quantization_config.py
* support fsdp+qlora and dsz3+qlora
* Update quantization_config.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* handle fsdp+qlora and dsz3+qlora correctly while model loading
* fix param count
* quality
* fsdp related changes
* fsdp changes only when using LoRA/QLoRA
* add accelerate version check
* refactor, update min accelerate version and add tests
  1. Update minimum accelerate version to 0.26.0
  2. Clean the trainer wrt accelerate version checks
  3. FSDP refactor and test for fsdp config
  4. use `itemsize` instead of `dtype2bytes` dict
* fix test
* Address comments
  Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fix the conditional flag
* fix conditional flag
* address comments
  Co-Authored-By: Zach Mueller <7831895+muellerzr@users.noreply.github.com>
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Zach Mueller <7831895+muellerzr@users.noreply.github.com>

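A loading sketch for the FSDP+QLoRA path this enables; the key detail is bnb_4bit_quant_storage, which lets FSDP shard the 4-bit weights. The model name and surrounding arguments are illustrative, not a canonical recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # storage dtype FSDP shards over
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```
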
- 11 Mar, 2024 1 commit

Yitong Huang authored

* add USE_TORCH_XLA env
* rename torch_tpu to torch_xla
* better is_torch_xla_available; fix some fsdp and performance issues
* fix format
* fix bug when pjrt_device is cpu
* fix bug
* fix the deprecation handling
---------
Co-authored-by: anw90 <ang868@gmail.com>
Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>

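A sketch of how the new environment variable and the renamed check are meant to be used; the accepted values for USE_TORCH_XLA are an assumption:

```python
import os

# Opt out of XLA even when torch_xla is installed by setting the variable
# before transformers is imported.
os.environ["USE_TORCH_XLA"] = "0"

from transformers.utils import is_torch_xla_available  # replaces is_torch_tpu_available

print(is_torch_xla_available())
```
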
- 08 Mar, 2024 1 commit

Yun Dai authored

fix FSDP config

- 06 Mar, 2024 1 commit

Matthew Hoffman authored

* Fix TrainingArguments regression with torch <2.0.0 for dataloader_prefetch_factor
  dataloader_prefetch_factor was added to TrainingArguments in #28498 with the default value None, but versions of torch <2.0.0 do not accept None and will raise an error if num_workers == 0 and prefetch_factor != 2
* Add is_torch_available() check
* Use is_torch_greater_or_equal_than_2_0
  add back check for dataloader_prefetch_factor

- 01 Mar, 2024 1 commit

Zach Mueller authored

* Fix deprecated arg issue
* Trainer check too
* Check for dict or dataclass
* Simplify, make config always AcceleratorConfig
* Upstream to Trainer

- 20 Feb, 2024 1 commit

Younes Belkada authored

* add RMSProp to Trainer
* revert some change
* Update src/transformers/trainer.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

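Selecting the new optimizer goes through the usual optim argument; a sketch (the exact string value is assumed):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="rmsprop",  # torch.optim.RMSprop under the hood
)
```
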
- 14 Feb, 2024 2 commits

Jiewen Tan authored

* Initial commit
* Add guards for the global mesh
* Address more comments
* Move the dataloader into integrations/tpu.py
* Fix linters
* Make kwarg more explicit
* Remove the move device logic
* Fix the CI
* Fix linters
* Re-enable checkpointing

Zach Mueller authored

* Introduce AcceleratorConfig dataclass
* Extra second warn
* Move import
* Try moving import under is_accelerate_available
* Quality
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Clean
* Remove to_kwargs
* Change version
* Improve tests by including dispatch and split batches
* Improve reliability
* Update tests/trainer/test_trainer.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fixup tests and review nits
* Make tests pass
* protect import
* Protect import
* Empty-Commit
* Make training_args.to_dict handle the AcceleratorConfig
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

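A usage sketch of the new argument; a plain dict works in place of the dataclass, and the two keys shown are the ones the "dispatch and split batches" bullet mentions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    accelerator_config={
        "split_batches": True,      # split each fetched batch across devices
        "dispatch_batches": False,  # each process loads its own data
    },
)
```
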
- 09 Feb, 2024 1 commit

Philip Blair authored

- 07 Feb, 2024 1 commit

Sai-Suraj-27 authored

Fixed the documentation for logging_first_step by removing evaluate.

- 05 Feb, 2024 1 commit

Zizhao Chen authored

Fix bad doc: replace save with logging

- 23 Jan, 2024 1 commit

Quentin Meeus authored

* add dataloader prefetch factor in training args and trainer
* remove trailing spaces
* prevent dataloader_num_workers == 0 and dataloader_prefetch_factor != None
  dataloader_prefetch_factor only works when data is loaded in a different process from the main one. This commit adds the necessary checks to avoid having prefetch_factor set when there is no such process.
* Remove whitespaces in empty line
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

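A usage sketch of the two arguments together; prefetching only applies when worker processes do the loading, which is the constraint the checks above enforce:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,      # must be > 0 for prefetching to apply
    dataloader_prefetch_factor=2,  # batches prefetched in advance per worker
)
```
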
- 19 Jan, 2024 1 commit

Fanli Lin authored

* remove elif xpu
* remove redundant code

- 12 Jan, 2024 1 commit

Joao Gante authored