"examples/legacy/vscode:/vscode.git/clone" did not exist on "61e191987d8aa0778e0f44613deaf7ad99253cab"
- 07 Jun, 2023 3 commits
-
-
Sylvain Gugger authored
* Do not prepare lr scheduler as it has the right number of steps * Trigger CI * Trigger CI * Trigger CI * Add fake comment * Remove fake comment * Trigger CI please!
-
Sourab Mangrulkar authored
* fix executable batch size issue * fix * undo
-
Younes Belkada authored
* support PEFT models when saving the model using trainer * fixup
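For context, a minimal sketch of the setup this fix targets; the base checkpoint and LoRA settings are illustrative, not taken from the commit.

```python
# A PEFT-wrapped model handed to the Trainer; after this fix,
# trainer.save_model() also saves the adapter weights for such models.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder checkpoint
peft_model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# The wrapped model would then be passed as Trainer(model=peft_model, ...),
# and trainer.save_model(output_dir) stores the PEFT adapter.
```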
-
- 05 Jun, 2023 1 commit
-
-
Sourab Mangrulkar authored
* fix trainer slow tests * commit 2
-
- 02 Jun, 2023 1 commit
-
-
Claudius Kienle authored
Trainer: fixed KeyError on evaluate for ReduceLROnPlateau
Co-authored-by: Claudius Kienle <claudius.kienle@artiminds.com>
-
- 31 May, 2023 8 commits
-
-
Sourab Mangrulkar authored
remove the extra `accelerator.prepare` that slipped in with multiple updates from main 😅
-
Sylvain Gugger authored
-
Sourab Mangrulkar authored
* mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixes * fix saving * shift torch dynamo handling to accelerate * shift deepspeed integration and save & load utils to accelerate * fix accelerate launcher support * oops * fix 🐛 * save ckpt fix * Trigger CI * nasty 🐛 😅 * as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate * make tests happy * quality ✨ * loss tracked needs to account for grad_acc * fixing the deepspeed tests * quality ✨ * 😅😅😅 * tests 😡 * quality ✨ * Trigger CI * resolve comments and fix the issue with the previous merge from branch * Trigger CI * accelerate took over deepspeed integration
Co-authored-by: Stas Bekman <stas@stason.org>
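For readers following this series of accelerate-integration commits, a minimal sketch of the standalone accelerate pattern the Trainer moved to; it illustrates the library's public API, not the Trainer's internal code.

```python
from accelerate import Accelerator

# One Accelerator object owns mixed precision, device placement,
# DDP/FSDP preparation and gradient accumulation.
accelerator = Accelerator(mixed_precision="bf16", gradient_accumulation_steps=4)

# In a real training loop (model/optimizer/dataloader defined elsewhere):
# model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
# ...
# accelerator.backward(loss)  # replaces loss.backward()
```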
-
Sylvain Gugger authored
-
Sourab Mangrulkar authored
* mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixes * fix saving * shift torch dynamo handling to accelerate
-
Sourab Mangrulkar authored
* mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments * move fsdp handling to accelerate * fixes * fix saving
-
Sourab Mangrulkar authored
* mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * refactor the place to create `Accelerator` object * move ddp prep to accelerate * fix 😅 * resolving comments
-
Sourab Mangrulkar authored
* mixed precision support via accelerate * fix issues * fix for the sharded ddp case * fix flax and tf failing tests * refactor the place to create `Accelerator` object * address comments by removing debugging print statements
-
- 26 May, 2023 1 commit
-
-
Zachary Mueller authored
Log the right train_batch_size if using auto_find_batch_size and also log the adjusted value separately. (#23800) * Log right bs * Log * Diff message
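As a usage note, a minimal sketch of the option this logging fix concerns, assuming accelerate is installed; the values are illustrative.

```python
from transformers import TrainingArguments

# With auto_find_batch_size=True the Trainer retries with a smaller
# per-device batch size on CUDA OOM, so the effective value can differ
# from the one configured here, which is why the adjusted value is logged.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=64,  # starting point; may be reduced at runtime
    auto_find_batch_size=True,
)
```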
-
- 25 May, 2023 1 commit
-
-
Sylvain Gugger authored
-
- 24 May, 2023 3 commits
-
-
Zachary Mueller authored
* Check for use_sagemaker_dp * Add a check for is_sagemaker_mp when setting _n_gpu again. Should be last broken thing * Try explicit check? * Quality
-
Tim Dettmers authored
* Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks.
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
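A minimal sketch of how these optimizers are selected through `TrainingArguments`, assuming a transformers/bitsandbytes combination that includes this change; the optim string shown is one of the names added around this PR.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="paged_adamw_8bit",  # other additions include "paged_adamw_32bit",
                               # "lion_8bit", "lion_32bit", "paged_lion_8bit",
                               # "paged_lion_32bit" (requires bitsandbytes)
)
```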
-
Tim Dettmers authored
* Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All tests green for 8-bit and 4-bit layers. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All tests green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Fixing issues for PR #23479. * Added fix for fp32 layer norms and bf16 compute in LLaMA. * Reverted variable name change. * Initial draft. Some tests fail. * Fixed dtype bug. * Fixed bug caused by torch_dtype='auto'. * All tests green for 8-bit and 4-bit layers. * Added lion and paged optimizers and made original tests pass. * Added tests for paged and lion optimizers. * Added and fixed optimizer tests. * Style and quality checks. * Added missing tests. * Fixup changes. * Added fixup changes. * Missed some variables to rename. * revert trainer tests * revert test trainer * another revert * fix tests and safety checkers * protect import * simplify a bit * Update src/transformers/trainer.py * few fixes * add warning * replace with `load_in_kbit = load_in_4bit or load_in_8bit` * fix test * fix tests * this time fix tests * safety checker * add docs * revert torch_dtype * Apply suggestions from code review * multiple fixes * update docs * version checks and multiple fixes * replace `is_loaded_in_kbit` * replace `load_in_kbit` * change methods names * better checks * oops * oops * address final comments
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 23 May, 2023 1 commit
-
-
小桐桐 authored
Ref: https://github.com/huggingface/peft/issues/394
Loading a quantized checkpoint into a non-quantized Linear8bitLt is not supported; call module.cuda() before module.load_state_dict().
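A minimal sketch of the ordering described above, assuming bitsandbytes is installed and a CUDA device is available; the layer size and checkpoint path are illustrative.

```python
import torch
import bitsandbytes as bnb

module = bnb.nn.Linear8bitLt(1024, 1024, has_fp16_weights=False)

# Move the module to the GPU first (this is where quantization happens) ...
module.cuda()

# ... and only then load the quantized checkpoint, per the note above.
state_dict = torch.load("int8_linear_checkpoint.pt")  # hypothetical file
module.load_state_dict(state_dict)
```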
-
- 17 May, 2023 1 commit
-
-
Hugo Abonizio authored
-
- 16 May, 2023 1 commit
-
-
ropoctl authored
Logging an error and continuing is probably following the principle of least surprise.
-
- 09 May, 2023 1 commit
-
-
Konstantin Dobler authored
* Ratio option for `logging_steps`, `eval_steps`, `save_steps` * Add guards if arguments are not set * Add more detailed comments + formatting * Update src/transformers/training_args.py * Update src/transformers/training_args.py * Update src/transformers/training_args.py * Convert args values to `int` if bigger than 1 * `black` * `make fixup`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
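A minimal sketch of the resulting ratio behaviour, assuming a transformers release that includes this change: values strictly between 0 and 1 are interpreted as a fraction of the total training steps (output_dir and the strategies are illustrative).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",
    save_strategy="steps",
    logging_steps=0.05,  # log every 5% of the total training steps
    eval_steps=0.25,     # evaluate every 25% of the total training steps
    save_steps=0.25,     # checkpoint every 25% of the total training steps
)
```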
-
- 04 May, 2023 1 commit
-
-
Qingyang Wu authored
* fix resume fsdp * fix rank 0 loading * fix style and quality
-
- 02 May, 2023 1 commit
-
-
Wing Lian authored
-
- 28 Apr, 2023 2 commits
-
-
Shivam Shrirao authored
CUDA rng_state_all is used when saving in distributed mode, so the same should also be used when loading (#23045): the CUDA RNG state should cover all devices for distributed training because all of them were saved.
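A minimal sketch of the save/load symmetry this fix enforces; the checkpoint path and dictionary key are illustrative, not the Trainer's actual checkpoint layout.

```python
import torch

# Saving: capture the RNG state of every visible GPU, not just the current one.
checkpoint = {"cuda_rng_state_all": torch.cuda.get_rng_state_all()}
torch.save(checkpoint, "rng_state.pth")

# Loading: restore all per-device states with the matching *_all call.
checkpoint = torch.load("rng_state.pth")
torch.cuda.set_rng_state_all(checkpoint["cuda_rng_state_all"])
```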
-
Maxime Méloux authored
* Add Trainer support for ReduceLROnPlateau (fixes #16503) * Remove training argument and add default instance
Co-authored-by: mmeloux <maxime.meloux@loria.fr>
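A minimal sketch of how this scheduler is requested, assuming a transformers release that includes this change; since ReduceLROnPlateau steps on an evaluation metric, an evaluation strategy has to be set (output_dir and the strategy are illustrative).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",               # the scheduler needs eval metrics
    lr_scheduler_type="reduce_lr_on_plateau",  # wraps torch.optim.lr_scheduler.ReduceLROnPlateau
)
```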
-
- 21 Apr, 2023 1 commit
-
-
Wing Lian authored
ddp fixes for stable lm training
-
- 19 Apr, 2023 1 commit
-
-
Liu Chenyang authored
* move preprocess_logits_for_metrics before _nested_gather in trainer.evaluation_loop * fix * Update src/transformers/trainer.py * fix * fix
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 17 Apr, 2023 1 commit
-
-
Zachary Mueller authored
* Use accelerate for device management * Add accelerate to setup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 07 Apr, 2023 1 commit
-
-
Seung-Moo Yang authored
-
- 06 Apr, 2023 2 commits
-
-
Sourab Mangrulkar authored
fix fsdp
-
Younes Belkada authored
add safety checker
-
- 05 Apr, 2023 1 commit
-
-
Quentin Meeus authored
The logger prints a summary at the beginning of training that displays some info such as the number of examples, number of parameters, total number of steps, etc. Those numbers can be quite large and difficult to read. I added a thousands separator to improve readability for the following: - num_examples - num_train_epochs - per_device_train_batch_size - total_train_batch_size - max_steps - num_trainable_params
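The change boils down to Python's thousands-separator format specifier; a minimal sketch with illustrative values:

```python
num_examples = 1_281_167
total_train_batch_size = 2_048
max_steps = 125_000

# f"{value:,}" inserts thousands separators in the training summary lines.
print(f"  Num examples = {num_examples:,}")  # Num examples = 1,281,167
print(f"  Total train batch size = {total_train_batch_size:,}")
print(f"  Total optimization steps = {max_steps:,}")
```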
-
- 04 Apr, 2023 1 commit
-
-
Viktor Scherbakov authored
* implemented safetensors save/load * remove duplicated file * added tests * more tests * style fix * fix tf tests * change to list comprehension * review fixes + safe load for sharded checkpoint * style fix * remove rogue import * remove partial to avoid undefined exception * use naming alias instead of safetensors.torch * fix safe sharding in tests * grammar * update docs * update docs * minor corrections * style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
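A minimal sketch of opting into safetensors serialization from the Trainer side, assuming a transformers release that includes this change (output_dir is illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_safetensors=True,  # checkpoints are written as model.safetensors
)
```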
-
- 03 Apr, 2023 3 commits
-
-
Xuehai Pan authored
* [setup] drop deprecated `distutils` usage * drop deprecated `distutils.util.strtobool` usage * fix import order * reformat docstring by `doc-builder`
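A minimal sketch of the kind of replacement such a cleanup implies; this helper is illustrative, not the exact code added to transformers.

```python
def strtobool(val: str) -> bool:
    """Local stand-in for the deprecated distutils.util.strtobool."""
    val = val.lower()
    if val in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if val in {"n", "no", "f", "false", "off", "0"}:
        return False
    raise ValueError(f"invalid truth value {val!r}")
```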
-
Ilya authored
-
Younes Belkada authored
[`Trainer`] Force `is_model_parallel` when the model is loaded across multiple GPUs using `accelerate` (#22532) * add `is_model_parallel` arg on Trainer * add warning * adapt from suggestions * revert t5 changes * remove commas * adapt from suggestions
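A minimal sketch of the situation this change detects: a model dispatched across several GPUs by accelerate via `device_map="auto"` (the checkpoint name is a placeholder).

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # placeholder checkpoint
    device_map="auto",        # accelerate shards the modules across available GPUs
)
print(getattr(model, "hf_device_map", None))  # per-module device placement
```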
-
- 29 Mar, 2023 1 commit
-
-
jeffhataws authored
This reverts commit fd81746dbec5f17c8285a0fdc72ca4b4c025cc33.
-
- 23 Mar, 2023 2 commits
-
-
jeffhataws authored
This PR fixes the "RuntimeError: No CUDA GPUs are available" error when running with the --bf16 option on Neuron. Related PRs: https://github.com/huggingface/transformers/pull/20684 https://github.com/huggingface/transformers/pull/22300
-
Quentin Lhoest authored
* Mention why one needs to specify max_steps in Trainer * dummy change to trigger CI
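A minimal sketch of the case that documentation note covers: with a streaming/iterable train dataset the Trainer cannot infer an epoch length, so max_steps has to be set explicitly (output_dir and the step count are illustrative).

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    max_steps=10_000,  # required when the train dataset has no __len__
)
```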
-