- 04 Feb, 2021 2 commits
Sylvain Gugger authored
Stas Bekman authored
* trainer fixes * don't switch the model just for deepspeed and mp * correct the fix
- 03 Feb, 2021 1 commit
yylun authored
* fix steps_in_epoch variable when using max_steps * redundant sentence * Revert "redundant sentence" This reverts commit ad5c0e9b6e66d65732dee2239cdc9c76dfa0dc5a. * remove redundant sentence Co-authored-by: wujindou <wujindou@sogou-inc.com>
- 02 Feb, 2021 1 commit
Sylvain Gugger authored
- 29 Jan, 2021 1 commit
Sylvain Gugger authored
* When on sagemaker use their env variables for saves * Address review comments * Quality
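For context, SageMaker training jobs expose their save locations through environment variables such as `SM_MODEL_DIR` and `SM_OUTPUT_DATA_DIR`; a minimal sketch of that general pattern, not necessarily the exact variables the Trainer reads:
```python
import os

# Illustration only: SageMaker sets these variables inside training jobs;
# fall back to a local directory when running outside SageMaker.
save_dir = os.environ.get("SM_MODEL_DIR", "./output")
print(f"Saving checkpoints to {save_dir}")
```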
- 28 Jan, 2021 4 commits
abhishek thakur authored
abhishek thakur authored
Sylvain Gugger authored
abhishek thakur authored
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
- 27 Jan, 2021 2 commits
Sylvain Gugger authored
* When resuming training from checkpoint, Trainer loads model * Finish cleaning tests * Address review comment * Use global_step from state
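As a reference for the behaviour described in this commit, a minimal sketch of resuming a run, assuming the `resume_from_checkpoint` argument of `Trainer.train` as found in current releases:
```python
from transformers import Trainer

def resume_training(trainer: Trainer, checkpoint_dir: str):
    # Resume from a saved checkpoint directory (e.g. "out/checkpoint-500").
    # With this change the Trainer also reloads the model weights, not just
    # the optimizer/scheduler/global_step state.
    return trainer.train(resume_from_checkpoint=checkpoint_dir)
```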
Sylvain Gugger authored
* Add a flag for find_unused_parameters * Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Remove negation
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
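A minimal sketch of the flag added here, assuming it is exposed as `ddp_find_unused_parameters` on `TrainingArguments`:
```python
from transformers import TrainingArguments

# Forwarded to torch.nn.parallel.DistributedDataParallel; setting it to False
# skips the extra graph traversal when every parameter receives a gradient.
args = TrainingArguments(
    output_dir="out",
    ddp_find_unused_parameters=False,
)
```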
- 26 Jan, 2021 1 commit
Sylvain Gugger authored
* Add a debug print * Adapt Trainer to use smdistributed if available * Forgotten parenthesis * Real check for sagemaker * Don't forget to define device... * Woopsie, local_rank is defined differently * Update since local_rank has the proper value * Remove debug statement * More robust check for smdistributed * Quality * Deal with key not present error
- 25 Jan, 2021 2 commits
Sylvain Gugger authored
Sorami Hisamoto authored
`compute_objectie` => `compute_objective`
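The renamed argument belongs to `Trainer.hyperparameter_search`; a minimal sketch, with the objective function chosen purely for illustration:
```python
from transformers import Trainer

def search(trainer: Trainer):
    # compute_objective maps the evaluation metrics dict to the single float
    # that the search backend minimizes or maximizes.
    return trainer.hyperparameter_search(
        direction="minimize",
        n_trials=10,
        compute_objective=lambda metrics: metrics["eval_loss"],
    )
```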
- 22 Jan, 2021 1 commit
Sylvain Gugger authored
- 21 Jan, 2021 2 commits
Sylvain Gugger authored
* Fix memory regression in Seq2Seq example * Fix test and properly deal with -100 * Easier condition with device safety * Patch for MBartTokenizerFast
Stas Bekman authored
* no --deepspeed and --sharded_ddp together * Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 20 Jan, 2021 1 commit
Stas Bekman authored
- 15 Jan, 2021 1 commit
Stas Bekman authored
- 14 Jan, 2021 1 commit
Sylvain Gugger authored
* Upstream (and rename) sortish sampler * Use proper sampler * Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
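A minimal sketch of the upstreamed option, assuming it surfaces as `sortish_sampler` on `Seq2SeqTrainingArguments`:
```python
from transformers import Seq2SeqTrainingArguments

# Groups examples of similar length into the same batches to reduce padding,
# while keeping some randomness in the ordering.
args = Seq2SeqTrainingArguments(
    output_dir="out",
    sortish_sampler=True,
)
```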
- 13 Jan, 2021 1 commit
Stas Bekman authored
* deepspeed integration * style * add test * ds wants to do its own backward * fp16 assert * Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * for clarity extract what args are being passed to deepspeed * introduce the concept of self.wrapped_model * s/self.wrapped_model/self.model_wrapped/ * complete transition to self.wrapped_model / self.model * fix * doc * give ds its own init * add custom overrides, handle bs correctly * fix test * clean up model_init logic, fix small bug * complete fix * collapse --deepspeed_config into --deepspeed * style * start adding doc notes * style * implement hf2ds optimizer and scheduler configuration remapping * oops * call get_num_training_steps absolutely when needed * workaround broken auto-formatter * deepspeed_config arg is no longer needed - fixed in deepspeed master * use hf's fp16 args in config * clean * start on the docs * rebase cleanup * finish up --fp16 * clarify the supported stages * big refactor thanks to discovering deepspeed.init_distributed * cleanup * revert fp16 part * add checkpoint-support * more init ds into integrations * extend docs * cleanup * unfix docs * clean up old code * imports * move docs * fix logic * make it clear which file it's referring to * document nodes/gpus * style * wrong format * style * deepspeed handles gradient clipping * easier to read * major doc rewrite * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * docs * switch to AdamW optimizer * style * Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * clarify doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
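A minimal sketch of enabling the integration, assuming a DeepSpeed JSON config on disk (the `ds_config.json` path is a placeholder):
```python
from transformers import TrainingArguments

# --deepspeed takes the path to a DeepSpeed configuration file; the Trainer
# then hands optimizer, scheduler and fp16 handling over to DeepSpeed.
args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_config.json",
    fp16=True,
)
```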
- 11 Jan, 2021 2 commits
Stas Bekman authored
* round numbers * style * round only on logging
Stas Bekman authored
* fix bad merge - dropped code * remove --model_parallel * Deal with TrainingArguments * Use a private attr and fix batch sizes * fix _n_gpu * add is_parallel helper wrapper * fix attribute * introduce a new attribute is_model_parallel * docs * docs * Put back init False and rearrange doc * Ignore non-init args in HFArgumentParser Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
- 06 Jan, 2021 2 commits
Sylvain Gugger authored
* Don't import libs to check they are available * Don't import integrations at init * Add importlib_metadata to deps * Remove old vars references * Avoid syntax error * Adapt testing utils * Try to appease torchhub * Add dependency * Remove more private variables * Fix typo * Another typo * Refine the tf availability test
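A rough sketch of the pattern this commit moves towards: checking availability through package metadata instead of importing the library (the exact helpers in `transformers` differ; this only shows the general idea):
```python
import importlib.util
import importlib_metadata

def is_package_available(name: str) -> bool:
    # find_spec tells us the package can be imported without importing it;
    # importlib_metadata provides the installed version for feature checks.
    if importlib.util.find_spec(name) is None:
        return False
    try:
        importlib_metadata.version(name)
        return True
    except importlib_metadata.PackageNotFoundError:
        return False

print(is_package_available("torch"))
```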
Stas Bekman authored
* model wrapped + model_unwrap * cleanup * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * style * deprecation warning * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 05 Jan, 2021 2 commits
Stas Bekman authored
* --model_parallel hasn't been implemented for most models * make the help clear as well * implement is_parallelizable; use it * oops * remove property
Boris Dayma authored
* feat(wandb): log artifacts * fix: typo * feat(wandb): ensure name is allowed * feat(wandb): log artifact * feat(wandb): saving logic * style: improve formatting * fix: unrelated typo * feat: use a fake trainer * fix: simplify * feat(wandb): log model files as artifact * style: fix style * docs(wandb): correct description * feat: unpack model + allow env truthy values * feat: TrainerCallback can access tokenizer * style: fix style * feat(wandb): log more interesting metadata * feat: unpack tokenizer * feat(wandb): metadata with load_best_model_at_end * feat(wandb): more robust metadata * style(wandb): fix formatting
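A minimal sketch of turning on the artifact logging added here, assuming the truthy `WANDB_LOG_MODEL` environment variable the callback reads:
```python
import os

# Any truthy value enables uploading the trained model files as a W&B artifact
# at the end of training; set it before the Trainer / WandbCallback is created.
os.environ["WANDB_LOG_MODEL"] = "true"
```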
- 04 Jan, 2021 1 commit
Stas Bekman authored
This PR: * fixes the trainer to have the logger agree with the actual default `output_dir`, by setting it in one place and passing it as an argument to both places @sgugger
- 22 Dec, 2020 1 commit
Sylvain Gugger authored
* Add label smoothing in Trainer * Add options for scheduler and Adafactor in Trainer * Put Seq2SeqTrainer in the main lib * Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments and adapt scripts * Documentation * Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
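A minimal sketch of the new knobs, assuming they appear on `TrainingArguments` as `label_smoothing_factor`, `adafactor` and `lr_scheduler_type`:
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    label_smoothing_factor=0.1,  # label smoothing applied to the loss
    adafactor=True,              # use Adafactor instead of AdamW
    lr_scheduler_type="cosine",  # pick the learning-rate schedule by name
)
```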
- 21 Dec, 2020 1 commit
Amog Kamsetty authored
* wip * wip * wip * wip * wip * wip * wip * wip * uncomment * uncomment * wip * updates * add docstring * updates * fix arg * fixes * add unit tests * update readme * update readme * update finetune script * update test * add test * add ray to test dependencies * separate ray and ray tune * formatting * shutdown ray at end of test * fix tests * formatting * formatting * even more formatting * address comments * formatting * add files * Update examples/research_projects/rag/test_distributed_retriever.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * address comments * addressing comments
Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-208.us-west-2.compute.internal>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 18 Dec, 2020 2 commits
Sylvain Gugger authored
* Add timing inside Trainer * Fix tests * Add n_objs for train * Sort logs
Stas Bekman authored
- 17 Dec, 2020 1 commit
Sylvain Gugger authored
* Fix gradient clipping for Sharded DDP * Fix typos in comments
- 16 Dec, 2020 1 commit
Sylvain Gugger authored
* Experimental support for fairscale ShardedDDP * Add import error if fairscale not available * Address review comments * Fix seq2seq trainer
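A minimal sketch of enabling it, assuming the `sharded_ddp` flag on `TrainingArguments` as introduced here (later releases turned this into a list of options) and fairscale being installed:
```python
from transformers import TrainingArguments

# Requires fairscale; shards optimizer state and gradients across
# the data-parallel workers to save memory.
args = TrainingArguments(
    output_dir="out",
    sharded_ddp=True,
)
```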
- 15 Dec, 2020 2 commits
Sylvain Gugger authored
* Add possibility to switch between APEX and AMP in Trainer * Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com> * Address review comments * Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
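A minimal sketch of the switch, assuming the `fp16_backend` option on `TrainingArguments`:
```python
from transformers import TrainingArguments

# "amp" uses native torch.cuda.amp, "apex" uses NVIDIA apex, "auto" picks one.
args = TrainingArguments(
    output_dir="out",
    fp16=True,
    fp16_backend="apex",
)
```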
Stas Bekman authored
* trainer and finetune_trainer enhancements and fixes * add fallback default * move the fixing of incorrect keys back into finetune trainer * s/eval/val/ to match the split * trainer can now use a different prefix than eval_ for metrics * document new arg * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * use 'eval' as the default for metric_key_prefix * complete adjust var names + disambiguate * fix logger * add clarifying comment * add clarifying comment * style * Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Update src/transformers/trainer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * complete removal of optional for metric_key_prefix * Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
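A minimal sketch of the new prefix argument on `Trainer.evaluate` / `Trainer.predict`:
```python
from transformers import Trainer

def evaluate_val_split(trainer: Trainer, val_dataset):
    # Metrics come back as "val_loss", "val_runtime", ... instead of the
    # default "eval_" prefix, which keeps validation and test logs apart.
    return trainer.evaluate(eval_dataset=val_dataset, metric_key_prefix="val")
```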
- 09 Dec, 2020 1 commit
Sylvain Gugger authored
- 02 Dec, 2020 1 commit
Stas Bekman authored
* [trainer] improve code This PR: - removes redundant code: `self.model = model if model is not None else None` and `self.model = model` are the same. * separate attribute assignment from code logic - which simplifies things further. * whitespace
- 01 Dec, 2020 1 commit
Sylvain Gugger authored
- 30 Nov, 2020 1 commit
Shai Erera authored
* Use model.from_pretrained for DataParallel also When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end. This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel. If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden? * Fix silly bug * Address review comments Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check if `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
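For reference, a rough sketch of the unwrap step this change relies on (not the exact Trainer code):
```python
import torch

def unwrap_model(model: torch.nn.Module) -> torch.nn.Module:
    # DataParallel / DistributedDataParallel keep the real model under `.module`,
    # so custom from_pretrained / save_pretrained logic must be applied to that.
    return model.module if hasattr(model, "module") else model
```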