- 18 May, 2021 1 commit
Sylvain Gugger authored
- 13 May, 2021 1 commit
Volodymyr Byno authored
- 11 May, 2021 2 commits
Sylvain Gugger authored
* Add test and see where CI is unhappy
* Load with strict=False
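For context, PyTorch's `strict=False` loading tolerates key mismatches instead of raising; a minimal sketch (the toy model and missing key are illustrative, not the Trainer code):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
state_dict = {"weight": torch.zeros(2, 4)}  # "bias" intentionally missing

# strict=False records missing/unexpected keys instead of raising.
result = model.load_state_dict(state_dict, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # []
```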
Sylvain Gugger authored
* Autogenerate model cards from the Trainer
* ModelCard deprecated
* Fix test
* Style
* Apply suggestions from code review
* Address review comments
* Quality
* With all metadata
* Metadata
* Post-merge conflict mess
* Data args and all examples
* Default license and languages when possible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
- 10 May, 2021 1 commit
Sylvain Gugger authored
- 06 May, 2021 1 commit
Sylvain Gugger authored
* Fix RNG saves in distributed mode.
* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
- 04 May, 2021 1 commit
Sylvain Gugger authored
* Set generator in dataloader
* Use generator in all random samplers
* Checkpoint all RNG states
* Final version
* Quality
* Test
* Address review comments
* Quality
* Remove debug util
* Add python and numpy RNGs
* Split states in different files in distributed
* Quality
* local_rank for TPUs
* Only use generator when accepted
* Add test
* Set seed to avoid flakiness
* Make test less flaky
* Quality
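A minimal sketch of the RNG-checkpointing idea from this entry (the function names and file layout are illustrative, not the exact Trainer format):

```python
import random

import numpy as np
import torch

def save_rng_states(path):
    # Capture the Python, NumPy, and PyTorch RNG states together so a
    # resumed run continues the exact random sequence it left off at.
    states = {
        "python": random.getstate(),
        "numpy": np.random.get_state(),
        "torch": torch.get_rng_state(),
    }
    torch.save(states, path)

def load_rng_states(path):
    states = torch.load(path)
    random.setstate(states["python"])
    np.random.set_state(states["numpy"])
    torch.set_rng_state(states["torch"])
```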
- 03 May, 2021 1 commit
Sylvain Gugger authored
- 30 Apr, 2021 1 commit
Stas Bekman authored
* sync
* add activation overflow debug utility
* cleanup
* document detect_overflow
* import torch
* add deprecation warning
* Apply suggestions from code review
* convert to rst, add note
* add class
* fix docs
* improve the doc
* rework to dump a lot more info about each frame
* complete expansion
* cleanup
* format
* cleanup
* doesn't have to be transformers
* Apply suggestions from code review
* wrap long line
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
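The utility added by this entry can be attached to a model roughly like this (a sketch; see transformers.debug_utils for the exact API, and the model name is illustrative):

```python
from transformers import AutoModel
from transformers.debug_utils import DebugUnderflowOverflow

model = AutoModel.from_pretrained("bert-base-uncased")

# Registers forward hooks on every submodule and dumps detailed frame
# info as soon as an inf/nan shows up in activations or weights.
debug_overflow = DebugUnderflowOverflow(model)

# ... then train or run inference as usual; the hooks report on overflow.
```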
- 26 Apr, 2021 4 commits
Sylvain Gugger authored
* Pass along seed to DistributedSampler
* Add seed to DistributedLengthGroupedSampler
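For reference, the seed being passed along mirrors PyTorch's DistributedSampler argument, and every process must use the same value (num_replicas/rank are hardcoded here only to make the sketch runnable outside a distributed launch):

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(100))

# All processes must share the same seed so they shuffle identically
# and each ends up with a disjoint shard of the same permutation.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, seed=42)
```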
LSinev authored
Sylvain Gugger authored
* Add FP16 support for SageMaker MP
* Add print debugs
* Squeeze
* Remove debug statements
* Add defensive check
* Typo
Patrick von Platen authored
- 23 Apr, 2021 2 commits
Sylvain Gugger authored
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
* Address review comments
* Apply suggestions from code review

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
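With the mixin from this entry, uploading looks roughly like this (the repo name is illustrative; it assumes you are logged in to the Hub and have git-lfs installed):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Creates (or reuses) a repo under your namespace and uploads the files.
model.push_to_hub("my-finetuned-bert")
tokenizer.push_to_hub("my-finetuned-bert")
```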
Teven authored
* Fixed trainer total_flos reloading in distributed mode
* Logging flos at the end of training
- 22 Apr, 2021 1 commit
Sylvain Gugger authored
* Fix Trainer with remove_unused_columns=False
* Typo
- 21 Apr, 2021 1 commit
Stas Bekman authored
This PR fixes a bug that was most likely exposed (not caused) by https://github.com/huggingface/transformers/pull/11318; surprisingly, the same test passed just fine before that PR.
- 20 Apr, 2021 2 commits
Sylvain Gugger authored
* Update to use the datasets remove_columns method
* Quality
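For reference, the datasets method in question (the column names are illustrative):

```python
from datasets import Dataset

ds = Dataset.from_dict({"text": ["a", "b"], "label": [0, 1], "idx": [0, 1]})

# Drops a column the model's forward() does not accept.
ds = ds.remove_columns(["idx"])
print(ds.column_names)  # ['text', 'label']
```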
Sylvain Gugger authored
- 19 Apr, 2021 2 commits
Sylvain Gugger authored
Stas Bekman authored
* fix the placement on device with fp16_full_eval
* deepspeed never goes on device
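fp16_full_eval is the flag that exercises this placement path (a sketch; output_dir is illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    # Run evaluation entirely in fp16: the model is cast and moved to the
    # device for eval, except under DeepSpeed, which manages placement itself.
    fp16_full_eval=True,
)
```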
- 16 Apr, 2021 1 commit
Sylvain Gugger authored
* Bulk of the work
* Polish and tests
* Update QA Trainer
* Avoid breaking the predict method
* Deprecation warnings
* Store real eval dataloader
* Get eval dataset reference before wrap
- 15 Apr, 2021 1 commit
Sylvain Gugger authored
- 14 Apr, 2021 1 commit
Sylvain Gugger authored
* IterableDatasetShard
* Test and integration in Trainer
* Update src/transformers/trainer_pt_utils.py
* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
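The sharding idea behind this entry, in a simplified sketch (the real IterableDatasetShard in trainer_pt_utils also handles batch-level sharding and epoch seeding; the class here is illustrative):

```python
from torch.utils.data import IterableDataset

class SimpleIterableShard(IterableDataset):
    """Yield only the elements that belong to this process's shard."""

    def __init__(self, dataset, num_processes: int, process_index: int):
        self.dataset = dataset
        self.num_processes = num_processes
        self.process_index = process_index

    def __iter__(self):
        for i, element in enumerate(self.dataset):
            # Round-robin assignment: process p keeps the elements whose
            # index satisfies i % num_processes == p.
            if i % self.num_processes == self.process_index:
                yield element
```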
- 08 Apr, 2021 4 commits
Stas Bekman authored
* make fairscale and deepspeed setup extras
* fix default
* Apply suggestions from code review
* no reason not to ask for the good version
* update the CIs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman authored
* solve "scheduler before optimizer step" warning
* style
* correct the state evaluation test
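The warning in question comes from stepping the LR scheduler before the optimizer; the fixed ordering looks like this (toy model and schedule are illustrative):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 0.95**step)

for _ in range(3):
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()   # step the optimizer first ...
    scheduler.step()   # ... then advance the learning-rate schedule
    optimizer.zero_grad()
```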
Stas Bekman authored
* synced gpus
* fix
* fix
* need to use t5-small for quality tests
* notes
* complete merge
* fix a disappearing std stream problem
* start zero3 tests
* wip
* tune params
* sorting out the pre-trained model loading
* reworking generate loop wip
* wip
* style
* fix tests
* split the tests
* refactor tests
* wip
* parameterized
* fix
* work out the resume from non-ds checkpoint pass + test
* cleanup
* remove no longer needed code
* split getter/setter functions
* complete the docs
* suggestions
* gpus and their compute capabilities link
* Apply suggestions from code review
* style
* remove invalid paramgd
* automatically configure zero3 params that rely on hidden size
* make _get_resized_embeddings zero3-aware
* add test exercising resize_token_embeddings()
* add docstring

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
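resize_token_embeddings, which this entry makes ZeRO-3 aware, is typically called like this (the model and added token are illustrative):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
tokenizer.add_tokens(["<new_token>"])

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
# Under ZeRO-3 the embedding matrix is partitioned across GPUs, so
# _get_resized_embeddings must gather it before building the new one.
model.resize_token_embeddings(len(tokenizer))
```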
Jannis Born authored
* fix: docstrings in prediction_step
* ci: Satisfy line length requirements
* ci: character length requirements
- 31 Mar, 2021 2 commits
Sylvain Gugger authored
* Replace is_sagemaker_distributed_available
* Merge SageMakerTrainer into Trainer
* Test with shorter condition
* Put back deleted line
* Deprecate SageMakerTrainer and SageMakerTrainingArguments
* Apply suggestions from code review

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Sylvain Gugger authored
* First third
* Styling and fix mistake
* Quality
* All the rest
* Treat %s and %d
* typo
* Missing )
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
- 29 Mar, 2021 1 commit
pcuenca authored
A new argument `length_column_name` has been added to `TrainingArguments`, with default value `"length"`. If this column exists and `group_by_length` is `True`, the train sampler will use it for grouping rather than computing it before training starts. This is an optimization that allows the user to prepare data for fast processing, preventing sequential access to the dataset as described in issue #10909.
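Opting in looks like this (output_dir is illustrative, and the dataset is assumed to carry a precomputed "length" column):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    group_by_length=True,
    # Use the precomputed "length" column instead of measuring every
    # example before training starts.
    length_column_name="length",
)
```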
- 24 Mar, 2021 1 commit
imzhengzx authored
The original code on line 246 is

```python
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
```

It should be

```python
tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
- 23 Mar, 2021 1 commit
Bhadresh Savani authored
- 22 Mar, 2021 2 commits
Ruan Chaves authored
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.
* Reformat single quotes as double quotes.
Sidd Karamcheti authored
Add a simple one-character fix so that on_step_begin and on_step_end are called at the right times (#10839)
- 18 Mar, 2021 1 commit
Sylvain Gugger authored
* Fix distributed evaluation
* Use logger
- 17 Mar, 2021 3 commits
Mansi Mane authored
* Added debug prints and config
* Added extra samples to SequentialDistributedSampler; updated the SequentialDistributedSampler call
* Removed extra prints
* Making predictions and labels a multiple of batch size
* Updated number of microbatches
* Made start_remainder similar to DistributedSamplerWithLoop
* Minor spacing update
* Squashed redundant commits
* Test and styling
* Rename test

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
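The core trick is padding the index list so every replica receives the same number of full batches (a simplified sketch of the SequentialDistributedSampler logic; the helper name is illustrative):

```python
import math

def shard_indices(dataset_len, batch_size, num_replicas, rank):
    # Round up so the total divides evenly into full per-replica batches.
    per_replica = math.ceil(dataset_len / (batch_size * num_replicas)) * batch_size
    total_size = per_replica * num_replicas

    indices = list(range(dataset_len))
    # Pad by wrapping around; the duplicated predictions are truncated
    # again after results are gathered from all replicas.
    indices += indices[: total_size - len(indices)]
    return indices[rank * per_replica : (rank + 1) * per_replica]
```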
Stas Bekman authored
Stas Bekman authored
* deepspeed checkpoint loading code plus tests
* style
* style
- 16 Mar, 2021 1 commit
Cheng Li authored
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composite argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdated information
* refactor duplicated code

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
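The gist of the change, as a hedged sketch (the helper and the ds_config handling are illustrative, not the exact implementation; depending on your DeepSpeed version the kwarg may be config or config_params):

```python
import deepspeed

def build_engine(model, ds_config, hf_optimizer, hf_scheduler):
    # Only hand our own optimizer/scheduler to DeepSpeed when the config
    # does not define its own; the ds_config entries win otherwise.
    optimizer = None if "optimizer" in ds_config else hf_optimizer
    lr_scheduler = None if "scheduler" in ds_config else hf_scheduler

    engine, optimizer, _, lr_scheduler = deepspeed.initialize(
        model=model,
        # DeepSpeed builds the optimizer from config when none is passed.
        model_parameters=model.parameters() if optimizer is None else None,
        optimizer=optimizer,
        lr_scheduler=lr_scheduler,
        config=ds_config,
    )
    return engine, optimizer, lr_scheduler
```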