- 07 Jul, 2021 1 commit
-
-
Souvic Chakraborty authored
* Validation split percentage to be used for custom data files also (same issue as https://github.com/huggingface/transformers/issues/12406 ), fixed for pytorch branch run_mlm.py * Validation split added in the right place * Update run_clm.py * validation split added for custom files * Validation split added for custom files * Update run_plm.py * fixed validation split for custom files as input for pytorch examples in lm * Update run_clm_no_trainer.py * args modified
-
- 28 Jun, 2021 2 commits
-
-
Bhadresh Savani authored
* added context manager to datasets map * fixed style and spaces * fixed warning of deprecation * changed desc
-
Taha ValizadehAslani authored
Previously the code could not be used for validation only, because the line `extension = data_args.train_file.split(".")[-1]` assumed that the extension must be extracted from the training dataset. This line ran regardless of whether the user selected training or validation, which led to an error when the user only wanted to run evaluation without training (because the training file does not exist). I modified it to extract the extension from the training file if the user wants to train, and from the validation file if the user wants to run eval. This way the code can be used for training and validation separately.
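A minimal sketch of the behavior this message describes, not the actual diff. The helper name `infer_extension` and the `SimpleNamespace` stand-in for the script's data arguments are assumptions made for illustration; only `train_file`, `validation_file`, and the `split(".")[-1]` expression come from the message itself.

```python
from types import SimpleNamespace


def infer_extension(data_args, do_train):
    """Return the extension of the file that matches the requested mode.

    Hypothetical helper: the training file is consulted only when training
    is requested; otherwise the validation file is used.
    """
    source = data_args.train_file if do_train else data_args.validation_file
    return source.split(".")[-1]


# Evaluation-only run: no training file is provided at all.
eval_only = SimpleNamespace(train_file=None, validation_file="dev.json")
print(infer_extension(eval_only, do_train=False))
```

With the original unconditional `data_args.train_file.split(".")[-1]`, the same evaluation-only arguments would fail, because `train_file` is `None`.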
-
- 26 Jun, 2021 1 commit
-
-
Bhadresh Savani authored
-
- 25 Jun, 2021 4 commits
-
-
Bhadresh Savani authored
* added log_level * fix comment * fixed log_level * Trigger CI * Unified logging * simplified args for log_level
-
Stas Bekman authored
* main_process_first context manager * handle multi-node, add context description * sync desc
-
Stas Bekman authored
-
michal pitr authored
-
- 23 Jun, 2021 2 commits
-
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
- 22 Jun, 2021 2 commits
-
-
Stas Bekman authored
* bug fixes and a rename * add extended DDP test
-
Stas Bekman authored
* set log level from CLI * add log_level_replica + test + extended docs * cleanup * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * rename datasets objects to allow datasets module * improve the doc * style * doc improve Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 17 Jun, 2021 3 commits
-
-
Bhavitvya Malik authored
* update desc for map in all examples * added plm * suggestions
-
Lysandre authored
-
Lysandre authored
-
- 15 Jun, 2021 2 commits
-
-
Sylvain Gugger authored
* [WIP] Model card defaults * finetuned_from default value * Add all mappings to the mapping file * Be more defensive on finetuned_from arg * Add default task tag * Separate tags from tasks * Edge case for dataset * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
kumapo authored
-
- 14 Jun, 2021 3 commits
-
-
Kumar Abhishek authored
* [lm examples] Replicate --config_overrides addition to other LM examples * Removing no trainer files changes * Update README Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
-
Nicholas Broad authored
* Use text_column_name variable instead of "text" `text_column_name` was already defined above where I made the changes and it was also used below where I made changes. This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90%+ of datasets use "text" anyway. * black formatting * make style Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
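The column selection described in this message could look roughly like the sketch below. It is an illustration under assumptions, not the script's code: `pick_text_column`, `make_tokenize_function`, and `toy_tokenizer` are hypothetical names, and the whitespace tokenizer stands in for a real tokenizer so the example runs on its own.

```python
def pick_text_column(column_names):
    # Prefer "text"; otherwise fall back to the first column,
    # as the commit message describes.
    return "text" if "text" in column_names else column_names[0]


def make_tokenize_function(tokenizer, text_column_name):
    def tokenize_function(examples):
        # Index by the selected column instead of a hard-coded "text" key.
        return tokenizer(examples[text_column_name])

    return tokenize_function


def toy_tokenizer(texts):
    # Stand-in for a real tokenizer (assumption): split on whitespace.
    return {"tokens": [t.split() for t in texts]}


# A batch whose text lives in a column that is not called "text".
batch = {"sentence": ["hello world", "a b c"]}
column = pick_text_column(list(batch.keys()))
tokenize_function = make_tokenize_function(toy_tokenizer, column)
print(tokenize_function(batch))
```

A hard-coded `examples["text"]` would raise a `KeyError` on this batch, which is the case the change makes more robust.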
-
Sylvain Gugger authored
* Don't log anything before logging is setup in examples * Last example
-
- 10 Jun, 2021 4 commits
-
-
Bhavitvya Malik authored
* add relevant `desc` in examples * require_version datasets>=1.8.0
-
Matt authored
-
Sylvain Gugger authored
-
kumapo authored
* Add text_column_name and label_column_name to run_ner args * Minor fix: grouping for text and label column name
-
- 09 Jun, 2021 1 commit
-
-
Koichi Yasuoka authored
-
- 08 Jun, 2021 3 commits
-
-
Sylvain Gugger authored
-
cdleong authored
* Add torch to requirements.txt in language-modeling * Update examples/pytorch/language-modeling/requirements.txt Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Russell Klopfer authored
* adds metric prefix. * update tests to include prefix
-
- 01 Jun, 2021 1 commit
-
-
Fan Zhang authored
* modify qa-trainer * fix flax model
-
- 31 May, 2021 1 commit
-
-
Philip May authored
* Add MT5ForConditionalGeneration as supported arch. * Update README.md
-
- 25 May, 2021 4 commits
-
-
Stas Bekman authored
* create custom model on the fly * better wording * add update_from_string * cleanup * cleanup * Update src/transformers/configuration_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * more bool options * style * fix logger * add test * add the doc * assert on conflict of options Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Stas Bekman authored
* fix overflow in perplexity calc * use inf * fix
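One way to read "fix overflow in perplexity calc" and "use inf" is sketched below, assuming perplexity is computed as the exponential of the evaluation loss. The function name `safe_perplexity` is hypothetical and the actual change may differ in detail.

```python
import math


def safe_perplexity(eval_loss):
    """Return exp(eval_loss), or infinity when the exponential overflows."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        # math.exp raises OverflowError for very large inputs;
        # report an infinite perplexity instead of crashing.
        return float("inf")


print(safe_perplexity(0.0))
print(safe_perplexity(1000.0))
```

Returning infinity keeps an evaluation run with a diverged loss from ending in an exception while still making the bad result obvious in the reported metrics.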
-
Sylvain Gugger authored
* Add option to log only once in multinode training * Use an alternate property
-
Wang Ran (汪然) authored
-
- 20 May, 2021 1 commit
-
-
Keren Fuentes authored
* add separator for windows * fixes test_is_copy_consistent on Windows * fixing writing encoding issue on extended test (for Windows) * resolving comments
-
- 18 May, 2021 4 commits
-
-
Tomy Hsieh authored
-
Philipp Schmid authored
* add `dataset_name` to data_args and added accuracy metric * added documentation for dataset_name * spelling correction
-
Patrick von Platen authored
* add headers to main doc * Apply suggestions from code review * update * upload
-
Tommy Chiang authored
-
- 17 May, 2021 1 commit
-
-
Sylvain Gugger authored
-