Commits · 1d6623c6a25f9c1be3af36ffdcc3b0e0d3848999 · chenpangpang / transformers

07 Jul, 2021 1 commit

MLM training fails with no validation file (same as #12406, now for PyTorch) (#12517) · 1d6623c6

Souvic Chakraborty authored Jul 07, 2021

* Validation split percentage to be used for custom data files also

Same issue as https://github.com/huggingface/transformers/issues/12406, now fixed for the PyTorch run_mlm.py

* Validation split added in the right place

* Update run_clm.py

* validation split added for custom files

* Validation split added for custom files

* Update run_plm.py

* fixed validation split for custom files as input for pytorch examples in lm

* Update run_clm_no_trainer.py

* args modified

1d6623c6
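The bullets above describe carving a validation set out of custom training files when no validation file is given. A minimal sketch of that idea, assuming the percentage slicing syntax of the `datasets` library; the helper name `split_specs` is hypothetical and is not taken from the commit:

```python
def split_specs(validation_split_percentage: int) -> dict:
    """Build split expressions that carve a validation set out of "train".

    Hypothetical helper: it only builds the strings, it does not load data.
    """
    if not 0 < validation_split_percentage < 100:
        raise ValueError("validation_split_percentage must be between 1 and 99")
    pct = validation_split_percentage
    return {
        # The first pct percent of the training data becomes the validation set.
        "validation": f"train[:{pct}%]",
        # The remainder is kept for training.
        "train": f"train[{pct}%:]",
    }


specs = split_specs(5)
print(specs["validation"])
print(specs["train"])

# Assumed usage with the `datasets` library (not executed here):
#   load_dataset(extension, data_files=data_files, split=specs["validation"])
```

Whether the scripts build the split strings exactly this way is an assumption; the sketch only shows how a percentage argument can drive both splits from a single custom file.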

28 Jun, 2021 2 commits

[Examples] Added context manager to datasets map (#12367) · 04dbea31

Bhadresh Savani authored Jun 28, 2021

* added context manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc

04dbea31

Update run_mlm.py (#12344) · 9490d668

Taha ValizadehAslani authored Jun 28, 2021

Previously the script could not be used for validation only, because this line:
extension = data_args.train_file.split(".")[-1]
assumed that the extension must be extracted from the training file. The line ran regardless of whether the user requested training or validation, so an evaluation-only run failed because no training file exists. It now extracts the extension from the training file when the user wants to train and from the validation file when the user wants to run eval, so the script can be used for training and validation separately.

9490d668
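The behavior described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the function name `infer_extension` and its parameters are hypothetical stand-ins for the script's `data_args` and training flags.

```python
from typing import Optional


def infer_extension(
    train_file: Optional[str],
    validation_file: Optional[str],
    do_train: bool,
) -> str:
    """Pick the file extension from whichever file the run actually needs.

    Hypothetical helper illustrating the fix described above.
    """
    # Use the training file when training, otherwise the validation file.
    source = train_file if do_train else validation_file
    if source is None:
        raise ValueError("the file required for this run was not provided")
    return source.split(".")[-1]


# Evaluation-only run: no training file exists, yet the extension is found.
print(infer_extension(None, "dev.json", do_train=False))
print(infer_extension("train.csv", "dev.csv", do_train=True))
```

Raising a `ValueError` when the needed file is missing is a design choice of this sketch; the original script may report the problem differently.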

26 Jun, 2021 1 commit
- replace print with logger (#12368) · ff5cdc08
  Bhadresh Savani authored Jun 26, 2021
  
  ff5cdc08
25 Jun, 2021 4 commits
- [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359) · 539ee456
  Bhadresh Savani authored Jun 25, 2021
```
* added log_level

* fix comment

* fixed log_level

* Trigger CI

* Unified logging

* simplified args for log_level
```
  539ee456
- [trainer] add main_process_first context manager (#12351) · 64e60980
  Stas Bekman authored Jun 25, 2021
```
* main_process_first context manager

* handle multi-node, add context description

* sync desc
```
  64e60980
- remove extra white space from log format (#12360) · 4a872cae
  Stas Bekman authored Jun 25, 2021
  
  4a872cae
- fixed typo (#12356) · d4ce31e8
  michal pitr authored Jun 25, 2021
  
  d4ce31e8
23 Jun, 2021 2 commits
- v4.9.0.dev0 · 2150dfed
  Sylvain Gugger authored Jun 23, 2021
  
  2150dfed
- Release: v4.8.0 · 9252a512
  Sylvain Gugger authored Jun 23, 2021
  
  9252a512
22 Jun, 2021 2 commits

[trainer] 2 bug fixes and a rename (#12309) · ebe54135
Stas Bekman authored Jun 22, 2021
```
* bug fixes and a rename

* add extended DDP test
```
ebe54135

[trainer + examples] set log level from CLI (#12276) · dad414d5

Stas Bekman authored Jun 21, 2021



* set log level from CLI

* add log_level_replica + test + extended docs

* cleanup

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename datasets objects to allow datasets module

* improve the doc

* style

* doc improve
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

dad414d5

17 Jun, 2021 3 commits
- update desc for map in all examples (#12226) · e43e1126
  Bhavitvya Malik authored Jun 18, 2021
```
* update desc for map in all examples

* added plm

* suggestions
```
  e43e1126
- Docs for v4.8.0 · 0daadc19
  Lysandre authored Jun 17, 2021
  
  0daadc19
- Release: v4.7.0 · 7a6c9fab
  Lysandre authored Jun 17, 2021
  
  7a6c9fab
15 Jun, 2021 2 commits

Model card defaults (#12122) · 7d7ceca3

Sylvain Gugger authored Jun 15, 2021



* [WIP] Model card defaults

* finetuned_from default value

* Add all mappings to the mapping file

* Be more defensive on finetuned_from arg

* Add default task tag

* Separate tags from tasks

* Edge case for dataset

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

7d7ceca3

Enable add_prefix_space if model_type is roberta or gpt2 (#12116) · 955b2b97
kumapo authored Jun 15, 2021

955b2b97

14 Jun, 2021 3 commits

[lm examples] Replicate --config_overrides addition to other LM examples (#12135) · 9de62cfb

Kumar Abhishek authored Jun 14, 2021



* [lm examples] Replicate --config_overrides addition to other LM examples

* Removing no trainer files changes

* Update README
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>

9de62cfb

Use text_column_name variable instead of "text" (#12132) · cd7961b6

Nicholas Broad authored Jun 14, 2021



* Use text_column_name variable instead of "text"

`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.

This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that over 90% of datasets use "text" anyway.

* black formatting

* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>

cd7961b6
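The column selection described in this commit message can be sketched as below. `pick_text_column` is a hypothetical helper name, and a whitespace split stands in for a real tokenizer so that the example has no dependencies; both are assumptions for illustration only.

```python
from typing import Dict, List


def pick_text_column(column_names: List[str]) -> str:
    """Use "text" when present, otherwise fall back to the first column."""
    return "text" if "text" in column_names else column_names[0]


def tokenize_function(
    examples: Dict[str, List[str]], text_column_name: str
) -> List[List[str]]:
    """Tokenize the chosen column instead of a hard-coded "text" key.

    A whitespace split is a stand-in for a real tokenizer.
    """
    return [line.split() for line in examples[text_column_name]]


batch = {"sentence": ["hello world", "a b c"], "label": ["0", "1"]}
column = pick_text_column(list(batch.keys()))
print(column)
print(tokenize_function(batch, column))
```

With a hard-coded `examples["text"]`, the batch above would raise a `KeyError`; reading the column through `text_column_name` avoids that.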

Don't log anything before logging is setup in examples (#12121) · b8ab5413
Sylvain Gugger authored Jun 14, 2021
```
* Don't log anything before logging is setup in examples

* Last example
```
b8ab5413

10 Jun, 2021 4 commits
- add relevant description to tqdm in examples (#11927) · d2753dcb
  Bhavitvya Malik authored Jun 11, 2021
```
* add relevant `desc` in examples

* require_version datasets>=1.8.0
```
  d2753dcb
- Appending label2id and id2label to models to ensure inference works properly (#12102) · bebbdd0f
  Matt authored Jun 10, 2021
  
  bebbdd0f
- Fix quality · d72e5a3a
  Sylvain Gugger authored Jun 10, 2021
  
  d72e5a3a
- Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083) · 472a8676
  kumapo authored Jun 10, 2021
```
* Add text_column_name and label_column_name to run_ner args

* Minor fix: grouping for text and label column name
```
  472a8676
09 Jun, 2021 1 commit
- Update run_ner.py with id2label config (#12001) · 82a2b76c
  Koichi Yasuoka authored Jun 09, 2021
  
  82a2b76c
08 Jun, 2021 3 commits

Properly indent block_size (#12070) · fd690283
Sylvain Gugger authored Jun 08, 2021

fd690283

Add torch to requirements.txt in language-modeling (#12040) · 49bee0ae

cdleong authored Jun 08, 2021



* Add torch to requirements.txt in language-modeling

* Update examples/pytorch/language-modeling/requirements.txt
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

49bee0ae

adds metric prefix. (#12057) · e363e1d9
Russell Klopfer authored Jun 07, 2021
```
* adds metric prefix.

* update tests to include prefix
```
e363e1d9

01 Jun, 2021 1 commit
- modify qa-trainer (#11872) · 7e73601f
  Fan Zhang authored Jun 01, 2021
```
* modify qa-trainer

* fix flax model
```
  7e73601f
31 May, 2021 1 commit
- Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961) · cfca638a
  Philip May authored May 31, 2021
```
* Add MT5ForConditionalGeneration as supported arch.

* Update README.md
```
  cfca638a
25 May, 2021 4 commits

[Examples] create model with custom config on the fly (#11798) · 1b653010

Stas Bekman authored May 25, 2021



* create custom model on the fly

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

1b653010

[lm examples] fix overflow in perplexity calc (#11855) · 6287c929
Stas Bekman authored May 25, 2021
```
* fix overflow in perplexity calc

* use inf

* fix
```
6287c929
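The overflow fix named in the bullets above can be sketched like this. It is a minimal illustration, assuming perplexity is computed as the exponential of the evaluation loss; the function name `safe_perplexity` is hypothetical.

```python
import math


def safe_perplexity(eval_loss: float) -> float:
    """Return exp(eval_loss), falling back to infinity on overflow."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        # math.exp raises for very large losses; report inf instead of crashing.
        return float("inf")


print(safe_perplexity(0.0))
print(safe_perplexity(1000.0))
```

Returning `inf` keeps the evaluation report intact for a badly diverged model, where an uncaught exception would otherwise discard the remaining metrics.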
Add option to log only once in multinode training (#11819) · f086652b
Sylvain Gugger authored May 25, 2021
```
* Add option to log only once in multinode training

* Use an alternate property
```
f086652b
typo (#11858) · b8344a27
Wang Ran (汪然) authored May 25, 2021

b8344a27

20 May, 2021 1 commit

Fix failing test on Windows Platform (#11589) · 22394387

Keren Fuentes authored May 20, 2021

* add separator for windows

* fixes test_is_copy_consistent on Windows

* fixing writing encoding issue on extended test (for Windows)

* resolving comments

22394387

18 May, 2021 4 commits
- Fix a small error in summarization example (#11762) · eb3e072a
  Tomy Hsieh authored May 19, 2021
  
  eb3e072a
- add `dataset_name` to data_args and added accuracy metric (#11760) · 04e25c62
  Philipp Schmid authored May 18, 2021
```
* add `dataset_name` to data_args and added accuracy metric

* added documentation for dataset_name

* spelling correction
```
  04e25c62
- Add more subsections to main doc (#11758) · cebb96f5
  Patrick von Platen authored May 18, 2021
```
* add headers to main doc

* Apply suggestions from code review

* update

* upload
```
  cebb96f5
- Fix incorrect newline in #11650 (#11757) · da7e73b7
  Tommy Chiang authored May 18, 2021
  
  da7e73b7
17 May, 2021 1 commit
- Use new evaluation loop in TrainerQA (#11746) · 936b5715
  Sylvain Gugger authored May 17, 2021
  
  936b5715