Commits · 4002f95eb66622a048ff01a383968af57023134d · chenpangpang / transformers

29 Mar, 2021 6 commits

Remove duplicate code · 4002f95e
Sylvain Gugger authored Mar 29, 2021


Add `examples/run_ner_no_trainer.py` (#10902) · d7b50ce4

Daniel Stancl authored Mar 29, 2021

* Add NER example with accelerate library

* This commit contains the first (still quite unfinished)
version of a script showing how to train a HuggingFace model
with the new accelerate library.

* Fix metric calculation

* make style quality

* mv ner_no_trainer to token-classification dir

* Delete --debug flag from running script

* hf_datasets -> raw_datasets

* Make a few slight adjustments

* Add an informative comment + rewrite a help comment

* Change header

* Fix a few things

* Enforce to use fast tokenizers only

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Change bash script: python3 -> accelerate launch

* make style

* Add a few missing things (see below)

* Add max-length padding to predictions and labels to
enable accelerate gather functionality

* Add PyTorch no trainer example to the example README.md

* Remove --do-train from args as being redundant for now

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Remove some obsolete args.do_train conditions from the script

* Delete --do_train from bash running script

* Delete use_slow_tokenizer from args

* Add unintentionally removed flag --label_all_tokens

* Delete --debug flag from running script

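The max-length padding bullet exists because a cross-process gather generally needs every process to contribute tensors of the same shape. Below is a minimal pure-Python sketch of the idea. It assumes label sequences are lists of integer IDs and that padding uses `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helper names are hypothetical, and this is not the script's actual code:

```python
from typing import List, Sequence, Tuple

# Assumption: -100 marks padded label positions, matching the ignore_index
# default of PyTorch's CrossEntropyLoss.
LABEL_PAD_ID = -100


def pad_to_max_length(
    batch: Sequence[Sequence[int]], max_length: int, pad_value: int = LABEL_PAD_ID
) -> List[List[int]]:
    """Right-pad (or truncate) every sequence to exactly max_length items."""
    padded = []
    for seq in batch:
        seq = list(seq)[:max_length]
        padded.append(seq + [pad_value] * (max_length - len(seq)))
    return padded


def strip_padding(
    preds: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    pad_value: int = LABEL_PAD_ID,
) -> Tuple[List[List[int]], List[List[int]]]:
    """Drop positions whose label is padding, so metrics only see real tokens."""
    kept_preds, kept_labels = [], []
    for p_seq, l_seq in zip(preds, labels):
        pairs = [(p, l) for p, l in zip(p_seq, l_seq) if l != pad_value]
        kept_preds.append([p for p, _ in pairs])
        kept_labels.append([l for _, l in pairs])
    return kept_preds, kept_labels


# Two "processes" with different sequence lengths now produce equal-width rows,
# so concatenating them (a stand-in for a real gather) stays rectangular.
rank0 = pad_to_max_length([[3, 1, 4]], max_length=5)
rank1 = pad_to_max_length([[1, 5]], max_length=5)
gathered = rank0 + rank1
```

The sketch pairs padding with `strip_padding` because padded positions have to be removed again before a metric is computed on the gathered values.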

Instantiate model only once in pipeline (#10888) · 06a6fea7

Sylvain Gugger authored Mar 29, 2021



* Instantiate model only once in pipeline

* Remove documentation of deprecated method

* Add FutureWarning

* Update src/transformers/pipelines/base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

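The "Add FutureWarning" bullet follows the common Python pattern of keeping a deprecated method callable while warning its users. Below is a minimal sketch of that pattern. The entry does not name the deprecated method, so the class, method, and attribute names here are all hypothetical:

```python
import warnings


class PipelineSketch:
    """Hypothetical pipeline-like class, not the transformers implementation."""

    def __init__(self, model_name: str):
        # The (stand-in) model is built exactly once, at construction time.
        self.model = f"model<{model_name}>"

    def load_model(self):
        """Hypothetical deprecated accessor kept for backward compatibility."""
        warnings.warn(
            "`load_model` is deprecated and will be removed in a future version; "
            "use the `model` attribute instead.",
            FutureWarning,
            stacklevel=2,
        )
        # Return the existing instance instead of building a second model.
        return self.model
```

Python shows `FutureWarning` to end users by default, unlike `DeprecationWarning`. That presumably makes it the better fit for a user-facing API such as pipelines.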

Ignore not initialized NO_CONFIG_TOKENIZERs (#10936) · cc2366bb
Masatoshi Suzuki authored Mar 29, 2021

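The title above refers to tokenizer classes that are left uninitialized. Below is a minimal sketch of why they must be skipped. It assumes, as an illustration rather than a quote of the library, that classes whose optional dependency is missing are stored as `None` placeholders in a lookup list. All class and variable names are hypothetical:

```python
from typing import List, Optional, Type


class JapaneseTokenizerStub:
    """Hypothetical stand-in for a tokenizer class that imported successfully."""


# Hypothetical: this class needs an optional dependency that is not installed,
# so a None placeholder is stored instead of a class.
OptionalDepTokenizerStub: Optional[Type] = None

NO_CONFIG_TOKENIZERS: List[Optional[Type]] = [
    JapaneseTokenizerStub,
    OptionalDepTokenizerStub,
]


def tokenizer_class_from_name(class_name: str) -> Optional[Type]:
    """Return the registered class called `class_name`, or None if absent."""
    # Skip uninitialized (None) entries: reading `None.__name__` would raise
    # AttributeError and break the lookup for any name not found earlier.
    for cls in (c for c in NO_CONFIG_TOKENIZERS if c is not None):
        if cls.__name__ == class_name:
            return cls
    return None
```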
Updated colab links in readme of examples (#10932) · ddea8771
WybeKoper authored Mar 29, 2021
```
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
```
Return global attentions (see #7514) (#10906) · b3544e4c
Guillaume Filion authored Mar 29, 2021


28 Mar, 2021 1 commit
- fixed finename (#10939) · 4f21e1dd
  Bhadresh Savani authored Mar 28, 2021
  
26 Mar, 2021 3 commits

Add ImageFeatureExtractionMixin (#10905) · b0595d33

Sylvain Gugger authored Mar 26, 2021

* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test


[vulnerability] fix dependency (#10914) · 3c27d246

Stas Bekman authored Mar 26, 2021

This PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open


Rename NLP library to Datasets library (#10920) · 4b2b50aa
Tomy Hsieh authored Mar 26, 2021
```
* Rename NLP library to Datasets library

* Update github template

* Fix styling
```

25 Mar, 2021 7 commits

Fix comment (#10886) · 86c6f8a8
lexhuismans authored Mar 25, 2021

Reorder init imports · 9856c921
Sylvain Gugger authored Mar 25, 2021

Fix typo · e70068a7
Sylvain Gugger authored Mar 25, 2021

Sort init imports · f183a7a3
Sylvain Gugger authored Mar 25, 2021


Layout lm tf 2 (#10636) · 4684bfc7

Amir Tahmasbi authored Mar 25, 2021



* Added embeddings layer

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Added model to doc README

* Added tests

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Fixed a typo in embeddings layer

* Removed imports

* Fixed formatting issues, imports, tests

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Removed imports

* Fixed small formatting issues

* Removed duplicate import from main __init__.py

* Changed default arg to true for adding pooling layer to tf layoutlm

* Fixed formatting issues

* Style

* Added copied from to classes copied from bert

* Fixed doc strings examples to work with layoutlm inputs

* Removed PyTorch reference in doc strings example

* Added integration tests

* Cleaned up initialization file

* Updated model checkpoint identifiers

* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>


make local setup more clearer and added missing links (#10899) · 1a3e0c4f
Philipp Schmid authored Mar 25, 2021

run_glue_no_trainer: datasets -> raw_datasets (#10898) · 5f1491d3
Jethro Kuan authored Mar 25, 2021
```
Use the correct variable (raw_datasets) instead of the module (datasets)
where appropriate.
```

24 Mar, 2021 6 commits
- Update training args ignore_skip_data -> ignore_data_skip (#10891) · 1c06240e
  Sidd Karamcheti authored Mar 24, 2021
  
- Remove version warning in pretrained BART models (#10890) · 3b20e910
  Sylvain Gugger authored Mar 24, 2021
```
* Remove version warning in pretrained BART models

* Put it at the base model
```
- Fix overflowing bad word ids (#10889) · 3c12e3c1
  Lysandre Debut authored Mar 24, 2021
```
* Removes overflowing bad word IDs

* Raise warning
```
- Add notebook on fine-tuning Bart (#10883) · 1f5ea9e0
  Eliza Szczechla authored Mar 24, 2021
```
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
```
- error type of tokenizer in __init__ definition (#10879) · f81077fc
  imzhengzx authored Mar 24, 2021
```
The original code in line 246 is

  tokenizer: Optional["PreTrainedTokenizerBase"] = None,

It should be

  tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
- Add new notebook links in the docs (#10876) · 1aed2b90
  Sylvain Gugger authored Mar 24, 2021
  
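The #10889 entry above removes bad word IDs that overflow the vocabulary and warns instead of failing. Below is a minimal pure-Python sketch of that behavior. It assumes the next-token scores are a flat list indexed by token ID and that banning a token means setting its score to `-inf`. The function name is hypothetical, and this is not the library's implementation:

```python
import logging
import math
from typing import Iterable, List

logger = logging.getLogger(__name__)


def ban_bad_word_ids(scores: List[float], bad_word_ids: Iterable[int]) -> List[float]:
    """Return a copy of `scores` with each in-vocabulary bad word ID set to -inf.

    IDs outside [0, vocab_size) would either raise IndexError or (for negative
    values) silently index from the end, so they are skipped with a warning.
    """
    vocab_size = len(scores)
    banned = list(scores)
    for token_id in bad_word_ids:
        if 0 <= token_id < vocab_size:
            banned[token_id] = -math.inf
        else:
            logger.warning(
                "Bad word ID %d is outside the vocabulary (size %d) and is ignored.",
                token_id,
                vocab_size,
            )
    return banned
```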
23 Mar, 2021 12 commits
- Fix test_trainer_distributed (#10875) · a735f727
  Sylvain Gugger authored Mar 23, 2021
  
- Sm trainer smp init fix (#10870) · 8c297cdb
  Philipp Schmid authored Mar 23, 2021
```
* rewrote is_sagemaker_model_parallel_available

* added is_sagemaker_model_parallel_available to SageMakerTrainer

* removed unnecessary mp_parameters as TrainingArguments

* make style happy

* added mp_parameters again to parse mp-specific args.
```
- fixed prefix_allowed_tokens_fn docstring in generate() (#10862) · d4d4447d
  RafaelWO authored Mar 23, 2021
  
- [Examples] Added predict stage and Updated Example Template (#10868) · 7ef40120
  Bhadresh Savani authored Mar 23, 2021
```
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
```
- [file_utils] import refactor (#10859) · fb2b8984
  Stas Bekman authored Mar 23, 2021
```
* import refactor

* fix the fallback
```
- Update stable docs · 3f48b2bc
  Lysandre authored Mar 23, 2021
  
- Amazon SageMaker Documentation (#10867) · 77ffd5ed
  Philipp Schmid authored Mar 23, 2021
```
* added finished documentation

* changed version from 1.6 to 1.6.0 for distributed

* updated versions

* updated urls
```
- Update the example template for a no Trainer option (#10865) · bf1f43fb
  Sylvain Gugger authored Mar 23, 2021
  
- Fix p_mask cls token masking in qa pipeline (#10863) · 2eb596f0
  Marta Maślankowska authored Mar 23, 2021
  
- fixed typo (#10861) · eb330e89
  Bhadresh Savani authored Mar 23, 2021
  
- fix nan in full-fp16 label_smoothing eval (#10815) · e21f89f6
  Stas Bekman authored Mar 22, 2021
  
- Make convert_to_onnx runable as script again (#10857) · b5b957a6
  Sylvain Gugger authored Mar 22, 2021
  
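The #10870 entry rewrites `is_sagemaker_model_parallel_available`, but the log does not show the new logic. The sketch below shows one plausible shape for such a check. It rests on three assumptions:

* SageMaker passes the `mp_parameters` hyperparameter as JSON in the `SM_HP_MP_PARAMETERS` environment variable.
* A `partitions` key in that JSON signals model parallelism.
* The `smdistributed` package must be importable.

The function name is also hypothetical:

```python
import importlib.util
import json
import os


def sagemaker_mp_available_sketch() -> bool:
    """Sketch only: the env var name and the "partitions" key are assumptions."""
    raw = os.getenv("SM_HP_MP_PARAMETERS", "{}")
    try:
        options = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed hyperparameters: treat model parallelism as unavailable.
        return False
    # Assumed: model parallelism is only configured when "partitions" is set.
    if not isinstance(options, dict) or "partitions" not in options:
        return False
    # Finally, the SageMaker distributed library itself must be installed.
    return importlib.util.find_spec("smdistributed") is not None
```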
22 Mar, 2021 5 commits
- [Generate] Add save mode logits processor to remove nans and infs if necessary (#10769) · 77bf3fe7
  Patrick von Platen authored Mar 23, 2021
```
* push

* finish

* finish

* make fix copies

* change name
```
- Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856) · 9f8fa4e9
  Eliza Szczechla authored Mar 22, 2021
```
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
```
- Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases (#10823) · a8d4d677
  Ruan Chaves authored Mar 22, 2021
```
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.

* Reformat single quotes as double quotes.
```
- feat(wandb): logging and configuration improvements (#10826) · 125ccead
  Boris Dayma authored Mar 22, 2021
```
* feat: ensure unique artifact id

* feat: allow manual init

* fix: simplify reinit logic

* fix: no dropped value + immediate commits

* fix: wandb use in sagemaker

* docs: improve documentation and formatting

* fix: typos

* docs: improve formatting
```
- Add simple one character fix so that on_step_begin and on_step_end are called... · b230181d
  Sidd Karamcheti authored Mar 22, 2021
```
Add simple one character fix so that on_step_begin and on_step_end are called at the right times (#10839)
```
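The #10769 entry on 22 Mar adds a logits processor that removes NaN and inf values from the scores during generation. Below is a minimal pure-Python sketch of that idea. It assumes NaN is replaced by `0.0` and `+inf` by the largest finite float; those replacement values are choices of this sketch, not confirmed by the log. The function name is hypothetical:

```python
import math
import sys
from typing import List


def remove_nan_and_inf(scores: List[float]) -> List[float]:
    """Return a copy of `scores` that downstream softmax/sampling can handle.

    Assumed replacement values: NaN -> 0.0, +inf -> largest finite float.
    -inf is kept, since other processors may use it on purpose to ban tokens.
    """
    cleaned = []
    for score in scores:
        if math.isnan(score):
            cleaned.append(0.0)
        elif score == math.inf:
            cleaned.append(sys.float_info.max)
        else:
            cleaned.append(score)
    return cleaned
```

Leaving `-inf` untouched is a deliberate choice in this sketch. Banning a token (as with bad word IDs) is typically expressed as a `-inf` score, and overwriting it would silently re-enable that token.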