Commits · ae6b6963adc75649e6c21b92b55cd9ff09f0a30f · chenpangpang / transformers

29 Mar, 2021 7 commits

Allow use of pre-computed lengths when grouping by length. (#10953) · ae6b6963

pcuenca authored Mar 29, 2021

A new argument `length_column_name` has been added to
`TrainingArguments`, with default value `"length"`. If this column
exists and `group_by_length` is `True`, the train sampler will use
it for grouping rather than computing it before training starts.

This is an optimization that allows the user to prepare data for fast
processing, preventing sequential access to the dataset as described in
issue #10909.

ae6b6963
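
The behaviour described in this commit message can be sketched in plain Python. This is a minimal illustration and not the Trainer's actual sampler: the helper names `resolve_lengths` and `batch_by_length`, the `input_ids` column name, and the dict-of-columns dataset layout are assumptions made for the example. The real sampler also shuffles, whereas this sketch uses a simple deterministic sort.

```python
from typing import List, Mapping, Sequence


def resolve_lengths(
    columns: Mapping[str, Sequence],
    length_column_name: str = "length",
    input_column: str = "input_ids",  # hypothetical column holding token ids
) -> List[int]:
    """Return one length per example, preferring a pre-computed column."""
    if length_column_name in columns:
        # Fast path: lengths were prepared ahead of time, so no example is read.
        return list(columns[length_column_name])
    # Fallback: walk every example sequentially and measure it.
    return [len(ids) for ids in columns[input_column]]


def batch_by_length(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group example indices so each batch holds examples of similar length."""
    # Longest examples first; ties keep their original order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Toy dataset with a pre-computed "length" column.
data = {
    "input_ids": [[1, 2, 3], [1], [1, 2], [1, 2, 3, 4]],
    "length": [3, 1, 2, 4],
}
lengths = resolve_lengths(data)
batches = batch_by_length(lengths, batch_size=2)
```

When the length column is present, the fallback loop over `input_ids` is skipped entirely, which is the sequential pass the commit aims to avoid.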

Remove duplicate code · 4002f95e
Sylvain Gugger authored Mar 29, 2021

4002f95e

Add `examples/run_ner_no_trainer.py` (#10902) · d7b50ce4

Daniel Stancl authored Mar 29, 2021

* Add NER example with accelerate library

* This commit contains the first (still quite unfinished)
version of a script showing how to train a HuggingFace model
with the new accelerate library.

* Fix metric calculation

* make style quality

* mv ner_no_trainer to token-classification dir

* Delete --debug flag from running script

* hf_datasets -> raw_datasets

* Make a few slight adjustments

* Add an informative comment + rewrite a help comment

* Change header

* Fix a few things

* Enforce use of fast tokenizers only

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Change bash script: python3 -> accelerate launch

* make style

* Add a few missing things (see below)

* Add max-length padding to predictions and labels to
enable accelerate gather functionality

* Add PyTorch no trainer example to the example README.md

* Remove --do-train from args as being redundant for now

* DataCollatorWithPadding -> DataCollatorForTokenClassification

* Remove some obsolete args.do_train conditions from the script

* Delete --do_train from bash running script

* Delete use_slow_tokenizer from args

* Add unintentionally removed flag --label_all_tokens

* Delete --debug flag from running script

d7b50ce4

Instantiate model only once in pipeline (#10888) · 06a6fea7

Sylvain Gugger authored Mar 29, 2021



* Instantiate model only once in pipeline

* Remove documentation of deprecated method

* Add FutureWarning

* Update src/transformers/pipelines/base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

06a6fea7

Ignore not initialized NO_CONFIG_TOKENIZERs (#10936) · cc2366bb
Masatoshi Suzuki authored Mar 29, 2021

cc2366bb
Updated colab links in readme of examples (#10932) · ddea8771
WybeKoper authored Mar 29, 2021
```
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
```
ddea8771
Return global attentions (see #7514) (#10906) · b3544e4c
Guillaume Filion authored Mar 29, 2021

b3544e4c

28 Mar, 2021 1 commit
- fixed filename (#10939) · 4f21e1dd
  Bhadresh Savani authored Mar 28, 2021
  
  4f21e1dd
26 Mar, 2021 3 commits

Add ImageFeatureExtractionMixin (#10905) · b0595d33

Sylvain Gugger authored Mar 26, 2021

* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test

b0595d33

[vulnerability] fix dependency (#10914) · 3c27d246

Stas Bekman authored Mar 26, 2021

this PR fixes https://github.com/huggingface/transformers/security/dependabot/examples/research_projects/lxmert/requirements.txt/PyYAML/open

3c27d246

Rename NLP library to Datasets library (#10920) · 4b2b50aa
Tomy Hsieh authored Mar 26, 2021
```
* Rename NLP library to Datasets library

* Update github template

* Fix styling
```
4b2b50aa

25 Mar, 2021 7 commits

Fix comment (#10886) · 86c6f8a8
lexhuismans authored Mar 25, 2021

86c6f8a8
Reorder init imports · 9856c921
Sylvain Gugger authored Mar 25, 2021

9856c921
Fix typo · e70068a7
Sylvain Gugger authored Mar 25, 2021

e70068a7
Sort init imports · f183a7a3
Sylvain Gugger authored Mar 25, 2021

f183a7a3

Layout lm tf 2 (#10636) · 4684bfc7

Amir Tahmasbi authored Mar 25, 2021



* Added embeddings layer

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Added model to doc README

* Added tests

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Fixed a typo in embeddings layer

* Removed imports

* Fixed formatting issues, imports, tests

* Added layoutlm layers, main model, maskedlm and token classification classes

* Added model classes to tf auto models

* Added model to PT to TF conversion script

* Removed unused imports

* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py

* Made tests pass!

* Fixed typos in imports and docs

* Removed imports

* Fixed small formatting issues

* Removed duplicates import from main __init__.py

* Changed default arg to true for adding pooling layer to tf layoutlm

* Fixed formatting issues

* Style

* Added copied from to classes copied from bert

* Fixed doc strings examples to work with layoutlm inputs

* Removed PyTorch reference in doc strings example

* Added integration tests

* Cleaned up initialization file

* Updated model checkpoint identifiers

* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

4684bfc7

make local setup clearer and added missing links (#10899) · 1a3e0c4f
Philipp Schmid authored Mar 25, 2021

1a3e0c4f
run_glue_no_trainer: datasets -> raw_datasets (#10898) · 5f1491d3
Jethro Kuan authored Mar 25, 2021
```
Use the correct variable (raw_datasets) instead of the module (datasets)
where appropriate.
```
5f1491d3

24 Mar, 2021 6 commits
- Update training args ignore_skip_data -> ignore_data_skip (#10891) · 1c06240e
  Sidd Karamcheti authored Mar 24, 2021
  
  1c06240e
- Remove version warning in pretrained BART models (#10890) · 3b20e910
  Sylvain Gugger authored Mar 24, 2021
```
* Remove version warning in pretrained BART models

* Put it at the base model
```
  3b20e910
- Fix overflowing bad word ids (#10889) · 3c12e3c1
  Lysandre Debut authored Mar 24, 2021
```
* Removes overflowing bad word IDs

* Raise warning
```
  3c12e3c1
- Add notebook on fine-tuning Bart (#10883) · 1f5ea9e0
  Eliza Szczechla authored Mar 24, 2021
```
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
```
  1f5ea9e0
- error type of tokenizer in __init__ definition (#10879) · f81077fc
  imzhengzx authored Mar 24, 2021
```
the original code in line 246 is

  tokenizer: Optional["PreTrainedTokenizerBase"] = None,

it should be

  tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
  f81077fc
- Add new notebook links in the docs (#10876) · 1aed2b90
  Sylvain Gugger authored Mar 24, 2021
  
  1aed2b90
23 Mar, 2021 12 commits
- Fix test_trainer_distributed (#10875) · a735f727
  Sylvain Gugger authored Mar 23, 2021
  
  a735f727
- Sm trainer smp init fix (#10870) · 8c297cdb
  Philipp Schmid authored Mar 23, 2021
```
* rewrote is_sagemaker_model_parallel_available

* added is_sagemaker_model_parallel_available to SageMakerTrainer

* removed unnecessary mp_parameters as TrainingArguments

* make style happy

* added mp_parameters again to parse mp-specific args.
```
  8c297cdb
- fixed prefix_allowed_tokens_fn docstring in generate() (#10862) · d4d4447d
  RafaelWO authored Mar 23, 2021
  
  d4d4447d
- [Examples] Added predict stage and Updated Example Template (#10868) · 7ef40120
  Bhadresh Savani authored Mar 23, 2021
```
* added predict stage

* added test keyword in exception message

* removed example specific saving predictions

* fixed f-string error

* removed extra line
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
```
  7ef40120
- [file_utils] import refactor (#10859) · fb2b8984
  Stas Bekman authored Mar 23, 2021
```
* import refactor

* fix the fallback
```
  fb2b8984
- Update stable docs · 3f48b2bc
  Lysandre authored Mar 23, 2021
  
  3f48b2bc
- Amazon SageMaker Documentation (#10867) · 77ffd5ed
  Philipp Schmid authored Mar 23, 2021
```
* added finished documentation

* changed version from 1.6 to 1.6.0 for distributed

* updated versions

* updated urls
```
  77ffd5ed
- Update the example template for a no Trainer option (#10865) · bf1f43fb
  Sylvain Gugger authored Mar 23, 2021
  
  bf1f43fb
- Fix p_mask cls token masking in qa pipeline (#10863) · 2eb596f0
  Marta Maślankowska authored Mar 23, 2021
  
  2eb596f0
- fixed typo (#10861) · eb330e89
  Bhadresh Savani authored Mar 23, 2021
  
  eb330e89
- fix nan in full-fp16 label_smoothing eval (#10815) · e21f89f6
  Stas Bekman authored Mar 22, 2021
  
  e21f89f6
- Make convert_to_onnx runnable as script again (#10857) · b5b957a6
  Sylvain Gugger authored Mar 22, 2021
  
  b5b957a6
22 Mar, 2021 4 commits

[Generate] Add save mode logits processor to remove nans and infs if necessary (#10769) · 77bf3fe7
Patrick von Platen authored Mar 23, 2021
```
* push

* finish

* finish

* make fix copies

* change name
```
77bf3fe7
Use DataCollatorForSeq2Seq in run_summarization in all cases (#10856) · 9f8fa4e9
Eliza Szczechla authored Mar 22, 2021
```
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
```
9f8fa4e9

Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases (#10823) · a8d4d677

Ruan Chaves authored Mar 22, 2021

* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.

* Reformat single quotes as double quotes.

a8d4d677
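
One way to picture the change described above is the sketch below. It assumes the extra entry Ray Tune passes along is a `wandb` key in the trial's hyperparameter dict; the helper `apply_trial_params` and the `reserved` tuple are hypothetical names for illustration and are not the Trainer's own code.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple


def apply_trial_params(
    settings: Any,
    trial: Dict[str, Any],
    reserved: Tuple[str, ...] = ("wandb",),  # assumed name of the integration key
) -> Dict[str, Any]:
    """Copy trial hyperparameters onto `settings`, skipping integration-only keys."""
    # Work on a filtered copy so the caller's trial dict is left untouched.
    params = {k: v for k, v in trial.items() if k not in reserved}
    for key, value in params.items():
        if not hasattr(settings, key):
            raise AttributeError(f"Unknown hyperparameter: {key!r}")
        setattr(settings, key, value)
    return params


# Stand-in for a settings object with two tunable fields.
settings = SimpleNamespace(learning_rate=5e-5, num_train_epochs=3)
trial = {"learning_rate": 1e-4, "wandb": {"project": "demo"}}
applied = apply_trial_params(settings, trial)
```

Without the filter, the `wandb` entry would be treated as a hyperparameter and rejected as unknown, which is the kind of clash the commit title refers to.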

feat(wandb): logging and configuration improvements (#10826) · 125ccead

Boris Dayma authored Mar 22, 2021

* feat: ensure unique artifact id

* feat: allow manual init

* fix: simplify reinit logic

* fix: no dropped value + immediate commits

* fix: wandb use in sagemaker

* docs: improve documentation and formatting

* fix: typos

* docs: improve formatting

125ccead